| name | query-optimize |
| description | Guidance for SQL query optimization tasks. This skill should be used when optimizing slow SQL queries, improving database performance, or rewriting queries to be more efficient. Covers query plan analysis, benchmarking strategies, and database-specific optimization techniques. |
SQL Query Optimization
This skill provides procedural guidance for optimizing SQL queries effectively, with emphasis on proper benchmarking, query plan analysis, and iterative refinement.
Core Optimization Workflow
1. Establish Performance Baselines
Before making any changes, establish clear performance metrics:
- Identify the target: Determine what "optimal" performance looks like. Look for reference implementations, documented benchmarks, or theoretical analysis of query complexity.
- Measure the original query: Record execution time over multiple runs to account for caching effects (see the timing sketch after this list).
- Document the goal: Define success as "as fast as possible" not just "faster than the original."
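A minimal sketch of capturing that baseline using the built-in client timers (the query and table names below are placeholders, not part of any real schema):
```sql
-- sqlite3 CLI: ".timer on" prints elapsed time for each statement.
-- psql (PostgreSQL): "\timing on" does the same.
-- Run the candidate query several times and record each timing; the first
-- run is usually slower because caches are cold.
SELECT customer_id, SUM(total) AS revenue   -- placeholder workload
FROM orders
GROUP BY customer_id;
```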
2. Analyze Query Plans
Always use query plan analysis before and after optimization:
```sql
-- SQLite
EXPLAIN QUERY PLAN <query>;

-- PostgreSQL
EXPLAIN ANALYZE <query>;

-- MySQL
EXPLAIN <query>;
```
Key indicators to look for in query plans:
- Full table scans (SCAN TABLE) vs index usage (SEARCH TABLE USING INDEX); see the example after this list
- Correlated subqueries executing per-row
- Temporary B-tree creation for sorting/grouping
- Materialization of intermediate results
- Join order and join types
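As an illustration, assume a hypothetical table orders(customer_id, order_date, total) with no index on customer_id. The plan lines in the comments follow SQLite's output; exact wording varies by version:
```sql
EXPLAIN QUERY PLAN
SELECT total FROM orders WHERE customer_id = 42;
-- Without an index, SQLite reports a full scan, e.g.:
--   SCAN orders            (older versions: SCAN TABLE orders)

CREATE INDEX idx_orders_customer ON orders (customer_id);

EXPLAIN QUERY PLAN
SELECT total FROM orders WHERE customer_id = 42;
-- With the index in place, the plan switches to an index search, e.g.:
--   SEARCH orders USING INDEX idx_orders_customer (customer_id=?)
```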
3. Identify Common Performance Issues
Correlated Subqueries: Subqueries that reference outer query columns execute once per row. Transform to JOINs or CTEs when possible.
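For instance, a per-row subquery can often be replaced by a join against a pre-aggregated derived table. The customers/orders schema here is hypothetical:
```sql
-- Correlated form: the inner SELECT runs once per customer row
SELECT c.id, c.name,
       (SELECT SUM(o.total)
        FROM orders o
        WHERE o.customer_id = c.id) AS revenue
FROM customers c;

-- Join form: orders is aggregated once, then joined
SELECT c.id, c.name, r.revenue
FROM customers c
LEFT JOIN (SELECT customer_id, SUM(total) AS revenue
           FROM orders
           GROUP BY customer_id) r ON r.customer_id = c.id;
```
Both forms return NULL revenue for customers without orders, so the rewrite preserves the original semantics.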
Repeated Computations: The same subquery appearing multiple times should be factored out into a CTE or derived table so it is computed only once.
Missing Indexes: Check for indexes on columns used in WHERE, JOIN, and ORDER BY clauses.
Inefficient Aggregations: Aggregating a large result set and filtering only afterwards wastes work. Filter earlier with WHERE so fewer rows reach the GROUP BY.
4. Apply Optimization Techniques
Consider multiple approaches rather than stopping at the first working solution:
Approach 1: CTEs (Common Table Expressions)
- Pre-compute values used multiple times
- Improve readability
- Note: Some databases materialize CTEs which may hurt performance
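A sketch of factoring a repeated computation into a CTE, reusing the hypothetical orders table from earlier:
```sql
-- The average order total is computed once in the CTE instead of being
-- repeated in every place the main query needs it.
WITH avg_total AS (
    SELECT AVG(total) AS v FROM orders
)
SELECT o.id, o.total, o.total - a.v AS diff_from_avg
FROM orders o
CROSS JOIN avg_total a
WHERE o.total > a.v;
```
Whether this wins depends on the engine: SQLite and PostgreSQL before version 12 may materialize the CTE, so benchmark it against an equivalent subquery version.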
Approach 2: Window Functions
- Efficient for ranking, running totals, and row numbering
- Avoid when simpler alternatives exist
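For example, ROW_NUMBER() answers "most recent order per customer" in a single pass over the hypothetical orders table (requires SQLite 3.25+ or MySQL 8.0+):
```sql
-- Rank each customer's orders by date and keep only the most recent one
SELECT id, customer_id, order_date, total
FROM (
    SELECT o.*,
           ROW_NUMBER() OVER (PARTITION BY customer_id
                              ORDER BY order_date DESC) AS rn
    FROM orders o
) ranked
WHERE rn = 1;
```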
Approach 3: Subquery Restructuring
- Convert correlated subqueries to JOINs
- Push predicates into subqueries to reduce intermediate result sizes
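A sketch of pushing a filter (and the aggregation) into a derived table so the join only sees qualifying rows. Dates and schema are placeholders, and many optimizers perform this rewrite automatically, so measure both forms:
```sql
-- Join every order to its customer, then filter and aggregate
SELECT c.country, SUM(o.total) AS revenue
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE o.order_date >= '2024-01-01'
GROUP BY c.country;

-- Pushed down: restrict and pre-aggregate orders before the join
SELECT c.country, SUM(o.revenue) AS revenue
FROM customers c
JOIN (SELECT customer_id, SUM(total) AS revenue
      FROM orders
      WHERE order_date >= '2024-01-01'
      GROUP BY customer_id) o ON o.customer_id = c.id
GROUP BY c.country;
```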
Approach 4: Join Optimization
- Reorder joins to filter early
- Use appropriate join types (INNER vs LEFT)
- Consider join hints if available
5. Database-Specific Considerations
SQLite:
- No parallel query execution
- CTEs may be materialized (can hurt performance)
- Limited optimizer compared to enterprise databases
- Consider using covering indexes
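A covering-index sketch for the hypothetical orders table; when the query can be answered from the index alone, SQLite reports a COVERING INDEX in the plan:
```sql
-- The index carries both the filter column and the selected column,
-- so the table itself never has to be read for this query.
CREATE INDEX idx_orders_customer_total ON orders (customer_id, total);

EXPLAIN QUERY PLAN
SELECT total FROM orders WHERE customer_id = 42;
-- Expected plan line (wording varies by version):
--   SEARCH orders USING COVERING INDEX idx_orders_customer_total (customer_id=?)
```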
PostgreSQL:
- Supports parallel execution
- Advanced optimizer with cost-based decisions
- CTEs are optimization barriers in older versions (< 12)
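Since PostgreSQL 12 the barrier can be controlled explicitly with MATERIALIZED / NOT MATERIALIZED. A minimal sketch against the hypothetical orders table:
```sql
-- NOT MATERIALIZED lets the planner inline the CTE and push outer predicates
-- into it (the default for single-reference CTEs in 12+).
WITH recent AS NOT MATERIALIZED (
    SELECT * FROM orders WHERE order_date >= '2024-01-01'
)
SELECT customer_id, SUM(total) AS revenue FROM recent GROUP BY customer_id;

-- MATERIALIZED forces the pre-12 behavior: compute the CTE once, then reuse it.
WITH recent AS MATERIALIZED (
    SELECT * FROM orders WHERE order_date >= '2024-01-01'
)
SELECT customer_id, SUM(total) AS revenue FROM recent GROUP BY customer_id;
```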
MySQL:
- Derived table materialization can be forced or avoided
- Index hints available when optimizer makes poor choices
Verification Strategy
Correctness Verification
- Full result comparison: Compare all rows and columns against the original query output
- Edge case testing: Test with NULL values, empty results, ties in sorting/ranking
- Sample verification is insufficient: If the full result set is too large, use checksums or row counts with spot checks
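A sketch of a full-result comparison. The CTE names original_q and optimized_q are placeholders for the two candidate queries, and EXCEPT requires MySQL 8.0.31+ (SQLite and PostgreSQL support it natively):
```sql
WITH original_q  AS (SELECT 1 AS dummy),   -- replace with the original query
     optimized_q AS (SELECT 1 AS dummy)    -- replace with the optimized query
-- Symmetric difference: zero rows means both queries return the same set of rows
SELECT * FROM (SELECT * FROM original_q  EXCEPT SELECT * FROM optimized_q) AS missing_rows
UNION ALL
SELECT * FROM (SELECT * FROM optimized_q EXCEPT SELECT * FROM original_q) AS extra_rows;
```
Because EXCEPT is set-based, also compare plain row counts of both queries to catch differences in duplicate rows.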
Performance Verification
- Multiple runs: Execute each query 3-5 times and use the median time
- Cold vs warm cache: Test both scenarios if relevant
- Compare against optimal: If a reference solution exists, compare against it, not just the original slow query
- Incremental profiling: Measure each CTE or subquery independently to identify bottlenecks
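For incremental profiling, each CTE can be timed in isolation by selecting from it directly. A sketch in PostgreSQL syntax, with a placeholder CTE:
```sql
-- Time the CTE body on its own before timing the full query that uses it
EXPLAIN ANALYZE
WITH recent AS (
    SELECT customer_id, SUM(total) AS revenue
    FROM orders
    WHERE order_date >= '2024-01-01'
    GROUP BY customer_id
)
SELECT * FROM recent;
```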
Common Pitfalls
Satisficing vs Optimizing
- Problem: Stopping optimization when query is "faster than before" rather than "as fast as possible"
- Prevention: Always establish what optimal performance looks like before starting. Compare against known-good implementations when available.
Skipping Query Plan Analysis
- Problem: Making changes without understanding why the query is slow
- Prevention: Always run EXPLAIN before and after changes. Understand the query plan before proposing solutions.
Premature CTE Usage
- Problem: Assuming CTEs always improve performance. Some databases materialize CTEs, adding overhead.
- Prevention: Test both CTE and non-CTE versions. Profile each CTE independently.
Over-reliance on Window Functions
- Problem: Using ROW_NUMBER() or similar when simpler approaches work
- Prevention: Consider if a GROUP BY with MIN/MAX or a simple correlated subquery might be more efficient.
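When only the aggregated value is needed, a plain GROUP BY avoids per-row numbering entirely (hypothetical schema as before):
```sql
-- "First order date per customer" without ROW_NUMBER(): one aggregate per
-- group instead of sorting and numbering every row
SELECT customer_id, MIN(order_date) AS first_order_date
FROM orders
GROUP BY customer_id;
```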
Incomplete Testing
- Problem: Verifying only a sample of rows due to timeouts
- Prevention: Use checksums, row counts, or hash comparisons for full validation. Test with smaller data subsets first.
Single-Approach Optimization
- Problem: Implementing one optimization approach without exploring alternatives
- Prevention: Always test at least 2-3 different approaches before selecting the best one.
Iterative Refinement Process
1. Analyze the original query and its plan
2. Identify the primary bottleneck
3. Propose 2-3 alternative approaches
4. Implement and benchmark each approach
5. Select the best performing approach
6. Verify correctness with full result comparison
7. Document the optimization rationale
Checklist Before Declaring Success
- Query plan analyzed and understood
- Multiple optimization approaches considered
- Performance compared against optimal/reference (not just original)
- Full result correctness verified (not just samples)
- Edge cases tested (NULLs, ties, empty results)
- Performance measured across multiple runs
- Database-specific optimizations considered