---
name: query-optimizer
description: Analyze and optimize SQL queries for better performance and efficiency.
---
# Query Optimizer Skill
Analyze and optimize SQL queries for better performance and efficiency.
## Instructions
You are a database performance optimization expert. When invoked:
1. **Analyze Query Performance**
   - Use EXPLAIN/EXPLAIN ANALYZE to understand the execution plan
   - Identify slow queries from logs
   - Measure query execution time
   - Detect full table scans and missing indexes
2. **Identify Bottlenecks**
   - Find N+1 query problems
   - Detect inefficient JOINs
   - Identify missing or unused indexes
   - Spot suboptimal WHERE clauses
3. **Optimize Queries**
   - Add appropriate indexes
   - Rewrite queries for better performance
   - Suggest caching strategies
   - Recommend query restructuring
4. **Provide Recommendations**
   - Index creation suggestions
   - Query rewriting alternatives
   - Database configuration tuning
   - Monitoring and alerting setup
## Supported Databases
- **SQL**: PostgreSQL, MySQL, MariaDB, SQL Server, SQLite
- **Analysis Tools**: EXPLAIN, EXPLAIN ANALYZE, Query Profiler
- **Monitoring**: pg_stat_statements, slow query log, performance schema
## Usage Examples
```text
@query-optimizer
@query-optimizer --analyze-slow-queries
@query-optimizer --suggest-indexes
@query-optimizer --explain SELECT * FROM users WHERE email = 'test@example.com'
@query-optimizer --fix-n-plus-one
```
## Query Analysis Tools
### PostgreSQL - EXPLAIN ANALYZE
```sql
-- Basic EXPLAIN
EXPLAIN
SELECT u.id, u.username, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.active = true
GROUP BY u.id, u.username;

-- EXPLAIN ANALYZE - actually runs the query
EXPLAIN ANALYZE
SELECT u.id, u.username, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.active = true
GROUP BY u.id, u.username;

-- EXPLAIN with all options (PostgreSQL)
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, FORMAT JSON)
SELECT * FROM orders
WHERE user_id = 123
  AND created_at >= '2024-01-01';
```
**Reading EXPLAIN Output:**
```text
Seq Scan on users  (cost=0.00..1234.56 rows=10000 width=32)
  Filter: (active = true)
-- Seq Scan = sequential scan (full table scan) - BAD for large tables
-- cost=0.00..1234.56 = startup cost..total cost
-- rows=10000 = estimated rows
-- width=32 = average row size in bytes

Index Scan using idx_users_email on users  (cost=0.29..8.30 rows=1 width=32)
  Index Cond: (email = 'test@example.com'::text)
-- Index Scan = using an index - GOOD
-- Much lower cost than the Seq Scan
-- rows=1 = accurate estimate
```
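When the estimated rows diverge badly from EXPLAIN ANALYZE's actual rows, the planner's statistics are often stale. A quick first step, sketched with the table from the example above:
```sql
-- Refresh planner statistics so row estimates match reality
ANALYZE users;           -- PostgreSQL
-- ANALYZE TABLE users;  -- MySQL equivalent
```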
### MySQL - EXPLAIN
```sql
-- MySQL EXPLAIN
EXPLAIN
SELECT u.id, u.username, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.active = true
GROUP BY u.id, u.username;

-- EXPLAIN with execution stats (MySQL 8.0.18+)
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = 123;

-- Show warnings for optimization info
EXPLAIN
SELECT * FROM users WHERE email = 'test@example.com';
SHOW WARNINGS;
```
**MySQL EXPLAIN Output:**
```text
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | users | ALL  | NULL          | NULL | NULL    | NULL | 1000 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
-- type=ALL means full table scan - BAD
-- key=NULL means no index used

+----+-------------+-------+------+-----------------+-----------------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys   | key             | key_len | ref   | rows | Extra |
+----+-------------+-------+------+-----------------+-----------------+---------+-------+------+-------+
|  1 | SIMPLE      | users | ref  | idx_users_email | idx_users_email | 767     | const |    1 | NULL  |
+----+-------------+-------+------+-----------------+-----------------+---------+-------+------+-------+
-- type=ref means index lookup - GOOD
-- key shows the index being used
```
## Common Performance Issues
### 1. Missing Indexes
**Problem:**
```sql
-- Slow query - full table scan
SELECT * FROM users WHERE email = 'john@example.com';

-- EXPLAIN shows:
-- Seq Scan on users (cost=0.00..1500.00 rows=1 width=100)
--   Filter: (email = 'john@example.com')
```
**Solution:**
```sql
-- Add an index on the email column
CREATE INDEX idx_users_email ON users(email);

-- Now EXPLAIN shows:
-- Index Scan using idx_users_email on users (cost=0.29..8.30 rows=1 width=100)
--   Index Cond: (email = 'john@example.com')
-- On large tables this is often orders of magnitude faster
```
### 2. N+1 Query Problem
**Problem:**
```javascript
// ORM code causing N+1 queries
const users = await User.findAll(); // 1 query
for (const user of users) {
  const orders = await Order.findAll({
    where: { userId: user.id } // N queries (one per user)
  });
  console.log(`${user.name}: ${orders.length} orders`);
}
// Total: 1 + N queries for N users
// For 100 users = 101 queries!
```
**Solution:**
```javascript
// Use eager loading - a single query with a JOIN
const users = await User.findAll({
  include: [{
    model: Order,
    attributes: ['id', 'total_amount']
  }]
});
for (const user of users) {
  console.log(`${user.name}: ${user.orders.length} orders`);
}
// Total: 1 query regardless of user count
```
**SQL Equivalent:**
```sql
-- Instead of multiple queries:
SELECT * FROM users;
SELECT * FROM orders WHERE user_id = 1;
SELECT * FROM orders WHERE user_id = 2;
-- ... (N more queries)

-- Use a single JOIN query:
SELECT
    u.id,
    u.name,
    COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;
```
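Note that many ORMs implement eager loading not as a JOIN but as two batched queries; a sketch of that pattern (the listed IDs are illustrative):
```sql
-- Query 1: fetch the users
SELECT id, name FROM users;

-- Query 2: fetch all their orders in a single round trip
SELECT * FROM orders WHERE user_id IN (1, 2, 3); -- all ids from query 1
```
Either way, the query count stays constant as the number of users grows.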
### 3. SELECT * Inefficiency
**Problem:**
```sql
-- Fetching all columns when only a few are needed
SELECT * FROM products
WHERE category_id = 5;
-- Fetches: id, name, description (large text), image_url, specs (json),
-- price, stock, created_at, updated_at, etc.
```
**Solution:**
```sql
-- Select only the needed columns
SELECT id, name, price, stock
FROM products
WHERE category_id = 5;

-- Benefits:
-- - Less data transferred
-- - Faster query execution
-- - Lower memory usage
-- - Can use covering indexes (see below)
```
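As a sketch of that last point, assuming the products schema above and PostgreSQL 11+, a covering index lets the narrowed query be answered from the index alone:
```sql
CREATE INDEX idx_products_category_covering
ON products(category_id) INCLUDE (name, price, stock);

-- Can now run as an Index Only Scan (given a recently vacuumed table):
SELECT name, price, stock FROM products WHERE category_id = 5;
```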
### 4. Inefficient Pagination
**Problem:**
```sql
-- OFFSET becomes slow with large offsets
SELECT * FROM users
ORDER BY created_at DESC
LIMIT 20 OFFSET 10000;

-- The database must:
-- 1. Sort all rows
-- 2. Skip 10,000 rows
-- 3. Return the next 20
-- It gets slower as the offset increases
```
**Solution:**
```sql
-- Use cursor-based (keyset) pagination, seeded with the last row of the previous page
SELECT * FROM users
WHERE created_at < '2024-01-01 12:00:00'
   OR (created_at = '2024-01-01 12:00:00' AND id < 12345)
ORDER BY created_at DESC, id DESC
LIMIT 20;

-- Or with a single indexed column:
SELECT * FROM users
WHERE id < 10000
ORDER BY id DESC
LIMIT 20;

-- Benefits:
-- - Consistent performance regardless of page depth
-- - Uses an index efficiently
-- - No need to skip rows
```
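In PostgreSQL the two-column cursor condition can also be written as a row-value comparison, which reads more clearly and can use a composite index on (created_at, id):
```sql
SELECT * FROM users
WHERE (created_at, id) < ('2024-01-01 12:00:00', 12345)
ORDER BY created_at DESC, id DESC
LIMIT 20;
```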
### 5. Function on Indexed Column
**Problem:**
```sql
-- A function call on the column prevents index usage
SELECT * FROM users
WHERE LOWER(email) = 'john@example.com';
-- EXPLAIN shows a Seq Scan (the plain index on email is not used)
```
**Solution 1 - Store lowercase:**
```sql
-- Add a generated column
ALTER TABLE users ADD COLUMN email_lower VARCHAR(255)
GENERATED ALWAYS AS (LOWER(email)) STORED;

CREATE INDEX idx_users_email_lower ON users(email_lower);

-- Query:
SELECT * FROM users
WHERE email_lower = 'john@example.com';
```
**Solution 2 - Functional index (PostgreSQL):**
```sql
-- Create an index on the function result
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

-- Now the original query uses the index
SELECT * FROM users
WHERE LOWER(email) = 'john@example.com';
```
**Solution 3 - Case-insensitive type:**
```sql
-- PostgreSQL - use the citext extension type
CREATE EXTENSION IF NOT EXISTS citext;
ALTER TABLE users ALTER COLUMN email TYPE citext;

-- Query without LOWER:
SELECT * FROM users WHERE email = 'john@example.com';
-- Comparisons are automatically case-insensitive
```
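For completeness, MySQL 8.0.13+ supports functional indexes as well (note the required double parentheses):
```sql
CREATE INDEX idx_users_email_lower ON users((LOWER(email)));

-- The original query can now use the index:
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';
```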
### 6. Inefficient JOINs
**Problem:**
```sql
-- Multiple JOINs without supporting indexes
SELECT
    u.username,
    o.id AS order_id,
    p.name AS product_name
FROM users u
JOIN orders o ON u.id = o.user_id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
WHERE u.email = 'john@example.com';

-- Slow if indexes are missing on:
-- - users.email
-- - orders.user_id
-- - order_items.order_id
-- - order_items.product_id
```
**Solution:**
```sql
-- Add the necessary indexes
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_order_items_order_id ON order_items(order_id);
CREATE INDEX idx_order_items_product_id ON order_items(product_id);

-- Now the query can use an index for every JOIN
-- EXPLAIN should show an Index Scan for each table
```
### 7. OR Conditions
**Problem:**
```sql
-- OR can prevent efficient index usage
SELECT * FROM users
WHERE username = 'john' OR email = 'john@example.com';
-- The optimizer may fall back to a full scan if it cannot combine both
-- indexes (PostgreSQL's BitmapOr and MySQL's index merge help, but not always)
```
**Solution:**
```sql
-- Use UNION for better index usage
SELECT * FROM users WHERE username = 'john'
UNION
SELECT * FROM users WHERE email = 'john@example.com';

-- Each branch uses its own index
-- UNION deduplicates the combined results
```
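If the two branches are made disjoint, a variant with UNION ALL skips the deduplication step entirely; a sketch (assuming username is NOT NULL):
```sql
SELECT * FROM users WHERE username = 'john'
UNION ALL
SELECT * FROM users
WHERE email = 'john@example.com'
  AND username <> 'john'; -- exclude rows the first branch already returned
```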
### 8. NOT IN with Subquery
**Problem:**
```sql
-- Slow subquery execution
SELECT * FROM users
WHERE id NOT IN (
    SELECT user_id FROM banned_users
);
-- Can be very slow with large subquery results
```
**Solution:**
```sql
-- Use LEFT JOIN with a NULL check
SELECT u.*
FROM users u
LEFT JOIN banned_users bu ON u.id = bu.user_id
WHERE bu.user_id IS NULL;

-- Or use NOT EXISTS (often faster):
SELECT u.*
FROM users u
WHERE NOT EXISTS (
    SELECT 1 FROM banned_users bu
    WHERE bu.user_id = u.id
);
```
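NOT IN also has a correctness trap on top of the performance one: if the subquery returns even a single NULL, the predicate matches no rows at all. If you must keep NOT IN and user_id is nullable, filter the NULLs explicitly:
```sql
SELECT * FROM users
WHERE id NOT IN (
    SELECT user_id FROM banned_users WHERE user_id IS NOT NULL
);
```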
## Index Optimization
### When to Add Indexes
**Add indexes for** (see the sketch after this list):
- Primary keys (automatic in most databases)
- Foreign keys (critical for JOINs)
- Columns in WHERE clauses
- Columns in ORDER BY clauses
- Columns in GROUP BY clauses
- Columns in JOIN conditions
- Columns with high cardinality (many unique values)
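A minimal sketch applying these rules to the example schema used throughout (index names are illustrative):
```sql
CREATE INDEX idx_orders_user_id ON orders(user_id);        -- foreign key, used in JOINs
CREATE INDEX idx_orders_created_at ON orders(created_at);  -- WHERE ranges and ORDER BY
CREATE INDEX idx_users_email ON users(email);              -- high-cardinality WHERE lookups
```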
### Index Types
**B-Tree Index (Default):**
```sql
-- Best for: equality (=) and range (<, >, BETWEEN) queries
CREATE INDEX idx_users_created_at ON users(created_at);

-- Good for:
SELECT * FROM users WHERE created_at > '2024-01-01';
SELECT * FROM users WHERE created_at BETWEEN '2024-01-01' AND '2024-12-31';
```
**Composite Index:**
```sql
-- Index on multiple columns
CREATE INDEX idx_orders_user_status ON orders(user_id, status);

-- Used for queries filtering on both columns
SELECT * FROM orders WHERE user_id = 123 AND status = 'pending';

-- Also used for queries on the leading column only
SELECT * FROM orders WHERE user_id = 123;

-- NOT used for queries on the second column only
SELECT * FROM orders WHERE status = 'pending'; -- won't use this index

-- Column order matters: lead with the columns your queries always
-- filter on (typically the most selective equality conditions)
```
**Partial Index (PostgreSQL):**
```sql
-- Index only a subset of rows
CREATE INDEX idx_active_users ON users(email)
WHERE active = true;

-- Smaller index, faster queries for active users
SELECT * FROM users WHERE email = 'john@example.com' AND active = true;
```
**GIN Index (PostgreSQL - for arrays, JSONB, full-text):**
```sql
-- For JSONB columns
CREATE INDEX idx_products_metadata ON products USING GIN(metadata);

-- Query JSONB data
SELECT * FROM products
WHERE metadata @> '{"brand": "Apple"}';

-- For array columns
CREATE INDEX idx_tags ON posts USING GIN(tags);

-- Query arrays
SELECT * FROM posts WHERE tags @> ARRAY['postgresql'];
```
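If `@>` is the only operator you query with, the jsonb_path_ops operator class usually yields a smaller, faster GIN index (it does not support existence operators like `?`):
```sql
CREATE INDEX idx_products_metadata_path
ON products USING GIN(metadata jsonb_path_ops);
```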
**Full-Text Search Index:**
```sql
-- PostgreSQL
CREATE INDEX idx_products_search ON products
USING GIN(to_tsvector('english', name || ' ' || description));

-- Full-text search query
SELECT * FROM products
WHERE to_tsvector('english', name || ' ' || description)
      @@ to_tsquery('english', 'laptop & gaming');
```
### Covering Index
**Concept:**
```sql
-- A covering index includes all columns the query needs
CREATE INDEX idx_users_email_username ON users(email, username);

-- This query can be answered entirely from the index (no table access)
SELECT username FROM users WHERE email = 'john@example.com';
-- PostgreSQL: Index Only Scan
-- MySQL: "Using index" in the Extra column
```
**With INCLUDE (PostgreSQL 11+):**
```sql
-- Store non-key columns in the index leaf nodes
CREATE INDEX idx_users_email ON users(email)
INCLUDE (username, created_at);

-- The query can use the index without table access
SELECT username, created_at
FROM users
WHERE email = 'john@example.com';
```
### Index Maintenance
**Find Unused Indexes (PostgreSQL):**
```sql
SELECT
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    idx_scan,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE '%_pkey'
ORDER BY pg_relation_size(indexrelid) DESC;

-- Drop unused indexes to save space and improve write performance
```
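When removing one in production, PostgreSQL's CONCURRENTLY option avoids blocking concurrent writes (the index name here is illustrative; this cannot run inside a transaction block):
```sql
DROP INDEX CONCURRENTLY idx_users_legacy_lookup;
```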
**Find Duplicate Indexes:**
```sql
-- PostgreSQL query to find duplicate indexes
-- (a heuristic: it compares column lists, ignoring opclasses and predicates)
SELECT
    indrelid::regclass AS table_name,
    array_agg(indexrelid::regclass) AS indexes
FROM pg_index
GROUP BY indrelid, indkey
HAVING COUNT(*) > 1;
```
**Rebuild Fragmented Indexes:**
```sql
-- PostgreSQL (add CONCURRENTLY on 12+ to avoid blocking writes)
REINDEX INDEX idx_users_email;
REINDEX TABLE users;

-- MySQL
OPTIMIZE TABLE users;
```
## Query Rewriting Examples
### Example 1: Aggregation Optimization
**Before:**
```sql
SELECT
    u.id,
    u.username,
    (SELECT COUNT(*) FROM orders WHERE user_id = u.id) AS order_count,
    (SELECT SUM(total_amount) FROM orders WHERE user_id = u.id) AS total_spent
FROM users u
WHERE u.active = true;
-- Two correlated subqueries executed per user row
```
**After:**
```sql
SELECT
    u.id,
    u.username,
    COUNT(o.id) AS order_count,
    COALESCE(SUM(o.total_amount), 0) AS total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.active = true
GROUP BY u.id, u.username;
-- A single query with a JOIN - typically much faster
```
### Example 2: EXISTS vs IN
**Before:**
```sql
SELECT * FROM products
WHERE id IN (
    SELECT product_id FROM order_items
    WHERE created_at >= '2024-01-01'
);
-- The subquery materializes all product_ids (potentially a large result set)
```
**After:**
```sql
SELECT p.* FROM products p
WHERE EXISTS (
    SELECT 1 FROM order_items oi
    WHERE oi.product_id = p.id
      AND oi.created_at >= '2024-01-01'
);
-- EXISTS can stop at the first match; note that modern planners often
-- rewrite IN as a semi-join anyway, so verify the difference with EXPLAIN
```
Example 3: Avoid Cartesian Products
Before:
-- Accidental cartesian product
SELECT *
FROM users u, orders o
WHERE u.active = true
AND o.status = 'completed';
-- Returns every user combined with every completed order!
-- Missing JOIN condition
**After:**
```sql
SELECT u.*, o.*
FROM users u
INNER JOIN orders o ON u.id = o.user_id
WHERE u.active = true
  AND o.status = 'completed';
-- Proper JOIN condition
```
### Example 4: Optimize DISTINCT
**Before:**
```sql
SELECT DISTINCT user_id
FROM orders
WHERE status = 'completed';
-- DISTINCT requires sorting or hashing to deduplicate
```
**After:**
```sql
SELECT user_id
FROM orders
WHERE status = 'completed'
GROUP BY user_id;
-- GROUP BY is sometimes faster than DISTINCT, though many planners
-- produce identical plans for both - verify with EXPLAIN

-- Or, to keep one representative row per user (PostgreSQL-specific):
SELECT DISTINCT ON (user_id) user_id, created_at
FROM orders
WHERE status = 'completed'
ORDER BY user_id, created_at DESC;
```
## Monitoring Slow Queries
### PostgreSQL - pg_stat_statements
```sql
-- Enable the extension (requires shared_preload_libraries = 'pg_stat_statements'
-- in postgresql.conf and a server restart)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Find the slowest queries
-- (column names are for PostgreSQL 13+; older versions use total_time/mean_time)
SELECT
    substring(query, 1, 50) AS short_query,
    round(total_exec_time::numeric, 2) AS total_time,
    calls,
    round(mean_exec_time::numeric, 2) AS mean_time,
    round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS percentage
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Find the most frequently called queries
SELECT
    substring(query, 1, 50) AS short_query,
    calls,
    round(mean_exec_time::numeric, 2) AS mean_time
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;

-- Reset statistics
SELECT pg_stat_statements_reset();
```
### MySQL - Slow Query Log
```sql
-- Enable the slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1; -- log queries taking > 1 second
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow-query.log';

-- Also log queries not using indexes
SET GLOBAL log_queries_not_using_indexes = 'ON';

-- Analyze the slow query log with the mysqldumpslow tool:
-- mysqldumpslow -s t -t 10 /var/log/mysql/slow-query.log
```
### Performance Schema (MySQL)
```sql
-- performance_schema is a read-only startup variable; enable it in my.cnf
-- ([mysqld] performance_schema=ON) and restart - SET GLOBAL will not work

-- Find the slowest statements (timer columns are in picoseconds;
-- dividing by 1e9 converts to milliseconds)
SELECT
    DIGEST_TEXT,
    COUNT_STAR AS executions,
    ROUND(AVG_TIMER_WAIT / 1000000000, 2) AS avg_ms,
    ROUND(SUM_TIMER_WAIT / 1000000000, 2) AS total_ms
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;
```
## Best Practices
### DO ✓
- **Use EXPLAIN before and after optimization** - Verify the change actually helped
- **Add indexes on foreign keys** - Critical for JOINs
- **Use covering indexes** when possible
- **Paginate large result sets** - Avoid loading all data
- **Monitor query performance** - Use pg_stat_statements or the slow query log
- **Test on production-like data** - Performance differs with data volume
- **Use connection pooling** - Reduce connection overhead
- **Cache frequently accessed data** - Redis, Memcached, or a materialized view (see the sketch below)
- **Archive old data** - Keep active tables smaller
- **Run VACUUM/ANALYZE regularly (PostgreSQL)** - Keep statistics up to date
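
On the caching point, one database-side option is a PostgreSQL materialized view; a minimal sketch for a hypothetical per-user order summary (view and index names are illustrative):
```sql
CREATE MATERIALIZED VIEW user_order_totals AS
SELECT user_id, COUNT(*) AS order_count, SUM(total_amount) AS total_spent
FROM orders
GROUP BY user_id;

-- A unique index is required for REFRESH ... CONCURRENTLY
CREATE UNIQUE INDEX idx_user_order_totals ON user_order_totals(user_id);

-- Refresh on a schedule; CONCURRENTLY keeps the view readable during refresh
REFRESH MATERIALIZED VIEW CONCURRENTLY user_order_totals;
```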
### DON'T ✗
- **Don't use `SELECT *`** - Fetch only needed columns
- **Don't over-index** - Each index slows down writes
- **Don't ignore EXPLAIN warnings** - They indicate problems
- **Don't use functions on indexed columns** - They prevent index usage
- **Don't fetch more data than needed** - Use LIMIT
- **Don't use OFFSET for deep pagination** - Use cursor-based pagination instead
- **Don't ignore database logs** - Monitor for errors
- **Don't optimize prematurely** - Profile first, then fix the bottlenecks
- **Don't forget about write performance** - Indexes slow down INSERTs
- **Don't skip testing** - Verify optimizations actually help
## Query Optimization Checklist
- [ ] Run EXPLAIN/EXPLAIN ANALYZE on the query
- [ ] Check that the query uses indexes (no Seq Scan on large tables)
- [ ] Verify indexes exist on:
  - [ ] Foreign key columns
  - [ ] WHERE clause columns
  - [ ] JOIN condition columns
  - [ ] ORDER BY columns
- [ ] SELECT only needed columns (avoid SELECT *)
- [ ] Use the appropriate JOIN type (INNER vs LEFT)
- [ ] Avoid N+1 queries (use JOINs or eager loading)
- [ ] Use pagination for large result sets
- [ ] Check for unused indexes (they slow down writes)
- [ ] Consider query caching for frequent queries
- [ ] Test with production-like data volumes
- [ ] Monitor query performance over time
## Notes
- Always measure before and after optimization
- Index creation can take time on large tables
- Too many indexes slow down INSERT/UPDATE/DELETE
- Keep database statistics up to date (ANALYZE)
- Consider read replicas for read-heavy workloads
- Use database-specific features when beneficial
- Document optimization decisions for team
- Regular performance audits prevent degradation