| name | sql-expert |
| description | Expert SQL query writing, optimization, and database schema design with support for PostgreSQL, MySQL, SQLite, and SQL Server. Use when working with databases for: (1) Writing complex SQL queries with joins, subqueries, and window functions, (2) Optimizing slow queries and analyzing execution plans, (3) Designing database schemas with proper normalization, (4) Creating indexes and improving query performance, (5) Writing migrations and handling schema changes, (6) Debugging SQL errors and query issues |
SQL Expert Skill
Expert guidance for writing, optimizing, and managing SQL databases across PostgreSQL, MySQL, SQLite, and SQL Server.
Core Capabilities
This skill enables you to:
- Write complex SQL queries with JOINs, subqueries, CTEs, and window functions
- Optimize slow queries using EXPLAIN plans and index recommendations
- Design database schemas with proper normalization (1NF, 2NF, 3NF, BCNF)
- Create effective indexes for query performance
- Write database migrations safely with rollback support
- Debug SQL errors and understand error messages
- Handle transactions with proper isolation levels
- Work with JSON/JSONB data types
- Generate sample data for testing
- Convert between database dialects (PostgreSQL ↔ MySQL ↔ SQLite)
Supported Database Systems
PostgreSQL
Best for: Complex queries, JSON data, advanced features, ACID compliance
pip install psycopg2-binary sqlalchemy
MySQL/MariaDB
Best for: Web applications, WordPress, high-read workloads
pip install mysql-connector-python sqlalchemy
SQLite
Best for: Local development, embedded databases, testing
pip install sqlite3 # Built into Python
SQL Server
Best for: Enterprise applications, Windows environments
pip install pyodbc sqlalchemy
Query Writing
Basic SELECT with JOINs
-- Simple SELECT with filtering
SELECT
column1,
column2,
column3
FROM
table_name
WHERE
condition = 'value'
AND another_condition > 100
ORDER BY
column1 DESC
LIMIT 10;
-- INNER JOIN
SELECT
users.name,
orders.order_date,
orders.total_amount
FROM
users
INNER JOIN
orders ON users.id = orders.user_id
WHERE
orders.status = 'completed';
-- LEFT JOIN (include all users, even without orders)
SELECT
users.name,
COUNT(orders.id) as order_count,
COALESCE(SUM(orders.total_amount), 0) as total_spent
FROM
users
LEFT JOIN
orders ON users.id = orders.user_id
GROUP BY
users.id, users.name;
Subqueries and CTEs
-- Subquery in WHERE clause
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
-- Common Table Expression (CTE)
WITH high_value_customers AS (
SELECT
user_id,
SUM(total_amount) as lifetime_value
FROM orders
GROUP BY user_id
HAVING SUM(total_amount) > 1000
)
SELECT
users.name,
users.email,
hvc.lifetime_value
FROM users
INNER JOIN high_value_customers hvc ON users.id = hvc.user_id;
Window Functions
-- Ranking within groups
SELECT
name,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as salary_rank
FROM
employees;
-- Running totals
SELECT
order_date,
total_amount,
SUM(total_amount) OVER (ORDER BY order_date) as running_total
FROM
orders;
-- Moving averages
SELECT
order_date,
total_amount,
AVG(total_amount) OVER (
ORDER BY order_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) as moving_avg_7days
FROM
daily_sales;
See examples/complex_queries.sql for more advanced query patterns.
Query Optimization
Using EXPLAIN
-- Analyze query performance
EXPLAIN ANALYZE
SELECT
users.name,
COUNT(orders.id) as order_count
FROM users
LEFT JOIN orders ON users.id = orders.user_id
GROUP BY users.id, users.name;
-- Look for:
-- - Seq Scan (bad) vs Index Scan (good)
-- - High cost numbers
-- - Large row counts being processed
Quick Optimization Tips
-- BAD: Function on indexed column
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
-- GOOD: Keep indexed column clean
SELECT * FROM users WHERE email = LOWER('user@example.com');
-- BAD: SELECT *
SELECT * FROM large_table WHERE id = 123;
-- GOOD: Select only needed columns
SELECT id, name, email FROM large_table WHERE id = 123;
For comprehensive optimization techniques, see references/query-optimization.md.
Schema Design
Normalization Principles
First Normal Form (1NF): Eliminate repeating groups, use atomic values
-- GOOD: Separate table for order items
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_name VARCHAR(100)
);
CREATE TABLE order_items (
order_item_id INT PRIMARY KEY,
order_id INT REFERENCES orders(order_id),
product_name VARCHAR(100)
);
Second Normal Form (2NF): All non-key attributes depend on entire primary key
-- GOOD: Separate product information
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100),
product_price DECIMAL(10, 2)
);
CREATE TABLE order_items (
order_id INT,
product_id INT,
quantity INT,
PRIMARY KEY (order_id, product_id),
FOREIGN KEY (product_id) REFERENCES products(product_id)
);
Third Normal Form (3NF): No transitive dependencies
Common Schema Patterns
One-to-Many:
CREATE TABLE authors (
author_id INT PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(100) UNIQUE
);
CREATE TABLE books (
book_id INT PRIMARY KEY,
title VARCHAR(200),
author_id INT NOT NULL,
published_date DATE,
FOREIGN KEY (author_id) REFERENCES authors(author_id)
);
Many-to-Many:
CREATE TABLE students (
student_id INT PRIMARY KEY,
name VARCHAR(100)
);
CREATE TABLE courses (
course_id INT PRIMARY KEY,
course_name VARCHAR(100)
);
-- Junction table
CREATE TABLE enrollments (
enrollment_id INT PRIMARY KEY,
student_id INT NOT NULL,
course_id INT NOT NULL,
enrollment_date DATE,
grade CHAR(2),
FOREIGN KEY (student_id) REFERENCES students(student_id),
FOREIGN KEY (course_id) REFERENCES courses(course_id),
UNIQUE (student_id, course_id)
);
See examples/schema_examples.sql for more schema patterns.
Indexes and Performance
Creating Indexes
-- Single column index
CREATE INDEX idx_users_email ON users(email);
-- Composite index (order matters!)
CREATE INDEX idx_orders_user_date ON orders(user_id, order_date);
-- Unique index
CREATE UNIQUE INDEX idx_users_username ON users(username);
-- Partial index (PostgreSQL)
CREATE INDEX idx_active_users ON users(email) WHERE status = 'active';
Index Guidelines
When to create indexes:
- ✅ Columns used in WHERE clauses
- ✅ Columns used in JOIN conditions
- ✅ Columns used in ORDER BY
- ✅ Foreign key columns
When NOT to create indexes:
- ❌ Small tables (< 1000 rows)
- ❌ Columns with low selectivity (boolean fields)
- ❌ Columns frequently updated
For detailed index strategies, see references/indexes-performance.md.
Migrations
Safe Migration Pattern
-- Step 1: Add column as nullable
ALTER TABLE users ADD COLUMN status VARCHAR(20);
-- Step 2: Populate existing rows
UPDATE users SET status = 'active' WHERE status IS NULL;
-- Step 3: Make it NOT NULL
ALTER TABLE users ALTER COLUMN status SET NOT NULL;
-- Step 4: Add default for new rows
ALTER TABLE users ALTER COLUMN status SET DEFAULT 'active';
-- Rollback plan
ALTER TABLE users DROP COLUMN status;
Zero-Downtime Migrations
-- GOOD: Add column as nullable first, then backfill
ALTER TABLE large_table ADD COLUMN new_column VARCHAR(100);
-- Backfill in batches
UPDATE large_table SET new_column = 'value' WHERE new_column IS NULL LIMIT 1000;
-- Repeat until complete
-- Then make it NOT NULL
ALTER TABLE large_table ALTER COLUMN new_column SET NOT NULL;
See examples/migrations.sql for more migration patterns.
Advanced Patterns
UPSERT (Insert or Update)
-- PostgreSQL
INSERT INTO users (user_id, name, email, updated_at)
VALUES (1, 'John Doe', 'john@example.com', NOW())
ON CONFLICT (user_id)
DO UPDATE SET
name = EXCLUDED.name,
email = EXCLUDED.email,
updated_at = NOW();
-- MySQL
INSERT INTO users (user_id, name, email, updated_at)
VALUES (1, 'John Doe', 'john@example.com', NOW())
ON DUPLICATE KEY UPDATE
name = VALUES(name),
email = VALUES(email),
updated_at = NOW();
Recursive CTEs
-- Hierarchical data traversal
WITH RECURSIVE employee_hierarchy AS (
-- Anchor: top-level employees
SELECT id, name, manager_id, 1 as level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive: employees reporting to previous level
SELECT e.id, e.name, e.manager_id, eh.level + 1
FROM employees e
INNER JOIN employee_hierarchy eh ON e.manager_id = eh.id
)
SELECT * FROM employee_hierarchy ORDER BY level, name;
For more advanced patterns including pivot tables, JSON operations, and bulk operations, see references/advanced-patterns.md.
Best Practices
Critical Guidelines
- Always use parameterized queries to prevent SQL injection
- Use transactions for related operations to ensure atomicity
- Add appropriate constraints (PRIMARY KEY, FOREIGN KEY, NOT NULL, CHECK)
- Include timestamps (created_at, updated_at) on tables
- Use meaningful names for tables and columns
- **Avoid SELECT *** - specify only needed columns
- Index foreign keys for join performance
- Use VARCHAR instead of CHAR for variable-length strings
- Handle NULL values properly with IS NULL / IS NOT NULL
- Use appropriate data types (DECIMAL for money, not FLOAT)
Example with multiple best practices:
CREATE TABLE orders (
order_id INT PRIMARY KEY,
user_id INT NOT NULL,
order_date DATE NOT NULL DEFAULT CURRENT_DATE,
total_amount DECIMAL(10, 2) CHECK (total_amount >= 0),
status VARCHAR(20) CHECK (status IN ('pending', 'completed', 'cancelled')),
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (user_id) REFERENCES users(user_id)
);
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_status ON orders(status);
For comprehensive best practices, see references/best-practices.md.
Common Pitfalls
Watch out for these frequent issues:
- N+1 Query Problem - Use JOINs instead of loops with queries
- Not using LIMIT for exploratory queries on large tables
- Implicit type conversions preventing index usage
- Using COUNT(*) when EXISTS is sufficient
- Not handling NULLs properly (NULL = NULL is always NULL, not TRUE)
- Using SELECT DISTINCT as a band-aid instead of fixing the query
- Forgetting transactions for related operations
- Using functions on indexed columns preventing index usage
Example - Avoiding N+1:
# BAD: N+1 queries
users = db.query("SELECT * FROM users")
for user in users:
orders = db.query("SELECT * FROM orders WHERE user_id = ?", user.id)
# GOOD: Single query with JOIN
result = db.query("""
SELECT users.*, orders.*
FROM users
LEFT JOIN orders ON users.id = orders.user_id
""")
For a complete list of pitfalls and solutions, see references/common-pitfalls.md.
Helper Scripts and Examples
Available Resources
Helper Scripts (scripts/):
sql_helper.py- Utility functions for query building, schema introspection, index analysis, and migration helpers
Examples (examples/):
complex_queries.sql- Advanced query patterns with CTEs, window functions, and subqueriesschema_examples.sql- Complete schema design examples for various use casesmigrations.sql- Safe migration patterns and zero-downtime techniques
References (references/):
query-optimization.md- Comprehensive query optimization techniques and EXPLAIN analysisindexes-performance.md- Detailed index strategies, maintenance, and monitoringadvanced-patterns.md- UPSERT, bulk operations, pivot tables, JSON operations, recursive queriesbest-practices.md- Complete SQL best practices guidecommon-pitfalls.md- Common mistakes and how to avoid them
Quick Start
- For basic queries, use the patterns shown above
- For optimization, start with EXPLAIN and check
references/query-optimization.md - For schema design, review normalization patterns and see
examples/schema_examples.sql - For complex scenarios, check
references/advanced-patterns.md - For utilities, use
scripts/sql_helper.py
Workflow
When working with SQL databases:
- Understand requirements - What data needs to be queried or stored?
- Design schema - Apply normalization, choose appropriate data types
- Create indexes - Index foreign keys and frequently queried columns
- Write queries - Start simple, add complexity as needed
- Optimize - Use EXPLAIN to identify bottlenecks
- Test - Verify with sample data and edge cases
- Document - Add comments for complex queries
For migrations:
- Plan changes - Identify affected tables and dependencies
- Write migration - Create both up and down migrations
- Test on copy - Test on development database first
- Backup - Always backup before running migrations
- Execute - Run migrations during low-traffic periods
- Verify - Check data integrity after migration