| name | lang-sql-dev |
| description | Foundational SQL patterns for query writing, schema design, and dialect differences. Use when writing SQL queries, designing database schemas, understanding SQL syntax across PostgreSQL/MySQL/SQLite, or preparing SQL for conversion to other query languages. This is a meta-skill for SQL derivatives. |
SQL Fundamentals
Foundational SQL patterns covering query writing, schema design, and dialect differences across PostgreSQL, MySQL, and SQLite. This skill serves as a base for specialized SQL skills and SQL-to-X conversions.
Overview
┌─────────────────────────────────────────────────────────────────┐
│ SQL Skill Hierarchy │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ │
│ │ lang-sql │ ◄── You are here │
│ │ (foundation) │ │
│ └──────┬───────┘ │
│ │ │
│ ┌───────────────────┼───────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ sql-to- │ │ sql- │ │ data- │ │
│ │ polars │ │optimization │ │ postgres │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
This skill covers:
- Query writing (SELECT, JOIN, CTEs, window functions)
- Schema design (tables, constraints, normalization)
- Dialect differences (PostgreSQL, MySQL, SQLite)
- Performance basics (EXPLAIN, indexing fundamentals)
- SQL syntax patterns useful for conversion
This skill does NOT cover:
- Deep optimization strategies - see
sql-optimization-patterns - ORM usage (SQLAlchemy, Prisma, etc.)
- Database administration (backups, replication, users)
- Platform-specific features (stored procedures, triggers)
Quick Reference
| Task | PostgreSQL | MySQL | SQLite |
|---|---|---|---|
| Show tables | \dt |
SHOW TABLES |
.tables |
| Describe table | \d table |
DESCRIBE table |
.schema table |
| Current database | SELECT current_database() |
SELECT DATABASE() |
N/A (file-based) |
| List indexes | \di |
SHOW INDEX FROM table |
.indexes table |
| Explain query | EXPLAIN ANALYZE |
EXPLAIN |
EXPLAIN QUERY PLAN |
Query Patterns
SELECT Fundamentals
-- Basic SELECT with filtering
SELECT column1, column2
FROM table_name
WHERE condition
ORDER BY column1 DESC
LIMIT 10;
-- Column aliases
SELECT
first_name AS "First Name",
last_name AS "Last Name",
salary * 12 AS annual_salary
FROM employees;
JOIN Types
-- INNER JOIN (matching rows only)
SELECT o.id, c.name
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id;
-- LEFT JOIN (all from left, matching from right)
SELECT c.name, COUNT(o.id) as order_count
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.name;
-- Multiple JOINs
SELECT
o.id,
c.name as customer,
p.name as product
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id;
Common Table Expressions (CTEs)
-- Basic CTE
WITH active_users AS (
SELECT id, name, email
FROM users
WHERE status = 'active'
)
SELECT * FROM active_users WHERE email LIKE '%@company.com';
-- Multiple CTEs
WITH
recent_orders AS (
SELECT * FROM orders WHERE created_at > NOW() - INTERVAL '30 days'
),
order_totals AS (
SELECT customer_id, SUM(amount) as total
FROM recent_orders
GROUP BY customer_id
)
SELECT c.name, ot.total
FROM order_totals ot
JOIN customers c ON ot.customer_id = c.id;
-- Recursive CTE (hierarchical data)
WITH RECURSIVE org_chart AS (
-- Base case: top-level managers
SELECT id, name, manager_id, 1 as level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive case: employees under managers
SELECT e.id, e.name, e.manager_id, oc.level + 1
FROM employees e
JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT * FROM org_chart ORDER BY level, name;
Window Functions
-- ROW_NUMBER: Sequential numbering within partition
SELECT
department,
name,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as rank
FROM employees;
-- Running totals
SELECT
date,
amount,
SUM(amount) OVER (ORDER BY date) as running_total
FROM transactions;
-- Compare to previous row
SELECT
date,
revenue,
LAG(revenue) OVER (ORDER BY date) as prev_revenue,
revenue - LAG(revenue) OVER (ORDER BY date) as change
FROM daily_sales;
-- Percentile ranking
SELECT
name,
score,
PERCENT_RANK() OVER (ORDER BY score) as percentile
FROM test_results;
Subqueries
-- Scalar subquery (returns single value)
SELECT name, salary,
(SELECT AVG(salary) FROM employees) as avg_salary
FROM employees;
-- IN subquery
SELECT * FROM products
WHERE category_id IN (
SELECT id FROM categories WHERE name LIKE '%Electronics%'
);
-- EXISTS subquery (often more efficient)
SELECT * FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.customer_id = c.id
AND o.created_at > NOW() - INTERVAL '30 days'
);
-- Correlated subquery (references outer query)
SELECT e.name, e.salary
FROM employees e
WHERE e.salary > (
SELECT AVG(salary)
FROM employees
WHERE department_id = e.department_id
);
Aggregation Patterns
GROUP BY
-- Basic aggregation
SELECT
department,
COUNT(*) as employee_count,
AVG(salary) as avg_salary,
MIN(salary) as min_salary,
MAX(salary) as max_salary
FROM employees
GROUP BY department;
-- HAVING (filter groups)
SELECT department, COUNT(*) as count
FROM employees
GROUP BY department
HAVING COUNT(*) > 5;
-- Multiple grouping columns
SELECT
department,
job_title,
COUNT(*) as count
FROM employees
GROUP BY department, job_title
ORDER BY department, count DESC;
CASE Expressions
-- Conditional aggregation
SELECT
COUNT(CASE WHEN status = 'active' THEN 1 END) as active_count,
COUNT(CASE WHEN status = 'inactive' THEN 1 END) as inactive_count,
COUNT(*) as total
FROM users;
-- Bucketing data
SELECT
CASE
WHEN age < 18 THEN 'minor'
WHEN age < 65 THEN 'adult'
ELSE 'senior'
END as age_group,
COUNT(*) as count
FROM users
GROUP BY 1; -- Group by first select column
Schema Design
Table Creation
-- Basic table with constraints
CREATE TABLE users (
id SERIAL PRIMARY KEY, -- PostgreSQL
-- id INT AUTO_INCREMENT PRIMARY KEY, -- MySQL
-- id INTEGER PRIMARY KEY AUTOINCREMENT, -- SQLite
email VARCHAR(255) NOT NULL UNIQUE,
name VARCHAR(100) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Table with foreign key
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
status VARCHAR(20) DEFAULT 'pending',
total DECIMAL(10, 2) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Many-to-many junction table
CREATE TABLE user_roles (
user_id INT REFERENCES users(id) ON DELETE CASCADE,
role_id INT REFERENCES roles(id) ON DELETE CASCADE,
granted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (user_id, role_id)
);
Normalization Quick Guide
| Form | Rule | Example |
|---|---|---|
| 1NF | No repeating groups | Split phone1, phone2 into separate rows |
| 2NF | No partial dependencies | Move dept_name to departments table if key is (emp_id, dept_id) |
| 3NF | No transitive dependencies | Move city, state to addresses if they depend on zip_code |
Common Constraints
-- CHECK constraint
ALTER TABLE products
ADD CONSTRAINT positive_price CHECK (price > 0);
-- UNIQUE constraint on multiple columns
ALTER TABLE subscriptions
ADD CONSTRAINT unique_user_plan UNIQUE (user_id, plan_id);
-- NOT NULL with default
ALTER TABLE users
ALTER COLUMN status SET DEFAULT 'active',
ALTER COLUMN status SET NOT NULL;
Index Basics
When to Index
| Index | Use Case |
|---|---|
| Primary key | Automatic, unique identifier |
| Foreign key | Speed up JOINs |
| Frequently filtered columns | WHERE clauses |
| Frequently sorted columns | ORDER BY clauses |
| Composite | Multi-column WHERE/ORDER |
Index Creation
-- Single column index
CREATE INDEX idx_users_email ON users(email);
-- Composite index (order matters!)
-- Good for: WHERE status = 'active' AND created_at > '2024-01-01'
CREATE INDEX idx_orders_status_created ON orders(status, created_at);
-- Partial index (PostgreSQL)
CREATE INDEX idx_active_users ON users(email) WHERE status = 'active';
-- Expression index
CREATE INDEX idx_users_lower_email ON users(LOWER(email));
Reading EXPLAIN
-- PostgreSQL: Full analysis with timing
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com';
-- Key things to look for:
-- ✓ Index Scan / Index Only Scan - good
-- ✗ Seq Scan on large tables - investigate
-- ✗ High "actual rows" vs "estimated rows" - stale statistics
Dialect Differences
String Concatenation
-- PostgreSQL
SELECT first_name || ' ' || last_name AS full_name FROM users;
-- MySQL
SELECT CONCAT(first_name, ' ', last_name) AS full_name FROM users;
-- SQLite (both work)
SELECT first_name || ' ' || last_name AS full_name FROM users;
Date/Time Operations
-- Current timestamp
-- PostgreSQL: NOW(), CURRENT_TIMESTAMP
-- MySQL: NOW(), CURRENT_TIMESTAMP
-- SQLite: datetime('now')
-- Date arithmetic
-- PostgreSQL
SELECT created_at + INTERVAL '7 days' FROM orders;
-- MySQL
SELECT DATE_ADD(created_at, INTERVAL 7 DAY) FROM orders;
-- SQLite
SELECT datetime(created_at, '+7 days') FROM orders;
-- Extract parts
-- PostgreSQL
SELECT EXTRACT(YEAR FROM created_at) FROM orders;
-- MySQL
SELECT YEAR(created_at) FROM orders;
-- SQLite
SELECT strftime('%Y', created_at) FROM orders;
UPSERT (Insert or Update)
-- PostgreSQL
INSERT INTO users (email, name)
VALUES ('test@example.com', 'Test User')
ON CONFLICT (email) DO UPDATE SET name = EXCLUDED.name;
-- MySQL
INSERT INTO users (email, name)
VALUES ('test@example.com', 'Test User')
ON DUPLICATE KEY UPDATE name = VALUES(name);
-- SQLite (3.24+)
INSERT INTO users (email, name)
VALUES ('test@example.com', 'Test User')
ON CONFLICT (email) DO UPDATE SET name = excluded.name;
Pagination
-- Standard (PostgreSQL, MySQL, SQLite)
SELECT * FROM products ORDER BY id LIMIT 10 OFFSET 20;
-- Keyset pagination (more efficient for large offsets)
SELECT * FROM products
WHERE id > 1000 -- Last seen ID
ORDER BY id
LIMIT 10;
Boolean Handling
-- PostgreSQL: Native BOOLEAN
SELECT * FROM users WHERE is_active = true;
-- MySQL: TINYINT(1) or BOOLEAN (alias)
SELECT * FROM users WHERE is_active = 1;
-- SQLite: INTEGER (0/1)
SELECT * FROM users WHERE is_active = 1;
SQL for Conversion
When converting SQL to DataFrame operations (Pandas, Polars), map these patterns:
| SQL | DataFrame Equivalent |
|---|---|
SELECT col1, col2 |
.select(["col1", "col2"]) |
WHERE condition |
.filter(condition) |
ORDER BY col DESC |
.sort("col", descending=True) |
LIMIT n |
.head(n) or .limit(n) |
GROUP BY |
.group_by() |
JOIN |
.join() |
DISTINCT |
.unique() or .distinct() |
See derivative skills for specific conversion patterns:
sql-to-polars- SQL to Polars DataFramesql-to-pandas- SQL to Pandas DataFrame
Anti-Patterns to Avoid
1. SELECT *
-- Bad: Fetches unnecessary data
SELECT * FROM users;
-- Good: Only fetch needed columns
SELECT id, name, email FROM users;
2. N+1 Queries
-- Bad: Query per user (in application loop)
SELECT * FROM orders WHERE user_id = 1;
SELECT * FROM orders WHERE user_id = 2;
-- ... repeated N times
-- Good: Single query with JOIN or IN
SELECT * FROM orders WHERE user_id IN (1, 2, 3, ...);
3. Functions on Indexed Columns
-- Bad: Prevents index usage
SELECT * FROM users WHERE LOWER(email) = 'test@example.com';
-- Good: Store normalized, or use expression index
SELECT * FROM users WHERE email = 'test@example.com';
4. Implicit Type Conversion
-- Bad: String compared to integer
SELECT * FROM users WHERE id = '123';
-- Good: Matching types
SELECT * FROM users WHERE id = 123;
5. Missing WHERE on UPDATE/DELETE
-- DANGEROUS: Affects all rows!
UPDATE users SET status = 'inactive';
-- Safe: Always include WHERE
UPDATE users SET status = 'inactive' WHERE last_login < '2024-01-01';
Troubleshooting
Query Returns No Results
- Check WHERE conditions: Test each condition separately
- Verify JOIN keys: Ensure matching data types and values
- NULL handling: Use
IS NULLnot= NULL - Case sensitivity: Check collation settings
Query Too Slow
- Run EXPLAIN: Look for Seq Scans on large tables
- Check indexes: Are filtered/joined columns indexed?
- Reduce data early: Filter before JOINing
- **Avoid SELECT ***: Fetch only needed columns
Unexpected Duplicates
- Missing DISTINCT: Add if needed
- Many-to-many JOINs: Each match creates a row
- GROUP BY missing columns: All non-aggregated columns must be grouped
NULL Surprises
-- NULL comparisons always return NULL (unknown)
SELECT * FROM users WHERE department = NULL; -- Returns nothing!
SELECT * FROM users WHERE department IS NULL; -- Correct
-- NULL in aggregations
SELECT AVG(salary) FROM employees; -- NULLs ignored
SELECT COUNT(*) vs COUNT(column); -- COUNT(*) includes NULL rows
References
- PostgreSQL Documentation
- MySQL Reference Manual
- SQLite Documentation
sql-optimization-patterns- Deep performance optimizationsql-expert- Advanced query patterns