Claude Code Plugins

Community-maintained marketplace


This skill should be used when working with SQL databases, "SELECT", "INSERT", "UPDATE", "DELETE", "CREATE TABLE", "JOIN", "INDEX", "EXPLAIN", transactions, or database migrations. Provides comprehensive SQL patterns across PostgreSQL, MySQL, and SQLite.

Install Skill

1. Download skill

2. Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3. Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please review the skill's instructions before using it.

SKILL.md

name: SQL Ecosystem
description: This skill should be used when working with SQL databases, "SELECT", "INSERT", "UPDATE", "DELETE", "CREATE TABLE", "JOIN", "INDEX", "EXPLAIN", transactions, or database migrations. Provides comprehensive SQL patterns across PostgreSQL, MySQL, and SQLite.

Provide comprehensive patterns for SQL database operations, schema design, query optimization, transaction management, and migrations, following the ANSI SQL standard with database-specific notes.

Core rules:
- Use parameterized queries for ALL user input - NEVER use string concatenation
- Create indexes on foreign key columns
- Use explicit transaction boundaries for multi-statement operations
- Escape wildcards in LIKE patterns when using user input
- Analyze query plans with EXPLAIN before optimizing
- Use appropriate isolation levels for transaction requirements
- Implement soft deletes for audit trails
- Name constraints explicitly for easier migration management

ANSI SQL standard data types supported across major databases:

-- Numeric types
INTEGER, SMALLINT, BIGINT
DECIMAL(precision, scale), NUMERIC(precision, scale)
REAL, DOUBLE PRECISION

-- String types CHAR(n), VARCHAR(n), TEXT

-- Date/Time types DATE, TIME, TIMESTAMP, INTERVAL

-- Boolean BOOLEAN

Useful types specific to each database:

-- PostgreSQL specific
UUID, JSONB, ARRAY, INET, CIDR, MACADDR
SERIAL, BIGSERIAL (auto-increment)
TSTZRANGE, DATERANGE (range types)

-- MySQL specific
TINYINT, MEDIUMINT
ENUM('value1', 'value2'), SET('a', 'b', 'c')
JSON (stored as text internally)

-- SQLite specific
-- Uses type affinity: TEXT, INTEGER, REAL, BLOB, NULL
-- Any type name accepted but mapped to affinity

Prefer ANSI types for portability; use DB-specific types when features are needed.

Table creation with constraints CREATE TABLE users ( id SERIAL PRIMARY KEY, email VARCHAR(255) NOT NULL UNIQUE, name VARCHAR(100) NOT NULL, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
CONSTRAINT email_format CHECK (email ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$')

);

CREATE TABLE orders ( id SERIAL PRIMARY KEY, user_id INTEGER NOT NULL, total DECIMAL(10, 2) NOT NULL, status VARCHAR(20) DEFAULT 'pending',

CONSTRAINT fk_user FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE

);

Safe table modification patterns -- Add column (safe) ALTER TABLE users ADD COLUMN phone VARCHAR(20);

-- Add column with default (PostgreSQL 11+ is instant) ALTER TABLE users ADD COLUMN active BOOLEAN DEFAULT true;

-- Rename column ALTER TABLE users RENAME COLUMN name TO full_name;

-- Add constraint ALTER TABLE users ADD CONSTRAINT unique_phone UNIQUE (phone);

-- Drop constraint ALTER TABLE users DROP CONSTRAINT unique_phone;

Index creation patterns -- B-tree index (default, most common) CREATE INDEX idx_users_email ON users(email);

-- Unique index CREATE UNIQUE INDEX idx_users_email_unique ON users(email);

-- Composite index (order matters for query optimization) CREATE INDEX idx_orders_user_status ON orders(user_id, status);

-- Partial index (PostgreSQL) CREATE INDEX idx_active_users ON users(email) WHERE active = true;

-- Expression index (PostgreSQL)
CREATE INDEX idx_users_lower_email ON users(LOWER(email));

Choose the index type by query pattern: B-tree (default) for equality and range lookups; GIN with tsvector for full-text search (PostgreSQL); GIN for JSONB and array containment (PostgreSQL); GiST for geometric and nearest-neighbor queries (PostgreSQL).

Query patterns for data retrieval -- Basic select with filtering SELECT id, email, name FROM users WHERE active = true ORDER BY created_at DESC LIMIT 10 OFFSET 0;

-- Aggregate with grouping
SELECT status, COUNT(*) as count, SUM(total) as revenue FROM orders WHERE created_at >= '2024-01-01' GROUP BY status HAVING COUNT(*) > 10 ORDER BY revenue DESC;

Data insertion patterns -- Single insert INSERT INTO users (email, name) VALUES ('user@example.com', 'John Doe');

-- Multi-row insert INSERT INTO users (email, name) VALUES ('user1@example.com', 'User One'), ('user2@example.com', 'User Two'), ('user3@example.com', 'User Three');

-- Insert with returning (PostgreSQL) INSERT INTO users (email, name) VALUES ('new@example.com', 'New User') RETURNING id, created_at;

-- Upsert (PostgreSQL) INSERT INTO users (email, name) VALUES ('user@example.com', 'Updated Name') ON CONFLICT (email) DO UPDATE SET name = EXCLUDED.name;

-- Upsert (MySQL) INSERT INTO users (email, name) VALUES ('user@example.com', 'Updated Name') ON DUPLICATE KEY UPDATE name = VALUES(name);

Data modification patterns -- Basic update UPDATE users SET name = 'New Name' WHERE id = 1;

-- Update with subquery UPDATE orders SET status = 'cancelled' WHERE user_id IN (SELECT id FROM users WHERE active = false);

-- Update with join (PostgreSQL) UPDATE orders o SET status = 'vip' FROM users u WHERE o.user_id = u.id AND u.vip = true;

-- Update with returning (PostgreSQL) UPDATE users SET active = false WHERE id = 1 RETURNING *;

Data removal patterns -- Basic delete DELETE FROM users WHERE id = 1;

-- Delete with subquery DELETE FROM orders WHERE user_id IN (SELECT id FROM users WHERE active = false);

-- Truncate (faster for all rows, resets sequences) TRUNCATE TABLE logs; TRUNCATE TABLE logs RESTART IDENTITY; -- PostgreSQL

-- Soft delete pattern (prefer this) UPDATE users SET deleted_at = NOW() WHERE id = 1; Prefer soft deletes for audit trails; use hard deletes only for GDPR/compliance

Safe query construction preventing SQL injection - ALWAYS use for user input -- PostgreSQL with psycopg2/psycopg3 (Python) cursor.execute( "SELECT * FROM users WHERE email = %s AND status = %s", (user_email, status) )

-- PostgreSQL with pg (Node.js) client.query( 'SELECT * FROM users WHERE email = $1 AND status = $2', [userEmail, status] )

-- MySQL with mysql-connector (Python) cursor.execute( "SELECT * FROM users WHERE email = %s AND status = %s", (user_email, status) )

-- MySQL with mysql2 (Node.js) connection.execute( 'SELECT * FROM users WHERE email = ? AND status = ?', [userEmail, status] )

-- SQLite with sqlite3 (Python) cursor.execute( "SELECT * FROM users WHERE email = ? AND status = ?", (user_email, status) )

-- Go with database/sql
db.Query( "SELECT * FROM users WHERE email = $1 AND status = $2", userEmail, status )

NEVER use string concatenation or template literals with user input - this enables SQL injection attacks:
-- DANGEROUS: SQL injection vulnerability
query = "SELECT * FROM users WHERE email = '" + user_input + "'"
query = f"SELECT * FROM users WHERE email = '{user_input}'"

-- If user_input = "'; DROP TABLE users; --" -- Executes: SELECT * FROM users WHERE email = ''; DROP TABLE users; --'

Prevent pattern injection in LIKE queries with user input -- VULNERABLE: User can inject wildcards -- If user_input = "%", this returns ALL records SELECT * FROM products WHERE name LIKE '%' || user_input || '%';

-- SAFE: Escape wildcards before using in LIKE
-- Python: escaped_input = user_input.replace('\\', '\\\\').replace('%', '\\%').replace('_', '\\_')
-- Then use a parameterized query:
cursor.execute( "SELECT * FROM products WHERE name LIKE %s", ('%' + escaped_input + '%',) )

-- PostgreSQL: declare the escape character explicitly
SELECT * FROM products WHERE name LIKE '%' || $1 || '%' ESCAPE '\';

-- Alternative: Use position() or strpos() for exact matching SELECT * FROM products WHERE position($1 in name) > 0; Wildcards % and _ in user input can bypass intended restrictions
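A minimal Python sketch of the escape-then-parameterize approach, assuming psycopg2 and an existing connection; the escape_like helper and the products table are illustrative.

```python
def escape_like(value: str) -> str:
    # Escape the escape character first, then the LIKE wildcards.
    return value.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")

def search_products(conn, user_input: str):
    # Bind the escaped pattern as a parameter; declare the escape character in SQL.
    pattern = f"%{escape_like(user_input)}%"
    with conn.cursor() as cur:
        cur.execute(
            "SELECT id, name FROM products WHERE name LIKE %s ESCAPE '\\'",
            (pattern,),
        )
        return cur.fetchall()
```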

Safe dynamic SQL construction with whitelisting for identifiers -- DANGEROUS: Identifier injection query = f"SELECT {column_name} FROM {table_name}"

-- SAFE: Whitelist allowed values (Python example) ALLOWED_COLUMNS = {'id', 'name', 'email', 'created_at'} ALLOWED_TABLES = {'users', 'products', 'orders'}

if column_name not in ALLOWED_COLUMNS: raise ValueError(f"Invalid column: {column_name}") if table_name not in ALLOWED_TABLES: raise ValueError(f"Invalid table: {table_name}")

-- PostgreSQL: quote identifiers when building dynamic SQL, e.g. in PL/pgSQL:
-- EXECUTE format('SELECT %I FROM %I', column_name, table_name);
-- (format() with %I applies quote_ident() to each identifier)

-- Use identifier quoting as additional protection (after whitelisting)
query = f'SELECT "{column_name}" FROM "{table_name}"'

Never use user input directly for table/column names; always validate against a whitelist.
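A sketch combining whitelisting with identifier quoting, assuming psycopg2 (its sql module composes identifiers safely); the fetch_column helper is illustrative.

```python
from psycopg2 import sql

ALLOWED_COLUMNS = {"id", "name", "email", "created_at"}
ALLOWED_TABLES = {"users", "products", "orders"}

def fetch_column(conn, column_name: str, table_name: str):
    # Reject anything outside the whitelist before building SQL.
    if column_name not in ALLOWED_COLUMNS:
        raise ValueError(f"Invalid column: {column_name}")
    if table_name not in ALLOWED_TABLES:
        raise ValueError(f"Invalid table: {table_name}")
    # Quote the identifiers as a second layer of protection.
    query = sql.SQL("SELECT {col} FROM {tbl}").format(
        col=sql.Identifier(column_name),
        tbl=sql.Identifier(table_name),
    )
    with conn.cursor() as cur:
        cur.execute(query)
        return cur.fetchall()
```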

Database constraint patterns for data integrity -- Primary Key PRIMARY KEY (id) PRIMARY KEY (user_id, product_id) -- composite

-- Foreign Key FOREIGN KEY (user_id) REFERENCES users(id) FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE SET NULL

-- Unique UNIQUE (email) UNIQUE (user_id, product_id) -- composite unique

-- Check CHECK (price > 0) CHECK (status IN ('pending', 'active', 'completed'))

-- Not Null NOT NULL

-- Default DEFAULT CURRENT_TIMESTAMP DEFAULT 'pending'

Return only matching rows from both tables:
SELECT u.name, o.total FROM users u INNER JOIN orders o ON u.id = o.user_id WHERE o.status = 'completed';
Use when you need data from both tables and only care about matches.

Return all rows from the left table, matching rows from the right:
SELECT u.name, COUNT(o.id) as order_count FROM users u LEFT JOIN orders o ON u.id = o.user_id GROUP BY u.id, u.name;
Use when you need all rows from the primary table even without matches.

Return all rows from the right table, matching rows from the left:
SELECT u.name, o.total FROM users u RIGHT JOIN orders o ON u.id = o.user_id;
Often rewritten as a LEFT JOIN by swapping table order for clarity.

Return all rows from both tables:
SELECT u.name, o.total FROM users u FULL OUTER JOIN orders o ON u.id = o.user_id;
Not supported in MySQL; use a UNION of LEFT and RIGHT JOINs.

Cartesian product of two tables:
SELECT u.name, p.name as product FROM users u CROSS JOIN products p;
Produces M*N rows; use carefully with large tables.

Join a table with itself:
-- Find employees and their managers
SELECT e.name as employee, m.name as manager FROM employees e LEFT JOIN employees m ON e.manager_id = m.id;

Subquery returning a single value:
SELECT name, (SELECT AVG(total) FROM orders) as avg_order_total FROM users;

Filter using subquery results:
SELECT * FROM users WHERE id IN (SELECT user_id FROM orders WHERE total > 1000);

Check for existence of related records:
-- More efficient than IN for large datasets
SELECT * FROM users u WHERE EXISTS ( SELECT 1 FROM orders o WHERE o.user_id = u.id AND o.total > 1000 );
EXISTS stops at the first match; more efficient than IN for existence checks.

Subquery referencing the outer query:
SELECT u.name, (SELECT MAX(o.total) FROM orders o WHERE o.user_id = u.id) as max_order FROM users u;
Executes once per outer row; consider a JOIN for performance.

Subquery in the FROM clause:
SELECT user_stats.name, user_stats.total_spent FROM ( SELECT u.name, SUM(o.total) as total_spent FROM users u JOIN orders o ON u.id = o.user_id GROUP BY u.id, u.name ) AS user_stats WHERE user_stats.total_spent > 10000;

Common Table Expression for readable queries:
WITH active_users AS ( SELECT id, name, email FROM users WHERE active = true ) SELECT au.name, COUNT(o.id) as order_count FROM active_users au LEFT JOIN orders o ON au.id = o.user_id GROUP BY au.id, au.name;
CTEs improve readability; some databases materialize them (performance consideration).

Chain multiple CTEs:
WITH active_users AS ( SELECT id, name FROM users WHERE active = true ), user_orders AS ( SELECT user_id, SUM(total) as total_spent FROM orders GROUP BY user_id ) SELECT au.name, COALESCE(uo.total_spent, 0) as total_spent FROM active_users au LEFT JOIN user_orders uo ON au.id = uo.user_id ORDER BY total_spent DESC;

Recursive query for hierarchical data:
-- Traverse org hierarchy
WITH RECURSIVE org_tree AS (
    -- Base case: top-level managers
    SELECT id, name, manager_id, 1 as level FROM employees WHERE manager_id IS NULL
UNION ALL

-- Recursive case: subordinates
SELECT e.id, e.name, e.manager_id, ot.level + 1
FROM employees e
INNER JOIN org_tree ot ON e.manager_id = ot.id

) SELECT * FROM org_tree ORDER BY level, name;
Use cases: tree structures, bills of materials, path finding.

Assign unique sequential numbers:
SELECT name, total, ROW_NUMBER() OVER (ORDER BY total DESC) as rank FROM orders;

-- Partition by user
SELECT user_id, total, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) as order_num FROM orders;
Use cases: pagination, deduplication, ranking.

Ranking with tie handling:
SELECT name, score,
    RANK() OVER (ORDER BY score DESC) as rank,        -- gaps after ties
    DENSE_RANK() OVER (ORDER BY score DESC) as dense  -- no gaps
FROM players;
-- Score 100: RANK=1, DENSE_RANK=1
-- Score 100: RANK=1, DENSE_RANK=1
-- Score 90:  RANK=3, DENSE_RANK=2

Access adjacent rows:
SELECT date, revenue,
    LAG(revenue, 1) OVER (ORDER BY date) as prev_day_revenue,
    LEAD(revenue, 1) OVER (ORDER BY date) as next_day_revenue,
    revenue - LAG(revenue, 1) OVER (ORDER BY date) as daily_change
FROM daily_sales;
Use cases: time series analysis, trend detection.

Cumulative calculations:
SELECT date, revenue,
    SUM(revenue) OVER (ORDER BY date) as cumulative_revenue,
    AVG(revenue) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as moving_avg_7d
FROM daily_sales;

Get first/last values in a window:
SELECT user_id, order_date, total,
    FIRST_VALUE(total) OVER (PARTITION BY user_id ORDER BY order_date) as first_order,
    LAST_VALUE(total) OVER ( PARTITION BY user_id ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ) as last_order
FROM orders;
LAST_VALUE requires an explicit frame; the default frame ends at the current row.

Divide rows into buckets:
SELECT name, score, NTILE(4) OVER (ORDER BY score DESC) as quartile FROM students;
Use cases: percentile analysis, distribution grouping.
Aggregate data by groups:
SELECT status, COUNT(*) as count, SUM(total) as sum, AVG(total) as avg, MIN(total) as min, MAX(total) as max FROM orders GROUP BY status;

Filter aggregated results:
SELECT user_id, COUNT(*) as order_count FROM orders GROUP BY user_id HAVING COUNT(*) >= 5;
HAVING filters after aggregation; WHERE filters before.

Multiple grouping levels in a single query (PostgreSQL, MySQL 8+):
SELECT COALESCE(category, 'All Categories') as category, COALESCE(region, 'All Regions') as region, SUM(sales) as total_sales
FROM sales_data
GROUP BY GROUPING SETS ( (category, region), (category), (region), () );

Hierarchical aggregation:
SELECT year, quarter, SUM(revenue) as revenue FROM sales GROUP BY ROLLUP (year, quarter);
-- Produces: (year, quarter), (year), ()
First Normal Form: Atomic values, no repeating groups -- Violation: comma-separated values CREATE TABLE bad_orders ( id INTEGER, products TEXT -- 'apple,banana,orange' );

-- 1NF compliant: separate rows CREATE TABLE order_items ( order_id INTEGER, product_id INTEGER, PRIMARY KEY (order_id, product_id) );

Second Normal Form: No partial dependencies on composite key -- Violation: product_name depends only on product_id CREATE TABLE bad_order_items ( order_id INTEGER, product_id INTEGER, product_name TEXT, -- partial dependency quantity INTEGER, PRIMARY KEY (order_id, product_id) );

-- 2NF compliant: separate product table CREATE TABLE products ( id INTEGER PRIMARY KEY, name TEXT ); CREATE TABLE order_items ( order_id INTEGER, product_id INTEGER REFERENCES products(id), quantity INTEGER, PRIMARY KEY (order_id, product_id) );

Third Normal Form: No transitive dependencies -- Violation: city depends on zip_code, not directly on user CREATE TABLE bad_users ( id INTEGER PRIMARY KEY, name TEXT, zip_code TEXT, city TEXT -- transitive: user -> zip_code -> city );

-- 3NF compliant: separate locations CREATE TABLE locations ( zip_code TEXT PRIMARY KEY, city TEXT ); CREATE TABLE users ( id INTEGER PRIMARY KEY, name TEXT, zip_code TEXT REFERENCES locations(zip_code) );

Choosing how far to normalize: normalize to 3NF as a baseline; consider denormalizing hot read paths; normalize fully when data integrity is the priority; consider star-schema denormalization for analytics workloads.

Use auto-generated IDs as primary keys:
CREATE TABLE users (
    id SERIAL PRIMARY KEY, -- PostgreSQL
    -- id INT AUTO_INCREMENT PRIMARY KEY, -- MySQL
    email VARCHAR(255) UNIQUE NOT NULL
);
Prefer surrogate keys for stability; natural keys can change.

Mark records as deleted instead of removing them:
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    deleted_at TIMESTAMP NULL
);

-- Enforce uniqueness only among active rows with a partial unique index (PostgreSQL);
-- a UNIQUE table constraint cannot take a WHERE clause
CREATE UNIQUE INDEX unique_active_email ON users(email) WHERE deleted_at IS NULL;

-- Query active records SELECT * FROM users WHERE deleted_at IS NULL; Audit trails, data recovery, compliance

Track record creation and modification CREATE TABLE orders ( id SERIAL PRIMARY KEY, -- business columns... created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, created_by INTEGER REFERENCES users(id), updated_by INTEGER REFERENCES users(id) );

-- Auto-update trigger (PostgreSQL) CREATE OR REPLACE FUNCTION update_updated_at() RETURNS TRIGGER AS $$ BEGIN NEW.updated_at = CURRENT_TIMESTAMP; RETURN NEW; END;

$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_updated_at BEFORE UPDATE ON orders FOR EACH ROW EXECUTE FUNCTION update_updated_at();

Single table references multiple entity types:
-- Comments can belong to posts or videos
CREATE TABLE comments (
    id SERIAL PRIMARY KEY,
    content TEXT,
    commentable_type VARCHAR(50) NOT NULL, -- 'post' or 'video'
    commentable_id INTEGER NOT NULL
);

CREATE INDEX idx_commentable ON comments(commentable_type, commentable_id);

Cannot enforce an FK constraint on polymorphic references; validate at the application level.

Reference table for enumerated values CREATE TABLE order_statuses ( id SERIAL PRIMARY KEY, name VARCHAR(50) UNIQUE NOT NULL, description TEXT );

INSERT INTO order_statuses (name) VALUES ('pending'), ('processing'), ('shipped'), ('delivered'), ('cancelled');

CREATE TABLE orders ( id SERIAL PRIMARY KEY, status_id INTEGER REFERENCES order_statuses(id) ); Prefer over ENUM for flexibility; easier to add/modify values

Many-to-many relationship CREATE TABLE users (id SERIAL PRIMARY KEY, name TEXT); CREATE TABLE roles (id SERIAL PRIMARY KEY, name TEXT);

CREATE TABLE user_roles ( user_id INTEGER REFERENCES users(id) ON DELETE CASCADE, role_id INTEGER REFERENCES roles(id) ON DELETE CASCADE, granted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (user_id, role_id) );

Understand query execution plans -- PostgreSQL EXPLAIN SELECT * FROM users WHERE email = 'test@example.com'; EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com'; EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) SELECT ...;

-- MySQL EXPLAIN SELECT * FROM users WHERE email = 'test@example.com'; EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com'; -- MySQL 8.0.18+

-- SQLite EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = 'test@example.com';

Important EXPLAIN output indicators -- PostgreSQL EXPLAIN output interpretation Seq Scan -- Full table scan (often bad for large tables) Index Scan -- Using index (good) Index Only Scan -- Covering index, no heap access (best) Bitmap Scan -- Multiple index conditions combined Nested Loop -- Join method for small datasets Hash Join -- Join method for larger datasets Merge Join -- Join method for sorted data

-- Key metrics to watch cost=startup..total -- Estimated cost units rows=N -- Estimated row count actual time=X..Y -- Real execution time (with ANALYZE) loops=N -- Number of iterations

Index contains all columns needed by query -- Query only needs email and name SELECT email, name FROM users WHERE email LIKE 'a%';

-- Covering index avoids table lookup CREATE INDEX idx_users_email_name ON users(email, name);

Order columns by selectivity and query patterns -- Query: WHERE status = ? AND user_id = ? -- If status has few values (low cardinality), put user_id first CREATE INDEX idx_orders_user_status ON orders(user_id, status);

-- Leftmost prefix rule: this index supports: -- WHERE user_id = ? -- WHERE user_id = ? AND status = ? -- But NOT: WHERE status = ?

Index subset of rows (PostgreSQL) -- Only index active users CREATE INDEX idx_active_users ON users(email) WHERE active = true;

-- Only index recent orders CREATE INDEX idx_recent_orders ON orders(created_at) WHERE created_at > '2024-01-01'; When queries always filter by same condition

Select only needed columns -- Bad: fetches all columns SELECT * FROM users WHERE id = 1;

-- Good: only needed columns SELECT id, name, email FROM users WHERE id = 1;

EXISTS is more efficient for existence checks -- Bad: counts all matching rows SELECT CASE WHEN COUNT(*) > 0 THEN true ELSE false END FROM orders WHERE user_id = 1;

-- Good: stops at first match SELECT EXISTS(SELECT 1 FROM orders WHERE user_id = 1);

Batch inserts and updates for better performance -- Bad: individual inserts INSERT INTO logs (message) VALUES ('log1'); INSERT INTO logs (message) VALUES ('log2'); INSERT INTO logs (message) VALUES ('log3');

-- Good: batch insert INSERT INTO logs (message) VALUES ('log1'), ('log2'), ('log3');

-- Good: batch update with CASE UPDATE products SET price = CASE id WHEN 1 THEN 10.00 WHEN 2 THEN 20.00 WHEN 3 THEN 30.00 END WHERE id IN (1, 2, 3);
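In application code the same batching idea maps to a single executemany call inside one transaction; a minimal sketch using the standard-library sqlite3 module, with the logs table from the example above.

```python
import sqlite3

def insert_logs(db_path: str, messages: list[str]) -> None:
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # one transaction for the whole batch
            conn.executemany(
                "INSERT INTO logs (message) VALUES (?)",
                [(m,) for m in messages],
            )
    finally:
        conn.close()

# insert_logs("app.db", ["log1", "log2", "log3"])
```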

Efficient pagination patterns -- Offset pagination (simple but slow for large offsets) SELECT * FROM orders ORDER BY id LIMIT 20 OFFSET 1000;

-- Keyset pagination (efficient for large datasets) SELECT * FROM orders WHERE id > 1000 -- last seen ID ORDER BY id LIMIT 20;

-- Cursor-based with composite key SELECT * FROM orders WHERE (created_at, id) > ('2024-01-01', 1000) ORDER BY created_at, id LIMIT 20; Keyset pagination is O(1); offset pagination is O(n)
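A Python sketch of keyset pagination, assuming a psycopg2-style cursor against PostgreSQL (row-value comparison) and an index on (created_at, id); the helper name and column list are illustrative.

```python
def fetch_orders_page(cur, last_seen=None, page_size=20):
    # last_seen is the (created_at, id) pair of the final row on the previous page.
    if last_seen is None:
        cur.execute(
            "SELECT id, created_at, total FROM orders "
            "ORDER BY created_at, id LIMIT %s",
            (page_size,),
        )
    else:
        cur.execute(
            "SELECT id, created_at, total FROM orders "
            "WHERE (created_at, id) > (%s, %s) "
            "ORDER BY created_at, id LIMIT %s",
            (last_seen[0], last_seen[1], page_size),
        )
    rows = cur.fetchall()
    next_cursor = (rows[-1][1], rows[-1][0]) if rows else None
    return rows, next_cursor
```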

OR conditions on different columns prevent index usage -- Bad: can't use single index efficiently SELECT * FROM users WHERE email = 'a@b.com' OR name = 'John';

-- Good: UNION allows index usage on each condition SELECT * FROM users WHERE email = 'a@b.com' UNION SELECT * FROM users WHERE name = 'John';

Atomicity: all operations succeed or all fail.
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT; -- Both succeed or neither

Consistency: the database remains in a valid state after the transaction.
-- Constraints ensure consistency
ALTER TABLE accounts ADD CONSTRAINT positive_balance CHECK (balance >= 0);

-- Transaction fails if constraint violated BEGIN; UPDATE accounts SET balance = balance - 1000 WHERE id = 1; -- Fails if balance < 1000 COMMIT;

Isolation: concurrent transactions don't interfere with each other.
-- Set isolation level
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

BEGIN; -- Protected from concurrent modifications SELECT balance FROM accounts WHERE id = 1; UPDATE accounts SET balance = balance - 100 WHERE id = 1; COMMIT;

Durability: committed changes persist even after crashes. Handled by the database engine through WAL (Write-Ahead Logging).
Transaction isolation levels:

READ UNCOMMITTED - lowest isolation; can read uncommitted changes.
Risks: dirty reads, non-repeatable reads, phantom reads.
Use: rarely used; only for approximate counts/analytics.

READ COMMITTED - default in PostgreSQL; only reads committed data.
Risks: non-repeatable reads, phantom reads.
Use: most OLTP applications.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

REPEATABLE READ - default in MySQL; consistent reads within a transaction.
Risks: phantom reads (in standard SQL; PostgreSQL prevents these).
Use: financial transactions, reporting.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

SERIALIZABLE - highest isolation; transactions appear sequential.
Risks: lower concurrency, potential deadlocks.
Use: critical financial operations, inventory management.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

Lock specific rows for update:
-- PostgreSQL/MySQL
BEGIN;
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;
-- Row is locked until COMMIT
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;

-- NOWAIT: fail immediately if locked SELECT * FROM accounts WHERE id = 1 FOR UPDATE NOWAIT;

-- SKIP LOCKED: skip locked rows (queue processing) SELECT * FROM jobs WHERE status = 'pending' FOR UPDATE SKIP LOCKED LIMIT 1;
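A sketch of a queue worker built on FOR UPDATE SKIP LOCKED, assuming psycopg2 and a jobs table with status and payload columns (payload is illustrative); each claim runs in its own transaction.

```python
import time

def run_worker(conn):
    while True:
        with conn:  # commit (or roll back) one claim per iteration
            with conn.cursor() as cur:
                cur.execute(
                    "SELECT id, payload FROM jobs WHERE status = 'pending' "
                    "ORDER BY id FOR UPDATE SKIP LOCKED LIMIT 1"
                )
                job = cur.fetchone()
                if job is None:
                    time.sleep(1)  # queue is empty; poll again
                    continue
                job_id, payload = job
                # ... process payload ...
                cur.execute(
                    "UPDATE jobs SET status = 'done' WHERE id = %s", (job_id,)
                )
```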

Application-level locks (PostgreSQL) -- Session-level lock SELECT pg_advisory_lock(12345); -- Do work... SELECT pg_advisory_unlock(12345);

-- Transaction-level lock (auto-released on commit) SELECT pg_advisory_xact_lock(12345);

-- Try lock (non-blocking) SELECT pg_try_advisory_lock(12345); -- returns true/false Distributed locks, rate limiting, singleton processes
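A sketch of the singleton-process use case, assuming psycopg2; the lock key 12345 is arbitrary and do_work is a caller-supplied callable.

```python
def run_singleton_task(conn, do_work):
    with conn.cursor() as cur:
        cur.execute("SELECT pg_try_advisory_lock(12345)")
        if not cur.fetchone()[0]:
            return False  # another session already holds the lock
        try:
            do_work()
        finally:
            cur.execute("SELECT pg_advisory_unlock(12345)")
    return True
```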

Detect conflicts using version column -- Add version column ALTER TABLE products ADD COLUMN version INTEGER DEFAULT 0;

-- Read with version SELECT id, name, price, version FROM products WHERE id = 1; -- version = 5

-- Update with version check UPDATE products SET price = 29.99, version = version + 1 WHERE id = 1 AND version = 5;

-- If rows affected = 0, conflict occurred -> retry or error Low-contention updates, web applications
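A sketch of the application-side retry loop for optimistic locking, assuming a DB-API connection with psycopg2-style placeholders; names are illustrative.

```python
def update_price(conn, product_id, new_price, max_retries=3):
    for _ in range(max_retries):
        with conn.cursor() as cur:
            cur.execute(
                "SELECT version FROM products WHERE id = %s", (product_id,)
            )
            (version,) = cur.fetchone()
            cur.execute(
                "UPDATE products SET price = %s, version = version + 1 "
                "WHERE id = %s AND version = %s",
                (new_price, product_id, version),
            )
            if cur.rowcount == 1:
                conn.commit()
                return True
            conn.rollback()  # concurrent update detected; re-read and retry
    return False
```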

Lock before reading to prevent conflicts BEGIN; SELECT * FROM inventory WHERE product_id = 1 FOR UPDATE; -- Check quantity UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 1; COMMIT; High-contention updates, inventory management
Always acquire locks in the same order:
-- Always lock the lower ID first
BEGIN;
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;
SELECT * FROM accounts WHERE id = 2 FOR UPDATE;
-- Transfer...
COMMIT;

Set a maximum wait time for locks:
-- PostgreSQL
SET lock_timeout = '5s';

-- MySQL SET innodb_lock_wait_timeout = 5;

Handle deadlocks with retry logic (application code pattern, Python-style pseudocode):
max_retries = 3
for attempt in range(max_retries):
    try:
        execute_transaction()
        break
    except DeadlockError:
        if attempt == max_retries - 1:
            raise
        sleep(random_backoff())
Migration file naming conventions -- Timestamp-based (recommended) 20240115120000_create_users_table.sql 20240115120100_add_email_to_users.sql

-- Sequential 001_create_users_table.sql 002_add_email_to_users.sql Timestamp-based prevents conflicts in team environments

Include rollback capability -- 20240115120000_create_users_table.sql

-- +migrate Up CREATE TABLE users ( id SERIAL PRIMARY KEY, email VARCHAR(255) NOT NULL );

-- +migrate Down DROP TABLE users;

Migrations that can run multiple times safely -- Use IF NOT EXISTS / IF EXISTS CREATE TABLE IF NOT EXISTS users ( id SERIAL PRIMARY KEY, email VARCHAR(255) NOT NULL );

CREATE INDEX IF NOT EXISTS idx_users_email ON users(email);

ALTER TABLE users ADD COLUMN IF NOT EXISTS name VARCHAR(100);

Add nullable column first, then populate -- Step 1: Add nullable column (instant in PostgreSQL 11+) ALTER TABLE users ADD COLUMN phone VARCHAR(20);

-- Step 2: Backfill data (in batches) UPDATE users SET phone = '' WHERE phone IS NULL AND id BETWEEN 1 AND 1000;

-- Step 3: Add NOT NULL constraint ALTER TABLE users ALTER COLUMN phone SET NOT NULL;

Add column with default (instant in PostgreSQL 11+) -- PostgreSQL 11+: instant, no table rewrite ALTER TABLE users ADD COLUMN active BOOLEAN DEFAULT true NOT NULL;

-- Older versions: requires table rewrite -- Use nullable + backfill + NOT NULL pattern instead

Multi-step column rename for zero downtime -- Step 1: Add new column ALTER TABLE users ADD COLUMN full_name VARCHAR(100);

-- Step 2: Copy data (in batches) UPDATE users SET full_name = name WHERE full_name IS NULL;

-- Step 3: Deploy code reading both columns

-- Step 4: Deploy code writing to both columns

-- Step 5: Deploy code reading only new column

-- Step 6: Drop old column ALTER TABLE users DROP COLUMN name;

Create index without locking table -- PostgreSQL: CONCURRENTLY prevents locking CREATE INDEX CONCURRENTLY idx_users_email ON users(email);

-- Note: Cannot run inside transaction -- May take longer but allows concurrent reads/writes CONCURRENTLY can fail; check index is valid after creation

Remove column without breaking application -- Step 1: Stop writing to column in application

-- Step 2: Deploy and wait for old code to drain

-- Step 3: Drop column ALTER TABLE users DROP COLUMN old_column;

Process large datasets in chunks -- Process in batches of 1000 DO $$

DECLARE batch_size INTEGER := 1000; rows_updated INTEGER; BEGIN LOOP UPDATE users SET email_normalized = LOWER(email) WHERE email_normalized IS NULL AND id IN ( SELECT id FROM users WHERE email_normalized IS NULL LIMIT batch_size );

    GET DIAGNOSTICS rows_updated = ROW_COUNT;
    EXIT WHEN rows_updated = 0;

    COMMIT;
    PERFORM pg_sleep(0.1);  -- Reduce load
END LOOP;

END $$;

Use cursor for very large tables DECLARE batch_cursor CURSOR FOR SELECT id FROM users WHERE new_column IS NULL;

FETCH 1000 FROM batch_cursor; -- Process batch -- Repeat until no more rows

Use Context7 MCP for up-to-date SQL documentation:
- resolve-library-id libraryName="postgresql"; get-library-docs context7CompatibleLibraryID="/websites/postgresql" topic="window functions"
- resolve-library-id libraryName="mysql"; get-library-docs context7CompatibleLibraryID="/websites/dev_mysql_doc_refman_9_4_en" topic="JSON functions"
- resolve-library-id libraryName="sqlite"; get-library-docs context7CompatibleLibraryID="/sqlite/sqlite" topic="query optimization"

Common pitfalls:
- Using SELECT * in production queries. Explicitly list required columns for performance and clarity.
- Querying without appropriate indexes on filter/join columns. Create indexes on columns used in WHERE, JOIN, ORDER BY.
- Executing N+1 queries in a loop (for user in users: orders = query("SELECT * FROM orders WHERE user_id = ?", user.id)). Use a JOIN or IN clause to fetch all data in a single query; see the sketch at the end of this section.
- Building SQL with string concatenation (SQL injection risk). Use parameterized queries/prepared statements.
- Comparing columns with mismatched types (SELECT * FROM users WHERE id = '123'; string comparison prevents index usage). Use correct types; cast explicitly if needed.
- Accidental cross joins from missing join conditions (SELECT * FROM users, orders;). Always use explicit JOIN with an ON clause.
- Excessive normalization causing too many joins. Denormalize for read-heavy queries; balance with write complexity.

Best practices:
- Use parameterized queries to prevent SQL injection
- Create indexes on foreign keys and frequently filtered columns
- Use transactions for multi-statement operations
- Analyze query plans with EXPLAIN before optimizing
- Use appropriate isolation levels for transaction requirements
- Implement soft deletes for audit trails
- Name constraints explicitly for easier migration management
- Prefer keyset pagination over offset for large datasets
- Use CTEs for complex query readability
- Batch large data modifications to reduce lock contention
- Test migrations on production-like data before deployment

Workflow:
1. Understand database requirements: identify the data model and relationships, determine query patterns and access frequency, review existing schema and indexes.
2. Write efficient SQL: design a normalized schema (3NF baseline), write queries with appropriate indexes, use transactions for data integrity.
3. Verify SQL correctness and performance: analyze with EXPLAIN, test with production-like data volume, verify transaction isolation.

Escalation guidance:
- Missing index on an infrequently queried column: note for future optimization, proceed.
- Query performance degradation under load: analyze EXPLAIN output, propose an index or query optimization.
- Deadlock or lock timeout in production: stop, analyze lock patterns, present resolution options.
- Data corruption or SQL injection vulnerability: block the operation, require immediate remediation.

Do:
- Use parameterized queries for all user input
- Create indexes on foreign key columns
- Use explicit transaction boundaries for multi-statement operations
- Test migrations on a non-production environment first

Avoid:
- SELECT * in production queries
- String concatenation for SQL construction
- Long-running transactions holding locks
- Offset pagination for large datasets

Related topics: database design, ER diagrams, migration planning; query optimization, index analysis; SQL injection prevention, access control; navigating database schema and finding query patterns; fetching PostgreSQL, MySQL, SQLite documentation; debugging query performance issues.
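A sketch of the N+1 fix referenced in the pitfalls above: fetch all orders for a set of users in one query and group them in application code. Assumes psycopg2 (which adapts a Python list for = ANY) and that the users objects expose an id attribute.

```python
from collections import defaultdict

def orders_by_user(cur, users):
    user_ids = [u.id for u in users]
    cur.execute(
        "SELECT user_id, id, total FROM orders WHERE user_id = ANY(%s)",
        (user_ids,),
    )
    grouped = defaultdict(list)
    for user_id, order_id, total in cur.fetchall():
        grouped[user_id].append((order_id, total))
    return grouped  # one round trip instead of len(users) queries
```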