| name | database-management-patterns |
| description | Comprehensive guide for database management patterns covering PostgreSQL and MongoDB including schema design, indexing, transactions, replication, and performance tuning |
| tags | database, postgresql, mongodb, sql, nosql, indexing, transactions, replication, sharding, performance, schema-design |
| tier | tier-1 |
Database Management Patterns
A comprehensive skill for mastering database management across SQL (PostgreSQL) and NoSQL (MongoDB) systems. This skill covers schema design, indexing strategies, transaction management, replication, sharding, and performance optimization for production-grade applications.
When to Use This Skill
Use this skill when:
- Designing database schemas for new applications or refactoring existing ones
- Choosing between SQL and NoSQL databases for your use case
- Optimizing query performance with proper indexing strategies
- Implementing data consistency with transactions and ACID guarantees
- Scaling databases horizontally with sharding and replication
- Managing high-traffic applications requiring distributed databases
- Ensuring data integrity with constraints, triggers, and validation
- Troubleshooting performance issues using explain plans and query analysis
- Building fault-tolerant systems with replication and failover strategies
- Working with complex data relationships (relational) or flexible schemas (document)
Core Concepts
Database Paradigms Comparison
Relational Databases (PostgreSQL)
Strengths:
- ACID Transactions: Strong consistency guarantees
- Complex Queries: JOIN operations, subqueries, CTEs
- Data Integrity: Foreign keys, constraints, triggers
- Normalized Data: Reduced redundancy, consistent updates
- Mature Ecosystem: Rich tooling, extensions, community
Best For:
- Financial systems requiring strict consistency
- Complex relationships and data integrity requirements
- Applications with structured, well-defined schemas
- Systems requiring complex analytical queries
- Multi-step transactions across multiple tables
Document Databases (MongoDB)
Strengths:
- Flexible Schema: Easy schema evolution, polymorphic data
- Horizontal Scalability: Built-in sharding support
- JSON-Native: Natural fit for modern application development
- Embedded Documents: Denormalized data for performance
- Aggregation Framework: Powerful data processing pipeline
Best For:
- Rapidly evolving applications with changing requirements
- Content management systems with varied data structures
- Real-time analytics and event logging
- Mobile and web applications with JSON APIs
- Hierarchical or nested data structures
ACID Properties
Atomicity: All operations in a transaction succeed or fail together.
Consistency: Transactions bring the database from one valid state to another.
Isolation: Concurrent transactions don't interfere with each other.
Durability: Committed transactions survive system failures.
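A minimal sketch of atomicity, assuming a hypothetical accounts table with a CHECK (balance >= 0) constraint:
BEGIN;
UPDATE accounts SET balance = balance - 500 WHERE id = 1;
UPDATE accounts SET balance = balance + 500 WHERE id = 2;
-- If either UPDATE fails (e.g., the CHECK constraint rejects a negative
-- balance), the transaction aborts and neither change is applied;
-- otherwise COMMIT makes both durable together.
COMMIT;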
CAP Theorem
In distributed systems, at most two of these three properties can be fully guaranteed; since network partitions are unavoidable in practice, the real choice is between consistency and availability during a partition:
- Consistency: All nodes see the same data
- Availability: System remains operational
- Partition Tolerance: System continues despite network failures
PostgreSQL emphasizes CP (Consistency + Partition Tolerance).
MongoDB can be configured for CP or AP depending on write/read concerns.
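As a hedged illustration of that last point, write concern is the main knob (the events collection here is hypothetical):
// CP-leaning: wait until a majority of the replica set has the write
db.events.insertOne(
    { type: "login", at: new Date() },
    { writeConcern: { w: "majority" } }
)
// AP-leaning: acknowledge after the primary alone; faster, but the write
// can be rolled back if the primary is partitioned away before replicating
db.events.insertOne(
    { type: "login", at: new Date() },
    { writeConcern: { w: 1 } }
)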
PostgreSQL Patterns
Schema Design Fundamentals
Normalization Levels
First Normal Form (1NF)
- Atomic values (no arrays or lists in columns)
- Each row is unique (primary key exists)
- No repeating groups
Second Normal Form (2NF)
- Meets 1NF requirements
- All non-key attributes depend on the entire primary key
Third Normal Form (3NF)
- Meets 2NF requirements
- No transitive dependencies (non-key attributes depend only on primary key)
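A short sketch of a 3NF decomposition, using a hypothetical flat orders table in which customer_city depends on customer_id rather than on the order itself:
-- Violates 3NF: customer_city transitively depends on customer_id
CREATE TABLE orders_flat (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    customer_city VARCHAR(100),
    total NUMERIC(10, 2)
);
-- 3NF: the transitive dependency moves into its own table
CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    city VARCHAR(100)
);
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total NUMERIC(10, 2)
);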
When to Denormalize:
- Read-heavy workloads where joins are expensive
- Frequently accessed aggregate data
- Historical snapshots that shouldn't change
- Performance-critical queries
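A hedged example of the trade-off: caching a post count on users (a hypothetical post_count column) so reads avoid a COUNT(*) join, at the cost of maintaining it on every write.
-- Denormalized counter, updated in the same transaction as the write
ALTER TABLE users ADD COLUMN post_count INTEGER NOT NULL DEFAULT 0;
BEGIN;
INSERT INTO posts (user_id, title) VALUES (42, 'New post');
UPDATE users SET post_count = post_count + 1 WHERE id = 42;
COMMIT;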
Table Design Patterns
Primary Keys:
-- Serial auto-increment (traditional)
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- UUID for distributed systems
CREATE TABLE accounts (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(255) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Composite primary key
CREATE TABLE order_items (
order_id INTEGER NOT NULL,
product_id INTEGER NOT NULL,
quantity INTEGER NOT NULL,
price NUMERIC(10, 2) NOT NULL,
PRIMARY KEY (order_id, product_id),
FOREIGN KEY (order_id) REFERENCES orders(id),
FOREIGN KEY (product_id) REFERENCES products(id)
);
Foreign Key Constraints:
-- Cascade delete: Remove child records when parent deleted
CREATE TABLE posts (
id SERIAL PRIMARY KEY,
user_id INTEGER NOT NULL,
title VARCHAR(255) NOT NULL,
content TEXT,
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);
-- Set null: Preserve child records, nullify reference
CREATE TABLE comments (
id SERIAL PRIMARY KEY,
post_id INTEGER,
user_id INTEGER,
content TEXT NOT NULL,
FOREIGN KEY (post_id) REFERENCES posts(id) ON DELETE SET NULL,
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE SET NULL
);
-- Restrict: Prevent deletion if child records exist
CREATE TABLE categories (
id SERIAL PRIMARY KEY,
name VARCHAR(255) UNIQUE NOT NULL
);
CREATE TABLE products (
id SERIAL PRIMARY KEY,
category_id INTEGER NOT NULL,
name VARCHAR(255) NOT NULL,
FOREIGN KEY (category_id) REFERENCES categories(id) ON DELETE RESTRICT
);
Advanced Constraints
Check Constraints:
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
price NUMERIC(10, 2) NOT NULL CHECK (price > 0),
discount_percent INTEGER CHECK (discount_percent BETWEEN 0 AND 100),
stock_quantity INTEGER NOT NULL CHECK (stock_quantity >= 0)
);
-- Table-level check constraint
CREATE TABLE date_ranges (
id SERIAL PRIMARY KEY,
start_date DATE NOT NULL,
end_date DATE NOT NULL,
CHECK (end_date > start_date)
);
Unique Constraints:
-- Single column unique
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
username VARCHAR(50) UNIQUE NOT NULL
);
-- Composite unique constraint
CREATE TABLE user_permissions (
user_id INTEGER NOT NULL,
permission_id INTEGER NOT NULL,
granted_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE (user_id, permission_id),
FOREIGN KEY (user_id) REFERENCES users(id),
FOREIGN KEY (permission_id) REFERENCES permissions(id)
);
-- Partial unique index (unique where condition met)
CREATE UNIQUE INDEX unique_active_email
ON users (email)
WHERE active = true;
Triggers and Functions
Audit Trail Pattern:
-- Audit table
CREATE TABLE audit_log (
id SERIAL PRIMARY KEY,
table_name VARCHAR(255) NOT NULL,
record_id INTEGER NOT NULL,
action VARCHAR(10) NOT NULL,
old_data JSONB,
new_data JSONB,
changed_by VARCHAR(255),
changed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Trigger function
CREATE OR REPLACE FUNCTION audit_trigger_function()
RETURNS TRIGGER AS $$
BEGIN
IF TG_OP = 'INSERT' THEN
INSERT INTO audit_log (table_name, record_id, action, new_data, changed_by)
VALUES (TG_TABLE_NAME, NEW.id, 'INSERT', row_to_json(NEW), current_user);
RETURN NEW;
ELSIF TG_OP = 'UPDATE' THEN
INSERT INTO audit_log (table_name, record_id, action, old_data, new_data, changed_by)
VALUES (TG_TABLE_NAME, NEW.id, 'UPDATE', row_to_json(OLD), row_to_json(NEW), current_user);
RETURN NEW;
ELSIF TG_OP = 'DELETE' THEN
INSERT INTO audit_log (table_name, record_id, action, old_data, changed_by)
VALUES (TG_TABLE_NAME, OLD.id, 'DELETE', row_to_json(OLD), current_user);
RETURN OLD;
    END IF;
    RETURN NULL; -- unreachable for row triggers on INSERT/UPDATE/DELETE, but makes the function total
END;
$$ LANGUAGE plpgsql;
-- Attach trigger to table
CREATE TRIGGER users_audit_trigger
AFTER INSERT OR UPDATE OR DELETE ON users
FOR EACH ROW EXECUTE FUNCTION audit_trigger_function();
Timestamp Update Pattern:
CREATE OR REPLACE FUNCTION update_modified_timestamp()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = CURRENT_TIMESTAMP;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TABLE posts (
id SERIAL PRIMARY KEY,
title VARCHAR(255) NOT NULL,
content TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TRIGGER posts_update_timestamp
BEFORE UPDATE ON posts
FOR EACH ROW EXECUTE FUNCTION update_modified_timestamp();
Views and Materialized Views
Standard Views:
-- Virtual table - computed on each query
CREATE VIEW active_users_with_posts AS
SELECT
u.id,
u.username,
u.email,
COUNT(p.id) as post_count,
MAX(p.created_at) as last_post_date
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
WHERE u.active = true
GROUP BY u.id, u.username, u.email;
-- Use view like a table
SELECT * FROM active_users_with_posts WHERE post_count > 10;
Materialized Views:
-- Physical table - stores computed results
CREATE MATERIALIZED VIEW user_statistics AS
SELECT
u.id,
u.username,
COUNT(DISTINCT p.id) as total_posts,
COUNT(DISTINCT c.id) as total_comments,
AVG(p.views) as avg_post_views,
MAX(p.created_at) as last_activity
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
LEFT JOIN comments c ON u.id = c.user_id
GROUP BY u.id, u.username;
-- Create index on materialized view
CREATE INDEX idx_user_stats_posts ON user_statistics(total_posts);
-- Refresh materialized view (update data)
REFRESH MATERIALIZED VIEW user_statistics;
-- Concurrent refresh (allows reads during refresh)
-- Requires a unique index on the materialized view:
CREATE UNIQUE INDEX idx_user_stats_unique ON user_statistics(id);
REFRESH MATERIALIZED VIEW CONCURRENTLY user_statistics;
MongoDB Patterns
Document Modeling Strategies
Embedding vs Referencing
Embedding Pattern (Denormalization):
// One-to-few: Embed when relationship is contained
// Example: Blog post with comments
{
_id: ObjectId("..."),
title: "Database Design Patterns",
author: "John Doe",
content: "...",
published_at: ISODate("2025-01-15"),
comments: [
{
_id: ObjectId("..."),
author: "Jane Smith",
text: "Great article!",
created_at: ISODate("2025-01-16")
},
{
_id: ObjectId("..."),
author: "Bob Johnson",
text: "Very helpful, thanks!",
created_at: ISODate("2025-01-17")
}
],
tags: ["database", "design", "patterns"],
stats: {
views: 1523,
likes: 89,
shares: 23
}
}
// Benefits:
// - Single query to retrieve post with comments
// - Better read performance
// - Atomic updates to entire document
//
// Drawbacks:
// - Document size limits (16MB in MongoDB)
// - Difficult to query comments independently
// - May duplicate data if comments need to appear elsewhere
Referencing Pattern (Normalization):
// One-to-many or many-to-many: Reference when relationship is unbounded
// Example: User with many posts
// Users collection
{
_id: ObjectId("507f1f77bcf86cd799439011"),
username: "john_doe",
email: "john@example.com",
profile: {
bio: "Software engineer",
avatar_url: "https://...",
location: "San Francisco"
},
created_at: ISODate("2024-01-01")
}
// Posts collection (references user)
{
_id: ObjectId("507f191e810c19729de860ea"),
user_id: ObjectId("507f1f77bcf86cd799439011"),
title: "My First Post",
content: "...",
published_at: ISODate("2025-01-15"),
comment_ids: [
ObjectId("..."),
ObjectId("...")
]
}
// Benefits:
// - No duplication of user data
// - Flexible: users can have unlimited posts
// - Easy to update user information once
//
// Drawbacks:
// - Requires multiple queries or $lookup
// - Slower read performance for joined data
Hybrid Approach (Selective Denormalization):
// Store frequently accessed fields from referenced document
{
_id: ObjectId("..."),
title: "Database Patterns",
content: "...",
author: {
// Embedded: frequently accessed, rarely changes
id: ObjectId("507f1f77bcf86cd799439011"),
username: "john_doe",
avatar_url: "https://..."
},
// Reference: full user data available if needed
author_id: ObjectId("507f1f77bcf86cd799439011"),
published_at: ISODate("2025-01-15")
}
// Benefits:
// - Fast reads with embedded frequently-used data
// - Can still get full user data when needed
// - Balance between performance and flexibility
//
// Tradeoffs:
// - Need to update embedded data when user changes username/avatar
// - Slightly larger documents
Schema Design Patterns
Bucket Pattern (Time-Series Data):
// Instead of one document per measurement:
// BAD: Millions of tiny documents
{
sensor_id: "sensor_001",
timestamp: ISODate("2025-01-15T10:00:00Z"),
temperature: 72.5,
humidity: 45
}
// GOOD: Bucket documents with arrays of measurements
{
sensor_id: "sensor_001",
date: ISODate("2025-01-15"),
hour: 10,
measurements: [
{ minute: 0, temperature: 72.5, humidity: 45 },
{ minute: 1, temperature: 72.6, humidity: 45 },
{ minute: 2, temperature: 72.4, humidity: 46 },
// ... up to 60 measurements per hour
],
summary: {
count: 60,
avg_temperature: 72.5,
min_temperature: 71.8,
max_temperature: 73.2
}
}
// Benefits:
// - Reduced document count (60x fewer documents)
// - Better index efficiency
// - Pre-computed summaries
// - Easier to query by time ranges
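A sketch of the write side, assuming the bucket schema above (the sensor_readings collection name is illustrative): upsert the hour's bucket and append the new measurement.
db.sensor_readings.updateOne(
    { sensor_id: "sensor_001", date: ISODate("2025-01-15"), hour: 10 },
    {
        $push: { measurements: { minute: 3, temperature: 72.7, humidity: 46 } },
        $inc: { "summary.count": 1 },
        $min: { "summary.min_temperature": 72.7 },
        $max: { "summary.max_temperature": 72.7 }
    },
    { upsert: true }
)
// avg_temperature is cheaper to recompute periodically than on every write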
Computed Pattern (Pre-Aggregated Data):
// Store computed values to avoid expensive aggregations
{
_id: ObjectId("..."),
product_id: "PROD-123",
month: "2025-01",
total_sales: 15420.50,
units_sold: 234,
unique_customers: 187,
avg_order_value: 65.90,
top_customers: [
{ customer_id: "CUST-456", revenue: 890.50 },
{ customer_id: "CUST-789", revenue: 675.25 }
],
computed_at: ISODate("2025-02-01T00:00:00Z")
}
// Update pattern: Scheduled job or trigger updates computed values
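One hedged way to implement that refresh is an aggregation ending in $merge (the orders fields and the product_monthly_stats collection are assumptions):
db.orders.aggregate([
    { $match: { status: "completed", month: "2025-01" } },
    { $unwind: "$items" },
    {
        $group: {
            _id: { product_id: "$items.product_id", month: "$month" },
            total_sales: { $sum: "$items.total_price" },
            units_sold: { $sum: "$items.quantity" }
        }
    },
    { $merge: { into: "product_monthly_stats", whenMatched: "replace" } }
])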
Polymorphic Pattern (Varied Schemas):
// Handle different product types in single collection
{
_id: ObjectId("..."),
type: "book",
name: "Database Design",
price: 49.99,
// Book-specific fields
isbn: "978-0-123456-78-9",
author: "John Smith",
pages: 456,
publisher: "Tech Books Inc"
}
{
_id: ObjectId("..."),
type: "electronics",
name: "Wireless Mouse",
price: 29.99,
// Electronics-specific fields
brand: "TechBrand",
warranty_months: 24,
specifications: {
battery_life: "6 months",
connectivity: "Bluetooth 5.0"
}
}
// Query by type
db.products.find({ type: "book", author: "John Smith" })
db.products.find({ type: "electronics", "specifications.connectivity": /Bluetooth/ })
Aggregation Framework
Basic Aggregation Pipeline:
// Group by author and count posts
db.posts.aggregate([
{
$match: { published: true } // Filter stage
},
{
$group: {
_id: "$author_id",
total_posts: { $sum: 1 },
total_views: { $sum: "$views" },
avg_views: { $avg: "$views" },
latest_post: { $max: "$published_at" }
}
},
{
$sort: { total_posts: -1 } // Sort by post count
},
{
$limit: 10 // Top 10 authors
}
])
Advanced Pipeline with Lookup (Join):
// Join posts with user data
db.posts.aggregate([
{
$match: {
published_at: { $gte: ISODate("2025-01-01") }
}
},
{
$lookup: {
from: "users",
localField: "author_id",
foreignField: "_id",
as: "author"
}
},
{
$unwind: "$author" // Flatten author array
},
{
$project: {
title: 1,
content: 1,
views: 1,
"author.username": 1,
"author.email": 1,
days_since_publish: {
$divide: [
{ $subtract: [new Date(), "$published_at"] },
1000 * 60 * 60 * 24
]
}
}
},
{
$sort: { views: -1 }
}
])
Aggregation with Grouping and Reshaping:
// Complex aggregation: Sales analysis
db.orders.aggregate([
{
$match: {
status: "completed",
created_at: {
$gte: ISODate("2025-01-01"),
$lt: ISODate("2025-02-01")
}
}
},
{
$unwind: "$items" // Flatten order items
},
{
$group: {
_id: {
product_id: "$items.product_id",
customer_region: "$customer.region"
},
total_quantity: { $sum: "$items.quantity" },
total_revenue: { $sum: "$items.total_price" },
order_count: { $sum: 1 }, // per line item after $unwind, not per order
avg_order_value: { $avg: "$items.total_price" }
}
},
{
$group: {
_id: "$_id.product_id",
regions: {
$push: {
region: "$_id.customer_region",
quantity: "$total_quantity",
revenue: "$total_revenue"
}
},
total_quantity: { $sum: "$total_quantity" },
total_revenue: { $sum: "$total_revenue" }
}
},
{
$sort: { total_revenue: -1 }
}
])
Indexing Strategies
PostgreSQL Indexes
B-tree Indexes (Default):
-- Single column index
CREATE INDEX idx_users_email ON users(email);
-- Composite index (order matters!)
CREATE INDEX idx_posts_author_published
ON posts(author_id, published_at);
-- Query can use index:
-- SELECT * FROM posts WHERE author_id = 123 ORDER BY published_at;
-- SELECT * FROM posts WHERE author_id = 123 AND published_at > '2025-01-01';
-- Query CANNOT fully use index:
-- SELECT * FROM posts WHERE published_at > '2025-01-01';
-- (the leading column author_id is unconstrained, so the index prefix can't be used)
Partial Indexes:
-- Index only active users
CREATE INDEX idx_active_users
ON users(username)
WHERE active = true;
-- Index only recent orders
CREATE INDEX idx_recent_orders
ON orders(created_at, status)
WHERE created_at > '2024-01-01';
-- Benefits: Smaller index size, faster queries on filtered data
Expression Indexes:
-- Index on lowercase email for case-insensitive search
CREATE INDEX idx_users_email_lower
ON users(LOWER(email));
-- Query that uses this index:
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
-- Index on JSONB field extraction
CREATE INDEX idx_metadata_tags
ON products((metadata->>'category'));
Full-Text Search Indexes:
-- Add tsvector column for full-text search
ALTER TABLE articles
ADD COLUMN tsv_content tsvector;
-- Populate tsvector column
UPDATE articles
SET tsv_content = to_tsvector('english', coalesce(title, '') || ' ' || coalesce(content, ''));
-- Create GIN index for full-text search
CREATE INDEX idx_articles_tsv ON articles USING GIN(tsv_content);
-- Full-text search query
SELECT title, ts_rank(tsv_content, query) as rank
FROM articles, to_tsquery('english', 'database & design') query
WHERE tsv_content @@ query
ORDER BY rank DESC;
-- Trigger to auto-update tsvector
CREATE TRIGGER articles_tsv_update
BEFORE INSERT OR UPDATE ON articles
FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(tsv_content, 'pg_catalog.english', title, content);
JSONB Indexes:
-- GIN index for JSONB containment queries
CREATE INDEX idx_products_metadata
ON products USING GIN(metadata);
-- Queries that use this index:
SELECT * FROM products WHERE metadata @> '{"color": "blue"}';
SELECT * FROM products WHERE metadata ? 'size';
-- Index on specific JSONB path
CREATE INDEX idx_products_category
ON products((metadata->>'category'));
Index Monitoring:
-- Find unused indexes
SELECT
schemaname,
tablename,
indexname,
idx_scan,
idx_tup_read,
idx_tup_fetch,
pg_size_pretty(pg_relation_size(indexrelid)) as index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
-- Check index usage
SELECT
relname as table_name,
indexrelname as index_name,
idx_scan as times_used,
idx_tup_read as tuples_read,
idx_tup_fetch as tuples_fetched
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;
MongoDB Indexes
Single Field Indexes:
// Create index on single field
db.users.createIndex({ email: 1 }) // 1 = ascending, -1 = descending
// Unique index
db.users.createIndex({ username: 1 }, { unique: true })
// Sparse index (only index documents with the field)
db.users.createIndex({ phone_number: 1 }, { sparse: true })
Compound Indexes:
// Index on multiple fields (order matters!)
db.posts.createIndex({ author_id: 1, published_at: -1 })
// Efficient queries:
// - { author_id: "123" }
// - { author_id: "123", published_at: { $gte: ... } }
// - { author_id: "123" } with sort by published_at
// Inefficient:
// - { published_at: { $gte: ... } } alone (doesn't use index efficiently)
// ESR Rule: Equality, Sort, Range
// Best compound index order:
// 1. Equality filters first
// 2. Sort fields second
// 3. Range filters last
db.orders.createIndex({
status: 1, // Equality
created_at: -1, // Sort
total_amount: 1 // Range
})
Multikey Indexes (Array Fields):
// Index on array field
db.posts.createIndex({ tags: 1 })
// Document with array
{
_id: ObjectId("..."),
title: "Database Design",
tags: ["database", "mongodb", "schema"]
}
// Query that uses multikey index
db.posts.find({ tags: "mongodb" })
db.posts.find({ tags: { $in: ["database", "nosql"] } })
// Compound multikey index (max one array field)
db.posts.createIndex({ tags: 1, published_at: -1 }) // Valid
// db.posts.createIndex({ tags: 1, categories: 1 }) // Invalid if both are arrays
Text Indexes:
// Create text index for full-text search
db.articles.createIndex({
title: "text",
content: "text"
})
// Text search query
db.articles.find({
$text: { $search: "database design patterns" }
})
// Search with relevance score
db.articles.find(
{ $text: { $search: "database design" } },
{ score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } })
// Weighted text index (prioritize title over content)
db.articles.createIndex(
{ title: "text", content: "text" },
{ weights: { title: 10, content: 5 } }
)
Geospatial Indexes:
// 2dsphere index for geographic queries
db.locations.createIndex({ coordinates: "2dsphere" })
// Document format
{
name: "Coffee Shop",
coordinates: {
type: "Point",
coordinates: [-122.4194, 37.7749] // [longitude, latitude]
}
}
// Find locations near a point
db.locations.find({
coordinates: {
$near: {
$geometry: {
type: "Point",
coordinates: [-122.4194, 37.7749]
},
$maxDistance: 1000 // meters
}
}
})
Index Properties:
// TTL Index (auto-delete documents after time)
db.sessions.createIndex(
{ created_at: 1 },
{ expireAfterSeconds: 3600 } // 1 hour
)
// Partial Index (index subset of documents)
db.orders.createIndex(
{ status: 1, created_at: -1 },
{ partialFilterExpression: { status: { $eq: "pending" } } }
)
// Case-insensitive index
db.users.createIndex(
{ email: 1 },
{ collation: { locale: "en", strength: 2 } }
)
// Index builds: the background option is deprecated and ignored since
// MongoDB 4.2; modern index builds only briefly lock the collection
db.large_collection.createIndex({ field: 1 })
Index Analysis:
// Explain query execution
db.posts.find({ author_id: "123" }).explain("executionStats")
// Check index usage
db.posts.aggregate([
{ $indexStats: {} }
])
// List all indexes on collection
db.posts.getIndexes()
// Drop unused index
db.posts.dropIndex("index_name")
Transactions
PostgreSQL Transaction Management
Basic Transactions:
-- Explicit transaction
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
-- or ROLLBACK; to cancel changes
Savepoints (Partial Rollback):
BEGIN;
UPDATE inventory SET quantity = quantity - 10 WHERE product_id = 'PROD-123';
SAVEPOINT before_audit;
INSERT INTO audit_log (action, details) VALUES ('inventory_update', '...');
-- Oops, error in audit log
ROLLBACK TO SAVEPOINT before_audit;
-- Inventory update preserved, audit insert rolled back
-- Fix and retry
INSERT INTO audit_log (action, details) VALUES ('inventory_update', 'correct details');
COMMIT;
Isolation Levels:
-- Read Uncommitted (accepted by PostgreSQL but behaves as Read Committed)
-- Read Committed (default) - sees only committed data
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- Repeatable Read - sees snapshot at transaction start
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM accounts WHERE id = 1; -- Returns balance 1000
-- Another transaction updates balance to 1500 and commits
SELECT * FROM accounts WHERE id = 1; -- Still returns 1000 (repeatable read)
COMMIT;
-- Serializable - strictest isolation, prevents all anomalies
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- If concurrent transactions would violate serializability, one aborts
COMMIT;
Advisory Locks (Application-Level Locking):
-- Exclusive lock on arbitrary number
SELECT pg_advisory_lock(12345);
-- ... perform critical operation ...
SELECT pg_advisory_unlock(12345);
-- Try lock (non-blocking)
SELECT pg_try_advisory_lock(12345); -- Returns true if acquired, false otherwise
-- Session-level advisory lock (auto-released on disconnect)
SELECT pg_advisory_lock(user_id);
Row-Level Locking:
-- SELECT FOR UPDATE - lock rows for update
BEGIN;
SELECT * FROM products
WHERE id = 123
FOR UPDATE; -- Locks this row
UPDATE products SET quantity = quantity - 1 WHERE id = 123;
COMMIT;
-- SELECT FOR SHARE - shared lock (allows other reads, blocks writes)
SELECT * FROM products WHERE id = 123 FOR SHARE;
-- SKIP LOCKED - skip locked rows instead of waiting
SELECT * FROM queue
WHERE processed = false
ORDER BY priority
LIMIT 10
FOR UPDATE SKIP LOCKED;
MongoDB Transactions
Multi-Document Transactions:
// Transactions require replica set or sharded cluster
const session = db.getMongo().startSession()
session.startTransaction()
try {
const accountsCol = session.getDatabase("mydb").accounts
// Debit account
accountsCol.updateOne(
{ _id: "account1" },
{ $inc: { balance: -100 } },
{ session }
)
// Credit account
accountsCol.updateOne(
{ _id: "account2" },
{ $inc: { balance: 100 } },
{ session }
)
// Commit transaction
session.commitTransaction()
} catch (error) {
// Abort on error
session.abortTransaction()
throw error
} finally {
session.endSession()
}
Read and Write Concerns:
// Write Concern: Acknowledgment level
db.orders.insertOne(
{ customer_id: "123", items: [...] },
{
writeConcern: {
w: "majority", // Wait for majority of replica set
j: true, // Wait for journal write
wtimeout: 5000 // Timeout after 5 seconds
}
}
)
// Read Concern: Data consistency level
db.orders.find(
{ status: "pending" }
).readConcern("majority") // Only return data acknowledged by majority
// Read Preference: Which replica to read from
db.orders.find({ ... }).readPref("secondary") // Read from secondary replica
Atomic Operations (Single Document):
// Single document updates are atomic by default
db.counters.updateOne(
{ _id: "page_views" },
{
$inc: { count: 1 },
$set: { last_updated: new Date() }
}
)
// Atomic array operations
db.posts.updateOne(
{ _id: ObjectId("...") },
{
$push: {
comments: {
$each: [{ author: "John", text: "Great!" }],
$position: 0 // Insert at beginning
}
}
}
)
// Find and modify (atomic read-modify-write)
db.queue.findOneAndUpdate(
{ status: "pending" },
{ $set: { status: "processing", processor_id: "worker-1" } },
{
sort: { priority: -1 },
returnDocument: "after" // Return updated document
}
)
Replication
PostgreSQL Replication
Streaming Replication (Primary-Standby):
-- Primary server configuration (postgresql.conf)
wal_level = replica
max_wal_senders = 10
wal_keep_size = '1GB'
hot_standby = on
-- Create replication user
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secure_password';
-- pg_hba.conf on primary
host replication replicator standby_ip/32 scram-sha-256
-- Standby server (postgresql.conf or postgresql.auto.conf; recovery.conf was
-- removed in PostgreSQL 12: create an empty standby.signal file instead)
primary_conninfo = 'host=primary_ip port=5432 user=replicator password=...'
restore_command = 'cp /var/lib/postgresql/archive/%f %p'
Logical Replication (Selective Replication):
-- On publisher (source)
CREATE PUBLICATION my_publication FOR TABLE users, posts;
-- or FOR ALL TABLES;
-- On subscriber (destination)
CREATE SUBSCRIPTION my_subscription
CONNECTION 'host=publisher_ip dbname=mydb user=replicator password=...'
PUBLICATION my_publication;
-- Monitor replication
SELECT * FROM pg_stat_replication;
SELECT * FROM pg_replication_slots;
Failover and Promotion:
-- Promote standby to primary (shell command, run on the standby host)
pg_ctl promote -D /var/lib/postgresql/data
-- Check replication lag
SELECT
client_addr,
state,
sent_lsn,
write_lsn,
flush_lsn,
replay_lsn,
sync_state,
pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes
FROM pg_stat_replication;
MongoDB Replication
Replica Set Configuration:
// Initialize replica set
rs.initiate({
_id: "myReplicaSet",
members: [
{ _id: 0, host: "mongodb1.example.com:27017", priority: 2 },
{ _id: 1, host: "mongodb2.example.com:27017", priority: 1 },
{ _id: 2, host: "mongodb3.example.com:27017", priority: 1 }
]
})
// Add member to existing replica set
rs.add("mongodb4.example.com:27017")
// Remove member
rs.remove("mongodb4.example.com:27017")
// Check replica set status
rs.status()
// Check replication lag
rs.printSecondaryReplicationInfo()
Replica Set Roles:
// Priority 0 member (cannot become primary)
rs.add({
host: "analytics.example.com:27017",
priority: 0,
hidden: true // Hidden from application drivers
})
// Arbiter (voting only, no data)
rs.addArb("arbiter.example.com:27017")
// Delayed member (disaster recovery)
rs.add({
host: "delayed.example.com:27017",
priority: 0,
hidden: true,
secondaryDelaySecs: 3600 // 1 hour behind (slaveDelay before MongoDB 5.0)
})
Read Preference Configuration:
// Application connection with read preference
const client = new MongoClient(uri, {
readPreference: "secondaryPreferred", // Try secondary, fallback to primary
readConcernLevel: "majority"
})
// Read Preference Modes:
// - primary (default): Read from primary only
// - primaryPreferred: Primary if available, else secondary
// - secondary: Read from secondary only
// - secondaryPreferred: Secondary if available, else primary
// - nearest: Read from nearest member (lowest latency)
Sharding
MongoDB Sharding Architecture
Shard Key Selection:
// Good shard key characteristics:
// 1. High cardinality (many distinct values)
// 2. Even distribution
// 3. Query isolation (queries target specific shards)
// Example: User-based application
sh.shardCollection("mydb.users", { user_id: "hashed" })
// Hashed shard key: Even distribution, random data location
sh.shardCollection("mydb.events", { event_id: "hashed" })
// Range-based shard key: Ordered data, good for range queries
sh.shardCollection("mydb.logs", { timestamp: 1, server_id: 1 })
// Compound shard key
sh.shardCollection("mydb.orders", {
customer_region: 1, // Coarse grouping
order_date: 1 // Fine grouping
})
Sharding Setup:
// 1. Start config servers (replica set)
mongod --configsvr --replSet configRS --port 27019
// 2. Initialize config server replica set
rs.initiate({
_id: "configRS",
configsvr: true,
members: [
{ _id: 0, host: "cfg1.example.com:27019" },
{ _id: 1, host: "cfg2.example.com:27019" },
{ _id: 2, host: "cfg3.example.com:27019" }
]
})
// 3. Start shard servers (each is a replica set)
mongod --shardsvr --replSet shard1RS --port 27018
// 4. Start mongos (query router)
mongos --configdb configRS/cfg1.example.com:27019,cfg2.example.com:27019
// 5. Add shards to cluster
sh.addShard("shard1RS/shard1-a.example.com:27018")
sh.addShard("shard2RS/shard2-a.example.com:27018")
// 6. Enable sharding on database
sh.enableSharding("mydb")
// 7. Shard collections
sh.shardCollection("mydb.users", { user_id: "hashed" })
Query Targeting:
// Targeted query (includes shard key)
db.users.find({ user_id: "12345" })
// Routes to single shard
// Scatter-gather query (no shard key)
db.users.find({ email: "user@example.com" })
// Queries all shards, merges results
// Check query targeting
db.users.find({ user_id: "12345" }).explain()
// Look for "SINGLE_SHARD" vs "SHARD_MERGE" in the winning plan
Zone Sharding (Geographic Distribution):
// Define zones for geographic sharding
sh.addShardToZone("shard1", "US")
sh.addShardToZone("shard2", "EU")
// Define zone ranges
sh.updateZoneKeyRange(
"mydb.users",
{ region: "US", user_id: MinKey },
{ region: "US", user_id: MaxKey },
"US"
)
sh.updateZoneKeyRange(
"mydb.users",
{ region: "EU", user_id: MinKey },
{ region: "EU", user_id: MaxKey },
"EU"
)
// Shard collection with zone-aware key
sh.shardCollection("mydb.users", { region: 1, user_id: 1 })
PostgreSQL Horizontal Partitioning
Declarative Partitioning:
-- Range partitioning
CREATE TABLE logs (
id BIGSERIAL,
log_time TIMESTAMP NOT NULL,
message TEXT,
level VARCHAR(10)
) PARTITION BY RANGE (log_time);
-- Create partitions
CREATE TABLE logs_2025_01 PARTITION OF logs
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE logs_2025_02 PARTITION OF logs
FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
-- List partitioning
CREATE TABLE customers (
id SERIAL,
name VARCHAR(255),
region VARCHAR(50)
) PARTITION BY LIST (region);
CREATE TABLE customers_us PARTITION OF customers
FOR VALUES IN ('US', 'CA', 'MX');
CREATE TABLE customers_eu PARTITION OF customers
FOR VALUES IN ('UK', 'DE', 'FR', 'IT');
-- Hash partitioning
CREATE TABLE events (
id BIGSERIAL,
event_type VARCHAR(50),
data JSONB
) PARTITION BY HASH (id);
CREATE TABLE events_0 PARTITION OF events
FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE events_1 PARTITION OF events
FOR VALUES WITH (MODULUS 4, REMAINDER 1);
-- ... events_2 and events_3
Partition Pruning (Query Optimization):
-- Query automatically uses only relevant partition
SELECT * FROM logs
WHERE log_time BETWEEN '2025-01-15' AND '2025-01-20';
-- Only scans logs_2025_01 partition
-- Check query plan
EXPLAIN SELECT * FROM logs WHERE log_time > '2025-01-01';
-- Shows which partitions are scanned
Performance Tuning
Query Optimization Techniques
PostgreSQL Query Analysis:
-- Basic explain
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';
-- Analyze with actual execution statistics
EXPLAIN ANALYZE
SELECT u.username, COUNT(p.id) as post_count
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
WHERE u.active = true
GROUP BY u.id, u.username
ORDER BY post_count DESC
LIMIT 10;
-- Identify slow queries
SELECT
query,
calls,
total_exec_time,
mean_exec_time,
max_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
-- Table statistics
ANALYZE users; -- Update query planner statistics
-- Vacuum and analyze
VACUUM ANALYZE posts; -- Reclaim space and update stats
Common Query Patterns:
-- Avoid SELECT * (retrieve only needed columns)
-- BAD
SELECT * FROM users WHERE id = 123;
-- GOOD
SELECT id, username, email FROM users WHERE id = 123;
-- Use EXISTS instead of IN for large subqueries
-- BAD
SELECT * FROM posts WHERE author_id IN (
SELECT id FROM users WHERE active = true
);
-- GOOD
SELECT * FROM posts p WHERE EXISTS (
SELECT 1 FROM users u
WHERE u.id = p.author_id AND u.active = true
);
-- Use JOINs instead of multiple queries
-- BAD (N+1 query problem)
-- SELECT * FROM posts;
-- Then for each post: SELECT * FROM users WHERE id = post.author_id;
-- GOOD
SELECT p.*, u.username, u.email
FROM posts p
JOIN users u ON p.author_id = u.id;
-- Window functions instead of self-joins
-- Calculate running total
SELECT
order_date,
amount,
SUM(amount) OVER (ORDER BY order_date) as running_total
FROM orders;
-- Rank within groups
SELECT
category,
product_name,
sales,
RANK() OVER (PARTITION BY category ORDER BY sales DESC) as rank_in_category
FROM products;
MongoDB Query Optimization:
// Use projection to limit returned fields
// BAD
db.users.find({ active: true })
// GOOD
db.users.find(
{ active: true },
{ username: 1, email: 1, _id: 0 }
)
// Use covered queries (index covers all fields)
db.users.createIndex({ username: 1, email: 1 })
db.users.find(
{ username: "john_doe" },
{ username: 1, email: 1, _id: 0 }
) // Entire query served from index
// Avoid negation operators
// BAD (cannot use index efficiently)
db.products.find({ status: { $ne: "discontinued" } })
// GOOD
db.products.find({ status: { $in: ["active", "pending", "sold"] } })
// Use $lookup sparingly (expensive operation)
// Consider embedding data instead if appropriate
// Aggregation optimization: Filter early
// BAD
db.orders.aggregate([
{ $lookup: { ... } }, // Expensive join
{ $match: { status: "completed" } } // Filter after join
])
// GOOD
db.orders.aggregate([
{ $match: { status: "completed" } }, // Filter first
{ $lookup: { ... } } // Join fewer documents
])
Connection Pooling
PostgreSQL Connection Pooling:
// Using node-postgres (pg) with pool
const { Pool } = require('pg')
const pool = new Pool({
host: 'localhost',
port: 5432,
database: 'mydb',
user: 'dbuser',
password: 'secret',
max: 20, // Maximum pool size
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 2000
})
// Execute query
const result = await pool.query('SELECT * FROM users WHERE id = $1', [123])
// Use PgBouncer for server-side pooling
// pgbouncer.ini
// [databases]
// mydb = host=localhost port=5432 dbname=mydb
//
// [pgbouncer]
// pool_mode = transaction
// max_client_conn = 1000
// default_pool_size = 25
MongoDB Connection Pooling:
// MongoClient automatically manages connection pool
const { MongoClient } = require('mongodb')
const client = new MongoClient(uri, {
maxPoolSize: 50, // Max connections
minPoolSize: 10, // Min connections
maxIdleTimeMS: 30000, // Close idle connections
waitQueueTimeoutMS: 5000 // Wait for available connection
})
await client.connect()
const db = client.db('mydb')
// Connection automatically returned to pool after use
Best Practices
PostgreSQL Best Practices
Schema Design
- Normalize for data integrity, denormalize for performance
- Use appropriate data types (avoid TEXT for short strings)
- Define NOT NULL constraints where appropriate
- Use SERIAL or UUID for primary keys consistently
Indexing
- Index foreign keys for JOIN performance
- Create indexes on frequently filtered/sorted columns
- Use partial indexes for selective queries
- Monitor and remove unused indexes
- Keep composite index column count reasonable (typically ≤ 3-4)
Query Performance
- Use EXPLAIN ANALYZE to understand query plans
- Avoid SELECT * in application code
- Use prepared statements to prevent SQL injection (see the sketch after this list)
- Limit result sets with LIMIT
- Use connection pooling
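A minimal parameterized-query sketch with node-postgres, reusing the pool idea from the Connection Pooling section; the value travels separately from the SQL text, so it cannot be interpreted as SQL:
const { Pool } = require('pg')
const pool = new Pool({ database: 'mydb' })
// Imagine this value arrived from an untrusted request body
const email = "user@example.com'; DROP TABLE users; --"
const { rows } = await pool.query(
    'SELECT id, username FROM users WHERE email = $1 LIMIT 1',
    [email] // sent as a bound parameter, never spliced into the query string
)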
Maintenance
- Run VACUUM regularly (or enable autovacuum)
- Update statistics with ANALYZE
- Monitor slow query log
- Set appropriate autovacuum thresholds (per-table example after this list)
- Regular backup with pg_dump or WAL archiving
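For example, a hot, frequently updated table can get tighter per-table thresholds than the global defaults (values here are illustrative):
ALTER TABLE posts SET (
    autovacuum_vacuum_scale_factor = 0.05,   -- vacuum after ~5% of rows change
    autovacuum_analyze_scale_factor = 0.02   -- re-analyze after ~2% of rows change
);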
Security
- Use SSL/TLS for connections
- Implement row-level security for multi-tenant apps (sketch after this list)
- Grant minimum necessary privileges
- Use parameterized queries
- Regular security updates
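A minimal row-level security sketch for a multi-tenant table (the documents table and the app.current_tenant setting are assumptions):
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON documents
    USING (tenant_id = current_setting('app.current_tenant')::INTEGER);
-- The application sets the tenant once per session or transaction:
SET app.current_tenant = '42';
-- Note: table owners bypass RLS unless FORCE ROW LEVEL SECURITY is also set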
MongoDB Best Practices
Schema Design
- Embed related data that is accessed together
- Reference data that is large or rarely accessed
- Use polymorphic pattern for varied schemas
- Limit document size to reasonable bounds (< 1-2 MB typically)
- Design for your query patterns
Indexing
- Index on fields used in queries and sorts
- Use compound indexes with ESR rule (Equality, Sort, Range)
- Create text indexes for full-text search
- Monitor index usage with $indexStats
- Avoid too many indexes (write performance impact)
Query Performance
- Use projection to limit returned fields
- Create covered queries when possible
- Filter early in aggregation pipelines
- Avoid $lookup when embedding is appropriate
- Use explain() to verify index usage
Scalability
- Choose appropriate shard key (high cardinality, even distribution)
- Use replica sets for high availability
- Configure appropriate read/write concerns
- Monitor chunk distribution in sharded clusters
- Use zones for geographic distribution
Operations
- Enable authentication and authorization
- Use TLS for client connections
- Regular backups (mongodump or filesystem snapshots; sketch after this list)
- Monitor with MongoDB Atlas, Ops Manager, or custom tools
- Keep MongoDB version updated
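For instance, a nightly dump might look like this sketch (URI, credentials, and paths are placeholders):
mongodump --uri="mongodb://backup_user:password@mongodb1.example.com:27017" \
    --gzip --out=/backups/mongo/2025-01-15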
Data Modeling Decision Framework
Choose PostgreSQL when:
- Strong ACID guarantees required (financial transactions)
- Complex relationships with many JOINs
- Data structure is well-defined and stable
- Need for advanced SQL features (window functions, CTEs, stored procedures)
- Compliance requirements demand strict consistency
Choose MongoDB when:
- Schema flexibility needed (rapid development, evolving requirements)
- Horizontal scalability is priority (sharding required)
- Document-oriented data (JSON/BSON native format)
- Hierarchical or nested data structures
- High write throughput with eventual consistency acceptable
Hybrid Approach:
- Use both databases for different parts of application
- PostgreSQL for transactional data (orders, payments)
- MongoDB for catalog, logs, user sessions
- Synchronize critical data between systems
Common Patterns and Anti-Patterns
PostgreSQL Anti-Patterns
❌ Storing JSON when relational fits better
-- BAD: Using JSONB for structured, queryable data
CREATE TABLE users (
id SERIAL PRIMARY KEY,
data JSONB -- { name, email, address: { street, city, state } }
);
-- GOOD: Proper normalization
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name VARCHAR(255),
email VARCHAR(255)
);
CREATE TABLE addresses (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id),
street VARCHAR(255),
city VARCHAR(100),
state VARCHAR(50)
);
❌ Over-indexing
-- BAD: Index on every column "just in case"
CREATE INDEX idx1 ON users(username);
CREATE INDEX idx2 ON users(email);
CREATE INDEX idx3 ON users(created_at);
CREATE INDEX idx4 ON users(updated_at);
CREATE INDEX idx5 ON users(active);
-- Result: Slow writes, large database size
-- GOOD: Index based on actual query patterns
CREATE INDEX idx_users_email ON users(email); -- Login queries
CREATE INDEX idx_active_users_created ON users(created_at) WHERE active = true; -- Partial
❌ N+1 Query Problem
-- BAD: Multiple queries in loop
SELECT * FROM posts; -- Returns 100 posts
-- Then for each post:
SELECT * FROM users WHERE id = ?; -- 100 additional queries!
-- GOOD: Single query with JOIN
SELECT p.*, u.username, u.email
FROM posts p
JOIN users u ON p.author_id = u.id;
MongoDB Anti-Patterns
❌ Massive arrays in documents
// BAD: Unbounded array growth
{
_id: ObjectId("..."),
username: "popular_user",
followers: [
ObjectId("follower1"),
ObjectId("follower2"),
// ... 100,000+ follower IDs
// Document exceeds 16MB limit!
]
}
// GOOD: Separate collection with references
// users collection
{ _id: ObjectId("..."), username: "popular_user" }
// followers collection
{ _id: ObjectId("..."), user_id: ObjectId("..."), follower_id: ObjectId("...") }
db.followers.createIndex({ user_id: 1, follower_id: 1 })
❌ Poor shard key selection
// BAD: Monotonically increasing shard key
sh.shardCollection("mydb.events", { _id: 1 })
// All writes go to the same shard (the one holding the highest _id range)
// BAD: Low cardinality shard key
sh.shardCollection("mydb.users", { country: 1 })
// Most users in few countries = uneven distribution
// GOOD: Hashed _id or compound key
sh.shardCollection("mydb.events", { _id: "hashed" }) // Even distribution
sh.shardCollection("mydb.users", { country: 1, user_id: 1 }) // Compound
❌ Ignoring indexes on embedded documents
// Document structure
{
username: "john_doe",
profile: {
email: "john@example.com",
age: 30,
city: "San Francisco"
}
}
// Query on embedded field
db.users.find({ "profile.email": "john@example.com" })
// MISSING: Index on embedded field
db.users.createIndex({ "profile.email": 1 })
Troubleshooting Guide
PostgreSQL Issues
Slow Queries:
-- Enable slow query logging (postgresql.conf)
-- log_min_duration_statement = 1000 # Log queries > 1 second
-- Find slow queries
SELECT
query,
calls,
total_exec_time / calls as avg_time_ms,
rows / calls as avg_rows
FROM pg_stat_statements
WHERE calls > 100
ORDER BY total_exec_time DESC
LIMIT 20;
-- Analyze specific slow query
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT ... FROM ... WHERE ...;
High CPU Usage:
-- Check running queries
SELECT
pid,
now() - query_start as duration,
state,
query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;
-- Terminate long-running query
SELECT pg_terminate_backend(pid);
Lock Contention:
-- View locks
SELECT
locktype,
relation::regclass,
mode,
granted,
pid
FROM pg_locks
WHERE NOT granted;
-- Find blocking queries
SELECT
blocked_locks.pid AS blocked_pid,
blocking_locks.pid AS blocking_pid,
blocked_activity.query AS blocked_query,
blocking_activity.query AS blocking_query
FROM pg_locks blocked_locks
JOIN pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_locks blocking_locks ON blocking_locks.locktype = blocked_locks.locktype
JOIN pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted AND blocking_locks.granted;
MongoDB Issues
Slow Queries:
// Enable profiling
db.setProfilingLevel(1, { slowms: 100 }) // Log queries > 100ms
// View slow queries
db.system.profile.find().sort({ ts: -1 }).limit(10)
// Analyze query performance
db.collection.find({ ... }).explain("executionStats")
// Check: totalDocsExamined vs nReturned (should be close)
// Check: executionTimeMillis
// Check: indexName (should show index usage)
Replication Lag:
// Check lag on secondary
rs.printSecondaryReplicationInfo()
// Check oplog size
db.getReplicationInfo()
// Increase oplog size if needed
db.adminCommand({ replSetResizeOplog: 1, size: 16384 }) // 16GB
Sharding Issues:
// Check chunk distribution
sh.status()
// Check balancer status
sh.getBalancerState()
sh.isBalancerRunning()
// Balance specific collection
sh.enableBalancing("mydb.mycollection")
// Check for jumbo chunks
db.getSiblingDB("config").chunks.find({ jumbo: true })
Resources
PostgreSQL Resources
- Official Documentation: https://www.postgresql.org/docs/
- PostgreSQL Wiki: https://wiki.postgresql.org/
- Performance Tuning: https://wiki.postgresql.org/wiki/Performance_Optimization
- Explain Visualizer: https://explain.dalibo.com/
- pg_stat_statements Extension: Essential for query analysis
MongoDB Resources
- Official Documentation: https://docs.mongodb.com/
- MongoDB University: Free courses and certification
- Aggregation Framework: https://docs.mongodb.com/manual/aggregation/
- Sharding Guide: https://docs.mongodb.com/manual/sharding/
- Schema Design Patterns: https://www.mongodb.com/blog/post/building-with-patterns-a-summary
Books
- PostgreSQL: "PostgreSQL: Up and Running" by Regina Obe & Leo Hsu
- MongoDB: "MongoDB: The Definitive Guide" by Shannon Bradshaw, Eoin Brazil, Kristina Chodorow
Skill Version: 1.0.0
Last Updated: January 2025
Skill Category: Database Management, Data Architecture, Performance Optimization
Technologies: PostgreSQL 16+, MongoDB 7+