name

data-modeling-patterns

description

Database schema design and data modeling patterns including normalization principles (1NF-5NF), denormalization trade-offs, entity relationship design, indexing strategies, schema evolution, and domain-driven design patterns. Activates when designing new database schemas, refactoring data models, discussing normalization vs denormalization decisions, planning schema migrations, or modeling complex domain entities. Use when creating new tables/collections, redesigning existing schemas, evaluating relationship patterns, or making data integrity decisions.

Data Modeling Patterns

Overview

Data models are the foundation of every application. Get them right, and your application scales smoothly. Get them wrong, and you're stuck with expensive migrations and workarounds.

This skill provides systematic approaches to database schema design, from normalization principles to performance trade-offs, helping you make informed modeling decisions upfront.

When to use this skill:

Designing new database schemas or tables
Refactoring existing data models
Evaluating normalization vs denormalization trade-offs
Planning schema migrations and evolution
Modeling complex domain entities and relationships
Optimizing query performance through schema design

Quick Decision Framework

1. Start with Your Access Patterns

Most important question: How will you query this data?

Common access patterns → Schema design choices:
├─ Frequent joins across tables → Normalized (3NF)
├─ High read volume, rare writes → Denormalized
├─ Complex aggregations → Materialized views or denormalization
├─ Real-time queries → Indexed columns, partition keys
└─ Historical tracking → Event sourcing or audit tables

2. Choose Normalization Level

Normalization Level	When to Use	Trade-offs
1NF (Atomic values)	Always - baseline	None, always do this
2NF (No partial dependencies)	Most cases	Minimal overhead
3NF (No transitive dependencies)	Default for OLTP	Standard approach, good balance
BCNF (Strict 3NF)	Data integrity critical	Slightly more complex
4NF/5NF (Multi-valued dependencies)	Rare - only when many-to-many relationships complex	Query complexity increases
Denormalized	High read volume, read performance critical	Write complexity, data redundancy

Rule of thumb: Start with 3NF, denormalize only with evidence of performance issues.

Core Modeling Patterns

Pattern 1: Normalized Schema (3NF)

When to use:

Transactional systems (OLTP)
Data integrity is critical
Frequent updates to data
Storage efficiency matters
Standard relational databases (PostgreSQL, MySQL)

Example: E-commerce Order System

-- 3NF: Separate tables for each entity, no redundancy

-- Users table (1 entity = 1 table)
CREATE TABLE users (
    user_id         UUID PRIMARY KEY,
    email           VARCHAR(255) UNIQUE NOT NULL,
    username        VARCHAR(100) UNIQUE NOT NULL,
    created_at      TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Products table (separate entity)
CREATE TABLE products (
    product_id      UUID PRIMARY KEY,
    name            VARCHAR(255) NOT NULL,
    description     TEXT,
    base_price      DECIMAL(10,2) NOT NULL,
    category_id     UUID REFERENCES categories(category_id),
    created_at      TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Orders table (no redundant data from users/products)
CREATE TABLE orders (
    order_id        UUID PRIMARY KEY,
    user_id         UUID REFERENCES users(user_id) NOT NULL,
    status          VARCHAR(50) NOT NULL,
    total_amount    DECIMAL(10,2) NOT NULL,  -- Computed from order_items
    created_at      TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Order items (junction table for many-to-many)
CREATE TABLE order_items (
    order_item_id   UUID PRIMARY KEY,
    order_id        UUID REFERENCES orders(order_id) NOT NULL,
    product_id      UUID REFERENCES products(product_id) NOT NULL,
    quantity        INTEGER NOT NULL CHECK (quantity > 0),
    unit_price      DECIMAL(10,2) NOT NULL,  -- Snapshot price at order time
    UNIQUE (order_id, product_id)  -- Prevent duplicate items
);

Benefits:

✅ No data duplication (single source of truth)
✅ Easy updates (change product price in one place)
✅ Data integrity enforced by foreign keys
✅ Storage efficient

Trade-offs:

❌ Requires joins to get complete data
❌ More complex queries
❌ Slower for read-heavy workloads

Normalization checklist:

1NF: All columns contain atomic values (no arrays, no JSON)
2NF: No partial dependencies (every non-key column depends on entire primary key)
3NF: No transitive dependencies (non-key columns don't depend on other non-key columns)

Pattern 2: Denormalized Schema (Read-Optimized)

When to use:

Read-heavy workloads (10:1 or higher read:write ratio)
Query performance critical (sub-100ms response time)
Acceptable data redundancy
Analytics/reporting systems (OLAP)
NoSQL databases (MongoDB, DynamoDB)

Example: Product Catalog (Denormalized)

-- Denormalized: Embed related data to avoid joins

CREATE TABLE product_catalog (
    product_id          UUID PRIMARY KEY,
    name                VARCHAR(255) NOT NULL,
    description         TEXT,
    base_price          DECIMAL(10,2) NOT NULL,

    -- Denormalized category data (duplicated across products)
    category_id         UUID,
    category_name       VARCHAR(100),  -- Duplicated!
    category_path       VARCHAR(500),  -- e.g., "Electronics > Computers > Laptops"

    -- Denormalized inventory data
    stock_quantity      INTEGER,  -- Duplicated from inventory table
    warehouse_location  VARCHAR(100),  -- Duplicated!

    -- Denormalized aggregate metrics
    total_sales         INTEGER DEFAULT 0,  -- Computed, updated periodically
    average_rating      DECIMAL(3,2),  -- Computed from reviews
    review_count        INTEGER DEFAULT 0,  -- Computed

    created_at          TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at          TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Indexes for common queries
CREATE INDEX idx_category ON product_catalog(category_id);
CREATE INDEX idx_price ON product_catalog(base_price);
CREATE INDEX idx_rating ON product_catalog(average_rating);

Benefits:

✅ Fast queries (no joins needed)
✅ Simple SQL (SELECT * FROM product_catalog WHERE...)
✅ Scales for reads (can cache entire rows)

Trade-offs:

❌ Data duplication (category_name stored in every product)
❌ Update complexity (change category_name → update all products)
❌ Data staleness risk (aggregates may be out of sync)
❌ Storage overhead

When to denormalize:

Read:write ratio > 10:1
Query performance requirements < 100ms
Data changes infrequently (e.g., category names)
Can tolerate eventual consistency

Pattern 3: Hybrid (Normalized Core + Denormalized Views)

When to use:

Need both data integrity and query performance
Can use materialized views or caching layer
Want best of both worlds

Example: Orders with Materialized View

-- Normalized core tables (source of truth)
CREATE TABLE orders (...);  -- As in Pattern 1
CREATE TABLE order_items (...);
CREATE TABLE products (...);
CREATE TABLE users (...);

-- Materialized view for read performance
CREATE MATERIALIZED VIEW order_details_mv AS
SELECT
    o.order_id,
    o.created_at,
    o.status,
    o.total_amount,
    u.user_id,
    u.email,
    u.username,
    json_agg(
        json_build_object(
            'product_id', p.product_id,
            'product_name', p.name,
            'quantity', oi.quantity,
            'unit_price', oi.unit_price
        )
    ) AS items
FROM orders o
JOIN users u ON o.user_id = u.user_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
GROUP BY o.order_id, u.user_id;

-- Refresh strategy
CREATE INDEX idx_order_created ON order_details_mv(created_at);
REFRESH MATERIALIZED VIEW CONCURRENTLY order_details_mv;  -- Periodic refresh

Benefits:

✅ Write to normalized tables (data integrity)
✅ Read from materialized view (fast queries)
✅ Best of both worlds

Trade-offs:

❌ View refresh overhead (minutes to hours)
❌ Stale data between refreshes
❌ More complexity (manage refresh schedule)

Refresh strategies:

Incremental: Refresh only changed rows (CONCURRENTLY)
Scheduled: Nightly refresh for reports
On-demand: Trigger refresh after writes
Near real-time: Use triggers or change data capture (CDC)

Relationship Patterns

One-to-Many (Most Common)

Example: User → Orders (one user, many orders)

CREATE TABLE users (
    user_id UUID PRIMARY KEY,
    ...
);

CREATE TABLE orders (
    order_id UUID PRIMARY KEY,
    user_id UUID REFERENCES users(user_id) NOT NULL,  -- Foreign key
    ...
);

-- Query: Get user's orders
SELECT * FROM orders WHERE user_id = ?;

-- Index on foreign key for performance
CREATE INDEX idx_orders_user ON orders(user_id);

Guidelines:

Always index the foreign key (user_id)
Consider cascade delete: ON DELETE CASCADE (delete user → delete orders)
Or restrict: ON DELETE RESTRICT (can't delete user with orders)

Many-to-Many

Example: Products ↔ Tags (products have tags, tags have products)

-- Entities
CREATE TABLE products (
    product_id UUID PRIMARY KEY,
    ...
);

CREATE TABLE tags (
    tag_id UUID PRIMARY KEY,
    name VARCHAR(100) UNIQUE NOT NULL
);

-- Junction table (associate products with tags)
CREATE TABLE product_tags (
    product_id UUID REFERENCES products(product_id) ON DELETE CASCADE,
    tag_id UUID REFERENCES tags(tag_id) ON DELETE CASCADE,
    PRIMARY KEY (product_id, tag_id)  -- Composite primary key
);

-- Query: Get products with tag "electronics"
SELECT p.*
FROM products p
JOIN product_tags pt ON p.product_id = pt.product_id
JOIN tags t ON pt.tag_id = t.tag_id
WHERE t.name = 'electronics';

-- Indexes
CREATE INDEX idx_product_tags_product ON product_tags(product_id);
CREATE INDEX idx_product_tags_tag ON product_tags(tag_id);

Guidelines:

Use composite primary key (prevents duplicates)
Index both foreign keys
Consider adding metadata to junction table (e.g., created_at, priority)

One-to-One (Rare)

Example: User → UserProfile (one user, one profile)

Pattern 1: Separate table (for optional/large data)

CREATE TABLE users (
    user_id UUID PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    ...
);

CREATE TABLE user_profiles (
    user_id UUID PRIMARY KEY REFERENCES users(user_id) ON DELETE CASCADE,
    bio TEXT,
    avatar_url VARCHAR(500),
    ...  -- Large/optional fields
);

When to use separate table:

Profile fields optional (not all users have profiles)
Profile data is large (TEXT columns, JSON blobs)
Different access patterns (rarely need profile data)

Pattern 2: Single table (for mandatory/small data)

CREATE TABLE users (
    user_id UUID PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    bio TEXT,  -- Just include in users table
    avatar_url VARCHAR(500),
    ...
);

When to use single table:

Profile always exists for every user
Profile fields small/fixed-size
Always queried together

Self-Referencing (Hierarchies)

Example: Employee → Manager (employees manage other employees)

CREATE TABLE employees (
    employee_id UUID PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    manager_id UUID REFERENCES employees(employee_id),  -- Self-reference
    ...
);

-- Query: Get employee's manager
SELECT e.*, m.name AS manager_name
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.employee_id
WHERE e.employee_id = ?;

-- Query: Get all reports (recursive CTE)
WITH RECURSIVE reports AS (
    -- Base case: direct reports
    SELECT employee_id, name, manager_id, 1 AS level
    FROM employees
    WHERE manager_id = ?  -- Manager's ID

    UNION ALL

    -- Recursive case: reports of reports
    SELECT e.employee_id, e.name, e.manager_id, r.level + 1
    FROM employees e
    JOIN reports r ON e.manager_id = r.employee_id
)
SELECT * FROM reports ORDER BY level, name;

Alternative: Adjacency List + Path (Optimized Reads)

CREATE TABLE employees (
    employee_id UUID PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    manager_id UUID REFERENCES employees(employee_id),
    manager_path UUID[],  -- [CEO_id, VP_id, Director_id, Manager_id]
    ...
);

-- Query: Get all reports (simple array query)
SELECT * FROM employees WHERE ? = ANY(manager_path);

-- Trade-off: Faster reads, more complex writes (update path on reparenting)

Indexing Strategy

When to Add Indexes

Always index:

Primary keys (automatic)
Foreign keys (manually add)
Columns in WHERE clauses (frequent filters)
Columns in JOIN conditions
Columns in ORDER BY (sorting)

Example: Order queries

-- Frequent queries → Need indexes

-- Query 1: Get user's orders
SELECT * FROM orders WHERE user_id = ?;
-- Index: CREATE INDEX idx_orders_user ON orders(user_id);

-- Query 2: Recent orders
SELECT * FROM orders WHERE created_at > ? ORDER BY created_at DESC;
-- Index: CREATE INDEX idx_orders_created ON orders(created_at DESC);

-- Query 3: Orders by status
SELECT * FROM orders WHERE status = 'pending';
-- Index: CREATE INDEX idx_orders_status ON orders(status);

-- Query 4: User's recent orders (composite)
SELECT * FROM orders WHERE user_id = ? AND created_at > ?;
-- Index: CREATE INDEX idx_orders_user_created ON orders(user_id, created_at DESC);

Index Types

B-Tree (Default): Most common, good for ranges, sorting

CREATE INDEX idx_price ON products(base_price);  -- Range queries (price > 10)

Hash: Equality only, faster for exact matches

CREATE INDEX idx_email ON users USING HASH(email);  -- WHERE email = ?

GIN (Generalized Inverted Index): Full-text search, JSONB, arrays

CREATE INDEX idx_tags ON products USING GIN(tags);  -- Array contains queries
CREATE INDEX idx_metadata ON products USING GIN(metadata);  -- JSONB queries

Partial Index: Index subset of rows (save space)

-- Only index active users
CREATE INDEX idx_active_users ON users(user_id) WHERE status = 'active';

Composite Index: Multiple columns (order matters!)

-- Good for: WHERE user_id = ? AND created_at > ?
CREATE INDEX idx_user_created ON orders(user_id, created_at);

-- NOT good for: WHERE created_at > ? (doesn't use user_id part)

Index Trade-offs

Benefits:

✅ Faster queries (10-1000x speedup)
✅ Efficient sorting and filtering

Costs:

❌ Slower writes (update index on INSERT/UPDATE/DELETE)
❌ Storage overhead (indexes take disk space)
❌ Maintenance overhead (vacuum, reindex)

Rule of thumb:

Add index if query slow (>100ms) AND runs frequently (>100/day)
Remove unused indexes (monitor with pg_stat_user_indexes)

Schema Evolution Patterns

Pattern 1: Backward-Compatible Changes (Safe)

Add nullable column:

-- Safe: Existing rows get NULL, no migration needed
ALTER TABLE products ADD COLUMN tags TEXT[];

Add table:

-- Safe: New table doesn't affect existing queries
CREATE TABLE product_images (...);

Add index:

-- Safe: Improves performance, no data changes
CREATE INDEX idx_products_category ON products(category_id);

Pattern 2: Backward-Incompatible Changes (Risky)

Remove column:

-- RISKY: Breaks code that reads this column
ALTER TABLE products DROP COLUMN old_field;

-- Migration strategy:
-- 1. Remove code that uses old_field (deploy)
-- 2. Wait 1 week (verify no errors)
-- 3. Drop column (deploy schema change)

Rename column:

-- RISKY: Breaks code that references old name
ALTER TABLE products RENAME COLUMN old_name TO new_name;

-- Migration strategy:
-- 1. Add new column with new_name
-- 2. Backfill data (new_name = old_name)
-- 3. Update code to use new_name (deploy)
-- 4. Remove old column after 1 week

Change column type:

-- RISKY: May lose data or fail validation
ALTER TABLE products ALTER COLUMN price TYPE DECIMAL(12,2);  -- Was DECIMAL(10,2)

-- Migration strategy:
-- 1. Create new column (price_new DECIMAL(12,2))
-- 2. Backfill: UPDATE products SET price_new = price
-- 3. Update code to use price_new (deploy)
-- 4. Drop old column, rename new (deploy)

Pattern 3: Online Migrations (Zero Downtime)

Strategy: Use triggers to keep old and new schemas in sync

-- Step 1: Add new column
ALTER TABLE products ADD COLUMN new_price DECIMAL(12,2);

-- Step 2: Backfill existing data (batched, don't lock table)
UPDATE products SET new_price = old_price WHERE new_price IS NULL LIMIT 1000;
-- Repeat until done

-- Step 3: Create trigger (keep in sync during transition)
CREATE TRIGGER sync_price
AFTER INSERT OR UPDATE ON products
FOR EACH ROW
EXECUTE FUNCTION sync_price_trigger();

-- Step 4: Deploy code that writes to both old_price and new_price

-- Step 5: Verify new_price matches old_price (no drift)

-- Step 6: Deploy code that reads from new_price only

-- Step 7: Remove trigger, drop old_price column

Domain-Driven Design Patterns

Entity Pattern

Definition: Object with unique identity (can change attributes, identity stays same)

Example: User

CREATE TABLE users (
    user_id UUID PRIMARY KEY,  -- Identity
    email VARCHAR(255) NOT NULL,  -- Attributes (can change)
    username VARCHAR(100) NOT NULL,
    ...
);

-- User identity (user_id) never changes
-- Attributes (email, username) can be updated

Value Object Pattern

Definition: Object defined by attributes (no identity, immutable)

Example: Address (embedded in user table)

CREATE TABLE users (
    user_id UUID PRIMARY KEY,
    email VARCHAR(255) NOT NULL,

    -- Address value object (no separate identity)
    address_line1 VARCHAR(255),
    address_line2 VARCHAR(255),
    address_city VARCHAR(100),
    address_state VARCHAR(50),
    address_zip VARCHAR(20),
    address_country VARCHAR(50),
    ...
);

-- Alternative: Use JSONB for value objects
CREATE TABLE users (
    user_id UUID PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    address JSONB,  -- {"line1": "...", "city": "...", ...}
    ...
);

When to embed vs separate table:

Embed (columns or JSONB): Small, always queried together, no relationships to other entities
Separate table: Large, optional, has relationships, frequently joined

Aggregate Pattern

Definition: Cluster of entities/value objects with consistency boundary

Example: Order aggregate (Order + OrderItems)

-- Aggregate root
CREATE TABLE orders (
    order_id UUID PRIMARY KEY,  -- Aggregate ID
    user_id UUID NOT NULL,
    status VARCHAR(50) NOT NULL,
    total_amount DECIMAL(10,2) NOT NULL,
    ...
);

-- Aggregate members (can't exist without order)
CREATE TABLE order_items (
    order_item_id UUID PRIMARY KEY,
    order_id UUID REFERENCES orders(order_id) ON DELETE CASCADE,  -- Cascade!
    product_id UUID NOT NULL,
    quantity INTEGER NOT NULL,
    ...
);

-- Consistency rule: Total amount = sum(order_items.quantity * unit_price)
-- Enforced by application logic, validated with triggers/constraints

Aggregate guidelines:

All changes go through aggregate root (Order)
Members can't be modified independently (OrderItems always via Order)
Cascade delete (delete Order → delete OrderItems)
Maintain invariants (total_amount consistency)

Anti-Patterns to Avoid

❌ EAV (Entity-Attribute-Value) Anti-Pattern

Problem: Generic key-value schema kills SQL performance

Bad example:

-- DON'T DO THIS
CREATE TABLE entity_attributes (
    entity_id UUID NOT NULL,
    attribute_name VARCHAR(100) NOT NULL,
    attribute_value TEXT,
    PRIMARY KEY (entity_id, attribute_name)
);

-- Nightmare queries:
SELECT
    MAX(CASE WHEN attribute_name = 'name' THEN attribute_value END) AS name,
    MAX(CASE WHEN attribute_name = 'price' THEN attribute_value END) AS price,
    ...
FROM entity_attributes
WHERE entity_id = ?;

Solution: Use proper columns or JSONB

-- Option 1: Proper columns
CREATE TABLE products (
    product_id UUID PRIMARY KEY,
    name VARCHAR(255),
    price DECIMAL(10,2),
    ...
);

-- Option 2: JSONB for truly dynamic attributes
CREATE TABLE products (
    product_id UUID PRIMARY KEY,
    name VARCHAR(255),
    price DECIMAL(10,2),
    metadata JSONB  -- Dynamic attributes
);

❌ God Table Anti-Pattern

Problem: Single table with 50+ columns, no clear focus

Bad example:

-- DON'T DO THIS
CREATE TABLE everything (
    id UUID PRIMARY KEY,
    -- User fields
    email VARCHAR(255),
    username VARCHAR(100),
    -- Address fields
    address_line1 VARCHAR(255),
    address_city VARCHAR(100),
    -- Order fields
    last_order_date TIMESTAMP,
    total_orders INTEGER,
    -- ... 40 more columns
);

Solution: Normalize into focused tables

CREATE TABLE users (...);  -- User-specific fields
CREATE TABLE addresses (...);  -- Address fields
CREATE TABLE order_summary (...);  -- Aggregated order data

❌ Premature Denormalization

Problem: Denormalize before measuring performance

When developers denormalize early:

"Joins are slow" (without evidence)
"We need fast reads" (without requirements)
"NoSQL is faster" (without benchmarks)

Solution:

Start normalized (3NF)
Measure actual query performance
Denormalize ONLY if:
- Query >100ms AND
- Runs >100x/day AND
- Tried indexes/caching first

❌ Missing Foreign Key Constraints

Problem: Orphaned records, data integrity issues

Bad example:

-- DON'T DO THIS (no foreign key)
CREATE TABLE orders (
    order_id UUID PRIMARY KEY,
    user_id UUID NOT NULL,  -- No REFERENCES constraint
    ...
);

-- Result: Orphaned orders (user deleted, orders remain)

Solution: Always use foreign keys

CREATE TABLE orders (
    order_id UUID PRIMARY KEY,
    user_id UUID REFERENCES users(user_id) ON DELETE RESTRICT,  -- Enforce referential integrity
    ...
);

Exception: Denormalized tables where data intentionally duplicated

Testing Data Models

Validation Checklist

Normalized to 3NF (unless performance justifies denormalization)
Foreign keys defined with appropriate ON DELETE behavior
Indexes on foreign keys and frequent WHERE/ORDER BY columns
Unique constraints on business keys (email, username, etc.)
NOT NULL on required fields
CHECK constraints for data validation (price > 0, quantity > 0)
Default values for created_at, updated_at timestamps
Primary keys are immutable (UUID or BIGSERIAL)

Performance Testing

-- Explain query performance
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = ?;

-- Check for sequential scans (bad for large tables)
-- Look for "Seq Scan" → Add index
-- Look for "Index Scan" → Good

-- Monitor unused indexes
SELECT * FROM pg_stat_user_indexes WHERE idx_scan = 0;

Decision Checklist

When designing a new schema:

Identify entities and relationships
- What are the core entities? (User, Product, Order, etc.)
- What are the relationships? (One-to-many, many-to-many)
- What are the access patterns? (How will I query this?)
Choose normalization level
- Start with 3NF (default)
- Denormalize only with performance evidence
Define constraints
- Primary keys (UUID or BIGSERIAL)
- Foreign keys with ON DELETE behavior
- Unique constraints (business keys)
- NOT NULL (required fields)
- CHECK constraints (validation)
Plan indexing strategy
- Index all foreign keys
- Index WHERE clause columns
- Index ORDER BY columns
- Consider composite indexes for common query patterns
Design for evolution
- Use nullable columns for future expansion
- Version your schema (migration scripts)
- Plan backward-compatible changes

Summary

Key Principles

Start normalized (3NF), denormalize with evidence
Design for access patterns, not just entities
Index foreign keys and WHERE/ORDER BY columns
Use foreign key constraints for data integrity
Plan for schema evolution (backward compatibility)
Test query performance, optimize bottlenecks

Common Patterns

OLTP (transactional): Normalized (3NF)
OLAP (analytics): Denormalized or materialized views
Hybrid: Normalized core + denormalized views/caches

When to Denormalize

Read:write ratio > 10:1
Query performance requirements < 100ms
Data changes infrequently
Can tolerate eventual consistency

Remember: Data models are expensive to change. Invest time upfront to design them right.

Install Skill

SKILL.md

Data Modeling Patterns

Overview

Quick Decision Framework

1. Start with Your Access Patterns

2. Choose Normalization Level

Core Modeling Patterns

Pattern 1: Normalized Schema (3NF)

Pattern 2: Denormalized Schema (Read-Optimized)

Pattern 3: Hybrid (Normalized Core + Denormalized Views)

Relationship Patterns

One-to-Many (Most Common)

Many-to-Many

One-to-One (Rare)

Self-Referencing (Hierarchies)

Indexing Strategy

When to Add Indexes

Index Types

Index Trade-offs

Schema Evolution Patterns

Pattern 1: Backward-Compatible Changes (Safe)

Pattern 2: Backward-Incompatible Changes (Risky)

Pattern 3: Online Migrations (Zero Downtime)

Domain-Driven Design Patterns

Entity Pattern

Value Object Pattern

Aggregate Pattern

Anti-Patterns to Avoid

❌ EAV (Entity-Attribute-Value) Anti-Pattern

❌ God Table Anti-Pattern

❌ Premature Denormalization

❌ Missing Foreign Key Constraints

Testing Data Models

Validation Checklist

Performance Testing

Decision Checklist

Summary

Key Principles

Common Patterns

When to Denormalize