| name | minimal-modeling |
| description | Use when designing database schemas from business requirements, translating vague requirements into concrete SQL definitions, or validating database designs with non-technical stakeholders. Provides a systematic methodology that decouples logical business modeling from physical implementation to reduce cognitive load and ensure business-driven design. |
Minimal Modeling Database Design
Overview
Minimal Modeling is a database design methodology that bridges the gap between vague business requirements and concrete SQL schemas through systematic decoupling of logical and physical concerns.
Core Principle: Separate the "what exists" (business logic) from "how to store it" (technical implementation) to reduce cognitive load and enable validation by non-technical stakeholders before writing any SQL.
When to Use
Use this skill when:
- Translating business requirements ("I need a website for...") into database schemas
- Starting a new project with unclear data requirements
- Validating database design with non-technical stakeholders
- Experiencing analysis paralysis from too many implementation details early
- Need to explain database structure to business users
Do NOT use when:
- Database schema already exists and you're just querying it
- Working with predefined ORM models that can't be changed
- Technical constraints (like existing tables) must drive the design
The Two-Phase Process
Phase A: Logical Model (Business Domain)
Focus entirely on business concepts without technical implementation.
Phase B: Physical Schema (Implementation)
Translate business concepts into tables, columns, and data types.
Phase A: Logical Model
Step 1: Gather Requirements
Start with informal text or transcript describing the system.
Step 2: Find Anchors (The "Nouns")
Definition: Entities that exist independently with unique IDs. They represent things you count, not data itself.
Validation: Use two sentences to test if a noun is an Anchor:
Counting Sentence: "We have 1,000 [Nouns] in our database."
Adding Sentence: "The system inserts another [Noun] into the database."
If both sentences sound natural, it's an Anchor.
Examples:
- ✅ User, Post, Order, Product (you count these)
- ❌ Email, Title, Price (these are data about things)
Step 3: Find Attributes (The "Data")
Definition: Information that describes an Anchor.
Validation: Define each attribute with a Human-Readable Question:
"What is the [attribute name]?"
"Is this [condition]?"
Examples:
- User → "What is the email address?" (String)
- Product → "What is the price?" (Decimal)
- Room → "Is this wheelchair accessible?" (Boolean)
Common Attribute Types:
- Strings: Names, descriptions, addresses
- Numbers: Counts, IDs, quantities
- Money: Prices, balances (always use DECIMAL, never Float)
- Booleans: Flags, yes/no questions
- Dates/Times: Created dates, scheduled events
- Binary: Files, images (or S3 URLs)
Step 4: Find Links (The "Relationships")
Definition: Connections between two Anchors (or an Anchor to itself).
Validation: Use the Two-Sentence Rule to determine cardinality:
Sentence 1: "An [Anchor A] [verb] [one/several] [Anchor B]."
Sentence 2: "An [Anchor B] [verb] [one/several] [Anchor A]."
Cardinality Patterns:
| Sentence Pattern | Cardinality | Implementation |
|---|---|---|
| "A has several B" + "B belongs to one A" | 1:N (One-to-Many) | Foreign key in B table |
| "A has one B" + "B belongs to one A" | 1:1 (One-to-One) | Foreign key in either table |
| "A has several B" + "B has several A" | N:M (Many-to-Many) | Junction table |
Examples:
User ↔ Order
"A User can place several Orders."
"An Order is placed by one User."
Result: 1:N → Add user_id to orders table
User ↔ Post (likes)
"A User can like several Posts."
"A Post can be liked by several Users."
Result: N:M → Create users_liked_posts junction table
Step 5: Verify Completeness
Go back to original requirements. Highlight every word covered by your model.
Missing highlights indicate:
- Missed Anchors
- Missed Attributes
- Missed Links
Continue iterating until all business concepts are captured.
Phase B: Physical Schema
Step 6: Choose Table Names
Convention: Pluralize the Anchor name
- User →
users - Post →
posts - Order →
orders
Step 7: Choose Column Types
Critical Type Decisions:
| Logical Type | SQL Type | Notes |
|---|---|---|
| ID | INTEGER or BIGINT |
Use BIGINT for massive scale (billions of rows) |
| String | VARCHAR(N) |
Default to empty string, not NULL |
| Money | DECIMAL(M,D) |
NEVER use Float for money |
| Boolean | BOOLEAN or TINYINT |
Depends on database |
| Timestamp (past) | TIMESTAMP in UTC |
Store server events in UTC |
| Timestamp (future) | Store local time + timezone | For user-scheduled events |
| Binary | BLOB or S3 URL |
Consider external storage for large files |
NULL vs Default Values:
- Prefer meaningful defaults over NULL when possible
- Strings: Default to
''(empty string) - Numbers: Consider if
0makes sense or if NULL is semantic
Step 8: Implement Links
One-to-Many (1:N): Add foreign key column to the "Many" side table.
-- User has many Orders
CREATE TABLE orders (
id INTEGER PRIMARY KEY,
user_id INTEGER NOT NULL, -- Foreign key to users
total_amount DECIMAL(10,2),
FOREIGN KEY (user_id) REFERENCES users(id)
);
Many-to-Many (N:M): Create a junction table with foreign keys to both sides.
-- Users can like many Posts, Posts can be liked by many Users
CREATE TABLE users_liked_posts (
user_id INTEGER NOT NULL,
post_id INTEGER NOT NULL,
liked_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (user_id, post_id),
FOREIGN KEY (user_id) REFERENCES users(id),
FOREIGN KEY (post_id) REFERENCES posts(id)
);
One-to-One (1:1): Add foreign key to either table (choose based on query patterns).
-- User has one Profile
CREATE TABLE profiles (
id INTEGER PRIMARY KEY,
user_id INTEGER UNIQUE NOT NULL, -- UNIQUE enforces 1:1
bio TEXT,
FOREIGN KEY (user_id) REFERENCES users(id)
);
Quick Reference: Validation Checklist
Anchors:
- Pass counting sentence test
- Pass adding sentence test
- Have unique IDs
- Exist independently
Attributes:
- Have human-readable question
- Belong to a specific Anchor
- Are single-valued (not lists)
- Appropriate SQL type chosen
Links:
- Two-sentence validation complete
- Cardinality determined (1:1, 1:N, N:M)
- Implementation strategy chosen
- Foreign keys properly constrained
Physical Schema:
- Table names pluralized
- Primary keys defined
- Foreign keys with constraints
- Column types match logical types
- Defaults specified where appropriate
- Indexes planned for common queries
Common Mistakes
Mistake 1: Mixing Logical and Physical Too Early
❌ BAD: "Should User have a VARCHAR(255) email?"
✅ GOOD: "Does User have an email address? [Yes] → Later: VARCHAR(255)"
Mistake 2: Attributes as Anchors
❌ BAD: EmailAddress as an Anchor
✅ GOOD: Email as an Attribute of User
Ask: "We have 1,000 emails in our database?" (sounds weird → not an Anchor)
Mistake 3: Using Float for Money
❌ BAD: price FLOAT
✅ GOOD: price DECIMAL(10,2)
Reason: Float has rounding errors; DECIMAL is exact
Mistake 4: Wrong Cardinality Side
❌ BAD: Adding order_id to users table (1:N wrong side)
✅ GOOD: Adding user_id to orders table (1:N correct side)
Rule: Foreign key goes on the "Many" side
Mistake 5: Missing Junction Tables
❌ BAD: Adding post_ids array column to users (N:M)
✅ GOOD: Create users_liked_posts junction table
Reason: Relational databases don't handle arrays well
Mistake 6: Storing Timezone as String
❌ BAD: "America/New_York" string for all times
✅ GOOD:
- Past events: UTC timestamp
- Future events: Local time + timezone
Reason: Different semantic needs
Secondary Data
Definition: Data that is duplicated, cached, or aggregated for performance. Not the source of truth.
Examples:
total_postscount on User (redundant with COUNT of posts)last_login_aton User (duplicated from sessions table)
When to Use:
- Performance optimization after proving it's needed
- Denormalization for read-heavy workloads
Always:
- Document as secondary data
- Have mechanism to rebuild from source of truth
- Consider if query optimization or indexing is better solution
Working with Query Builders and ORMs
When using query builders like Kysely or ORMs, the process remains the same:
- Logical Model first (Anchors, Attributes, Links)
- Define schema types instead of raw SQL
- Generate migrations from type definitions
// After logical modeling, translate to Kysely schema:
// Define database schema interface
interface Database {
users: UserTable; // Anchor: User
orders: OrderTable; // Anchor: Order
}
// Anchor: User
interface UserTable {
id: Generated<number>;
email: string; // Attribute
created_at: Generated<Timestamp>;
}
// Anchor: Order
interface OrderTable {
id: Generated<number>;
user_id: number; // Link (1:N) - Foreign key to users
total: string; // Decimal stored as string for precision
created_at: Generated<Timestamp>;
}
// Create Kysely instance
const db = new Kysely<Database>({
dialect: new PostgresDialect({ pool }),
});
// Queries respect the logical model:
const userOrders = await db
.selectFrom('orders')
.where('user_id', '=', userId) // Following 1:N relationship
.selectAll()
.execute();
Time and Timezone Best Practices
Past Events (already happened):
-- Server events, log entries, message timestamps
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP -- Store in UTC
Future Events (scheduled by user):
-- Concert times, meeting schedules, alarm times
event_time TIMESTAMP,
event_timezone VARCHAR(50) -- e.g., "America/New_York"
Reason: Past events have definite UTC times. Future events need local context because timezone rules may change (DST, political changes).
Iterative Refinement
Database design is iterative. After creating initial schema:
- Test with sample data - Does it feel natural?
- Write common queries - Are they simple or complex?
- Show to stakeholders - Do they understand the model?
- Adjust and repeat - Refine based on feedback
Signs you need to revisit:
- Queries require 5+ JOINs regularly
- Stakeholders confused by table names
- Constant use of NULL checks in queries
- Performance issues with simple queries
- Adding "workaround" columns frequently
Real-World Impact
Benefits:
- Non-technical stakeholders can validate the logical model
- Catch misunderstandings before writing SQL
- Faster iterations (change model, not migrations)
- Clear documentation of business logic
- Reduced cognitive load during design
When you skip this:
- Premature optimization (wrong indexes too early)
- Schema doesn't match business needs
- Expensive refactoring later
- Technical debt from implementation-driven design