Schema Design
When to Use This Skill
Use this skill when:
- Schema Design tasks - Working on design database-specific schemas for relational, document, graph, and time-series databases
- Planning or design - Need guidance on Schema Design approaches
- Best practices - Want to follow established patterns and standards
Overview
Schema design translates logical data models into physical database structures optimized for specific platforms. Different database types require different design approaches.
Database Type Selection
Selection Matrix
| Requirement |
Relational |
Document |
Graph |
Time-Series |
Wide-Column |
| ACID transactions |
✓✓✓ |
✓ |
✓ |
✓ |
✓ |
| Complex queries |
✓✓✓ |
✓ |
✓✓ |
✓ |
✓ |
| Schema flexibility |
✓ |
✓✓✓ |
✓✓ |
✓ |
✓✓ |
| Relationship traversal |
✓ |
✓ |
✓✓✓ |
- |
- |
| Time-based analytics |
✓ |
✓ |
- |
✓✓✓ |
✓✓ |
| Horizontal scaling |
✓ |
✓✓✓ |
✓✓ |
✓✓ |
✓✓✓ |
Relational Database Design (SQL Server/PostgreSQL)
Table Design Template
-- SQL Server table with best practices (PascalCase naming - Microsoft convention)
CREATE TABLE dbo.Orders (
-- Primary Key (clustered) - GUID V7 for sortable IDs
OrderId UNIQUEIDENTIFIER NOT NULL DEFAULT NEWSEQUENTIALID(),
-- Foreign Keys
CustomerId UNIQUEIDENTIFIER NOT NULL,
-- Business Columns
OrderNumber VARCHAR(20) NOT NULL,
OrderDate DATE NOT NULL DEFAULT CAST(GETUTCDATE() AS DATE),
Status VARCHAR(20) NOT NULL DEFAULT 'Pending',
Subtotal DECIMAL(18,2) NOT NULL,
TaxAmount DECIMAL(18,2) NOT NULL DEFAULT 0,
TotalAmount AS (Subtotal + TaxAmount) PERSISTED,
-- Audit Columns
CreatedAt DATETIME2(3) NOT NULL DEFAULT SYSUTCDATETIME(),
CreatedBy VARCHAR(100) NOT NULL,
UpdatedAt DATETIME2(3),
UpdatedBy VARCHAR(100),
RowVersion ROWVERSION NOT NULL,
-- Constraints
CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderId),
CONSTRAINT FK_Orders_Customer
FOREIGN KEY (CustomerId)
REFERENCES dbo.Customers(CustomerId),
CONSTRAINT UQ_Orders_OrderNumber UNIQUE (OrderNumber),
CONSTRAINT CK_Orders_Status
CHECK (Status IN ('Pending', 'Confirmed', 'Shipped', 'Delivered', 'Cancelled'))
);
-- Non-clustered indexes
CREATE NONCLUSTERED INDEX IX_Orders_CustomerDate
ON dbo.Orders (CustomerId, OrderDate DESC)
INCLUDE (Status, TotalAmount);
CREATE NONCLUSTERED INDEX IX_Orders_Status
ON dbo.Orders (Status)
WHERE Status NOT IN ('Delivered', 'Cancelled'); -- Filtered index
EF Core Configuration
// EF Core with SQL Server uses PascalCase by default - no column mapping needed!
public class OrderConfiguration : IEntityTypeConfiguration<Order>
{
public void Configure(EntityTypeBuilder<Order> builder)
{
builder.ToTable("Orders", "dbo");
// Primary Key - EF Core maps Id → OrderId automatically with conventions
builder.HasKey(o => o.Id);
builder.Property(o => o.Id)
.HasColumnName("OrderId") // Explicit for clarity
.HasDefaultValueSql("NEWSEQUENTIALID()");
// Properties - PascalCase C# → PascalCase SQL Server (natural mapping)
builder.Property(o => o.OrderNumber)
.HasMaxLength(20)
.IsRequired();
builder.Property(o => o.Subtotal)
.HasPrecision(18, 2);
builder.Property(o => o.TotalAmount)
.HasComputedColumnSql("[Subtotal] + [TaxAmount]", stored: true);
// Concurrency
builder.Property(o => o.RowVersion)
.IsRowVersion();
// Relationships
builder.HasOne(o => o.Customer)
.WithMany(c => c.Orders)
.HasForeignKey(o => o.CustomerId)
.OnDelete(DeleteBehavior.Restrict);
builder.HasMany(o => o.LineItems)
.WithOne()
.HasForeignKey(li => li.OrderId)
.OnDelete(DeleteBehavior.Cascade);
// Indexes
builder.HasIndex(o => o.OrderNumber)
.IsUnique();
builder.HasIndex(o => new { o.CustomerId, o.OrderDate })
.HasDatabaseName("IX_Orders_CustomerDate");
}
}
Document Database Design (MongoDB/Cosmos DB)
Document Model Strategies
| Pattern |
Description |
Use When |
| Embedded |
Nest related data |
1:few, always accessed together |
| Referenced |
Store IDs, join at app level |
1:many, independent access |
| Hybrid |
Denormalize frequently used |
Read optimization |
| Bucket |
Group by time/category |
Time-series, IoT |
MongoDB Schema Design
// Customer document with embedded addresses
{
"_id": ObjectId("..."),
"customer_id": "CUST-12345",
"profile": {
"name": "John Smith",
"email": "john@example.com",
"phone": "+1-555-0123"
},
"addresses": [
{
"type": "billing",
"street": "123 Main St",
"city": "Seattle",
"state": "WA",
"postal_code": "98101",
"country": "US",
"is_default": true
},
{
"type": "shipping",
"street": "456 Oak Ave",
"city": "Seattle",
"state": "WA",
"postal_code": "98102",
"country": "US",
"is_default": false
}
],
"preferences": {
"marketing_opt_in": true,
"preferred_channel": "email"
},
"metadata": {
"created_at": ISODate("2024-01-15T10:30:00Z"),
"updated_at": ISODate("2024-12-15T14:20:00Z"),
"version": 3
}
}
// Order document with referenced customer
{
"_id": ObjectId("..."),
"order_number": "ORD-2024-00123",
"customer_id": "CUST-12345", // Reference
"customer_snapshot": { // Denormalized for read
"name": "John Smith",
"email": "john@example.com"
},
"status": "confirmed",
"line_items": [ // Embedded
{
"product_id": "PROD-001",
"sku": "SKU-123",
"name": "Widget Pro",
"quantity": 2,
"unit_price": 29.99,
"extended_price": 59.98
}
],
"totals": {
"subtotal": 59.98,
"tax": 5.40,
"shipping": 10.00,
"total": 75.38
},
"dates": {
"ordered_at": ISODate("2024-12-20T09:00:00Z"),
"shipped_at": null,
"delivered_at": null
}
}
Cosmos DB Partition Strategy
// Partition key selection for multi-tenant orders
public class OrderDocument
{
[JsonProperty("id")]
public string Id { get; set; } = Guid.NewGuid().ToString();
[JsonProperty("pk")] // Partition key
public string PartitionKey => $"{TenantId}|{CustomerId}";
[JsonProperty("tenantId")]
public string TenantId { get; set; }
[JsonProperty("customerId")]
public string CustomerId { get; set; }
[JsonProperty("orderNumber")]
public string OrderNumber { get; set; }
[JsonProperty("type")] // For heterogeneous containers
public string DocumentType => "Order";
[JsonProperty("lineItems")]
public List<LineItemDocument> LineItems { get; set; } = [];
[JsonProperty("_ts")] // Auto-populated
public long Timestamp { get; set; }
}
Graph Database Design (Neo4j)
Node and Relationship Design
// Node types
(:Customer {
customerId: 'CUST-12345',
name: 'John Smith',
email: 'john@example.com',
createdAt: datetime()
})
(:Product {
productId: 'PROD-001',
name: 'Widget Pro',
category: 'Electronics',
price: 29.99
})
(:Order {
orderId: 'ORD-2024-00123',
orderDate: date(),
status: 'confirmed',
totalAmount: 75.38
})
// Relationships
(:Customer)-[:PLACED {orderedAt: datetime()}]->(:Order)
(:Order)-[:CONTAINS {quantity: 2, unitPrice: 29.99}]->(:Product)
(:Customer)-[:VIEWED {viewedAt: datetime()}]->(:Product)
(:Product)-[:SIMILAR_TO {score: 0.85}]->(:Product)
Graph Query Patterns
// Recommendation query: Customers who bought X also bought
MATCH (c:Customer)-[:PLACED]->(:Order)-[:CONTAINS]->(p:Product {productId: 'PROD-001'})
MATCH (c)-[:PLACED]->(:Order)-[:CONTAINS]->(other:Product)
WHERE other.productId <> 'PROD-001'
RETURN other.name, COUNT(*) as frequency
ORDER BY frequency DESC
LIMIT 5;
// Path finding: Supply chain
MATCH path = (supplier:Supplier)-[:SUPPLIES*1..5]->(product:Product)
WHERE product.productId = 'PROD-001'
RETURN path;
Time-Series Database Design (InfluxDB/TimescaleDB)
TimescaleDB Schema
-- Create hypertable for time-series data
CREATE TABLE sensor_readings (
time TIMESTAMPTZ NOT NULL,
device_id VARCHAR(50) NOT NULL,
location VARCHAR(100),
temperature DOUBLE PRECISION,
humidity DOUBLE PRECISION,
pressure DOUBLE PRECISION
);
-- Convert to hypertable (partitioned by time)
SELECT create_hypertable('sensor_readings', 'time');
-- Add compression policy
ALTER TABLE sensor_readings
SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'device_id'
);
SELECT add_compression_policy('sensor_readings', INTERVAL '7 days');
-- Create continuous aggregate for hourly stats
CREATE MATERIALIZED VIEW sensor_hourly
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 hour', time) AS bucket,
device_id,
AVG(temperature) AS avg_temp,
MAX(temperature) AS max_temp,
MIN(temperature) AS min_temp,
AVG(humidity) AS avg_humidity
FROM sensor_readings
GROUP BY time_bucket('1 hour', time), device_id;
Indexing Strategies
Index Types by Database
| Database |
Index Types |
| SQL Server |
Clustered, Non-clustered, Columnstore, Filtered, Covering |
| PostgreSQL |
B-tree, Hash, GiST, GIN, BRIN |
| MongoDB |
Single field, Compound, Multi-key, Text, Geospatial |
| Cosmos DB |
Range, Spatial, Composite (included by default) |
Index Selection Guide
# Index Planning Checklist
## Query Analysis
- [ ] Identify most frequent queries
- [ ] Analyze query patterns (point lookup, range, full scan)
- [ ] Review execution plans for table scans
- [ ] Identify JOIN columns
## Index Candidates
| Column(s) | Query Pattern | Index Type | Include Columns |
|-----------|---------------|------------|-----------------|
| CustomerId | Equality lookup | Non-clustered | Status, CreatedAt |
| OrderDate | Range scan | Non-clustered | TotalAmount |
| Status | Equality (active only) | Filtered | - |
## Index Maintenance
- [ ] Define rebuild/reorganize schedule
- [ ] Monitor fragmentation
- [ ] Track index usage statistics
- [ ] Remove unused indexes
Data Type Selection
Type Mapping
| Logical Type |
SQL Server |
PostgreSQL |
MongoDB |
Cosmos DB |
| Identifier |
UNIQUEIDENTIFIER |
UUID |
ObjectId/String |
String |
| Money |
DECIMAL(18,2) |
NUMERIC(18,2) |
Decimal128 |
Number |
| Date only |
DATE |
DATE |
Date (midnight) |
String (ISO) |
| Timestamp |
DATETIME2(3) |
TIMESTAMPTZ |
Date |
String (ISO) |
| Boolean |
BIT |
BOOLEAN |
Boolean |
Boolean |
| JSON |
NVARCHAR(MAX) |
JSONB |
Object (native) |
Object (native) |
Performance Patterns
Denormalization Decisions
| Scenario |
Normalize |
Denormalize |
| High write frequency |
✓ |
|
| High read frequency |
|
✓ |
| Data consistency critical |
✓ |
|
| Query latency critical |
|
✓ |
| Storage constraints |
✓ |
|
| Join complexity high |
|
✓ |
Validation Checklist
Integration Points
Inputs from:
er-modeling skill → Logical model
dimensional-modeling skill → Star/snowflake schema
data-vault-modeling skill → Hub/link/satellite
Outputs to:
migration-planning skill → DDL scripts
- EF Core → Entity configurations
- Database deployment → Schema scripts