# Database Design Skill

Purpose: Atomic skill for database architecture with comprehensive parameter validation and optimization patterns.
## Skill Identity

| Attribute | Value |
|---|---|
| Scope | Schema Design, Indexing, Query Optimization |
| Responsibility | Single: Data modeling and storage patterns |
| Invocation | `Skill("database-design")` |
## Parameter Schema

### Input Validation

```yaml
parameters:
  database_context:
    type: object
    required: true
    properties:
      use_case:
        type: string
        enum: [oltp, olap, htap, time_series, document, graph]
        required: true
      data_characteristics:
        type: object
        required: true
        properties:
          volume:
            type: string
            pattern: "^\\d+[KMGTP]B$"
          velocity:
            type: string
            pattern: "^\\d+[KM]?\\s*(reads?|writes?)/s$"
          variety:
            type: string
            enum: [structured, semi_structured, unstructured]
      access_patterns:
        type: array
        items:
          type: object
          properties:
            query_type: { type: string }
            frequency: { type: string, enum: [high, medium, low] }
            latency_sla: { type: string, pattern: "^\\d+ms$" }
      consistency:
        type: string
        enum: [strong, eventual, read_your_writes]
        default: strong

validation_rules:
  - name: "oltp_latency"
    rule: "use_case == 'oltp' implies latency_sla <= '100ms'"
    error: "OLTP workloads typically require <100ms latency"
  - name: "volume_velocity_match"
    rule: "high_volume implies adequate_velocity"
    error: "Volume and velocity constraints may be incompatible"
```
### Output Schema

```yaml
output:
  type: object
  properties:
    database_recommendation:
      type: object
      properties:
        primary: { type: string }
        secondary: { type: string }
        rationale: { type: string }
    schema:
      type: object
      properties:
        tables: { type: array }
        relationships: { type: array }
        constraints: { type: array }
    indexes:
      type: array
      items:
        type: object
        properties:
          table: { type: string }
          columns: { type: array }
          type: { type: string }
          rationale: { type: string }
    optimization_tips:
      type: array
      items: { type: string }
```
## Core Patterns

### Database Selection Matrix

```
OLTP (Transactional):
├── PostgreSQL
│   ├── Best for: Complex queries, ACID, JSON support
│   ├── Scale: Vertical + read replicas
│   └── Limit: ~10TB effective, 10K TPS
├── MySQL/MariaDB
│   ├── Best for: Web applications, simple queries
│   ├── Scale: Read replicas, ProxySQL
│   └── Limit: Similar to PostgreSQL
└── CockroachDB/TiDB
    ├── Best for: Distributed ACID
    ├── Scale: Horizontal auto-sharding
    └── Trade-off: Higher latency

OLAP (Analytical):
├── ClickHouse
│   ├── Best for: Real-time analytics, columnar
│   ├── Performance: 1B+ rows/second
│   └── Use: Log analysis, metrics
├── Snowflake/BigQuery
│   ├── Best for: Data warehouse, serverless
│   ├── Scale: Unlimited, pay-per-query
│   └── Use: BI, reporting
└── DuckDB
    ├── Best for: Embedded analytics
    └── Use: Edge, single-machine analytics

NoSQL:
├── MongoDB
│   ├── Best for: Documents, flexible schema
│   └── Pattern: Embedding, denormalization
├── Cassandra
│   ├── Best for: Write-heavy, wide-column
│   └── Pattern: Partition key design
├── Redis
│   ├── Best for: Cache, sessions, pub/sub
│   └── Limit: Memory-bound
└── Neo4j
    ├── Best for: Graph relationships
    └── Pattern: Cypher queries
```
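The matrix above can be collapsed into a first-pass lookup keyed on `use_case`, refined afterwards by the finer constraints. A minimal sketch in Python; the tie-break flag and fallback are illustrative assumptions, and the `htap`/`time_series` cases are omitted because the matrix does not map them directly:

```python
# First-pass mapping from use_case to candidate engines, mirroring the matrix.
SELECTION_MATRIX = {
    "oltp": ["PostgreSQL", "MySQL/MariaDB", "CockroachDB/TiDB"],
    "olap": ["ClickHouse", "Snowflake/BigQuery", "DuckDB"],
    "document": ["MongoDB"],
    "graph": ["Neo4j"],
}

def recommend_database(use_case: str, needs_horizontal_scale: bool = False) -> dict:
    candidates = SELECTION_MATRIX.get(use_case, ["PostgreSQL"])
    # Distributed-ACID tie-break from the OLTP branch of the matrix.
    if use_case == "oltp" and needs_horizontal_scale:
        primary = "CockroachDB/TiDB"
    else:
        primary = candidates[0]
    secondary = next((c for c in candidates if c != primary), None)
    return {"primary": primary, "secondary": secondary}

print(recommend_database("oltp", needs_horizontal_scale=True))
# {'primary': 'CockroachDB/TiDB', 'secondary': 'PostgreSQL'}
```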
### Schema Design Patterns

```
Normalization (OLTP):
├── 1NF: Atomic values
├── 2NF: No partial dependencies
├── 3NF: No transitive dependencies
└── BCNF: Strict key dependencies

Denormalization (OLAP/NoSQL):
├── Embedding
│   ├── Nest related documents
│   ├── Best for: 1:few relationships
│   └── Avoid: Large arrays, frequent updates
├── Bucketing
│   ├── Group time-series data
│   ├── Reduce document count
│   └── Improve write performance
└── Materialized Views
    ├── Pre-computed aggregates
    ├── Trade: Storage vs query time
    └── Refresh: Incremental or full
```
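Bucketing is easiest to see as a concrete document shape. A minimal sketch that groups one hour of time-series readings per document; all field names here are illustrative, not a fixed schema:

```python
from datetime import datetime, timezone

def bucket_id(sensor_id: str, ts: datetime) -> str:
    # One bucket per sensor per hour: bounded array growth, fewer documents.
    return f"{sensor_id}:{ts:%Y-%m-%dT%H}"

def make_bucket(sensor_id: str, ts: datetime) -> dict:
    return {
        "_id": bucket_id(sensor_id, ts),
        "sensor_id": sensor_id,
        "hour_start": ts.replace(minute=0, second=0, microsecond=0),
        "count": 0,
        "readings": [],  # appended in place instead of one document per reading
    }

bucket = make_bucket("s-42", datetime(2025, 1, 15, 9, 30, tzinfo=timezone.utc))
bucket["readings"].append({"t": "09:30:00", "value": 21.7})
bucket["count"] += 1
```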
### Indexing Strategies

```
Index Types:
├── B-tree (default)
│   ├── Range queries, sorting
│   ├── Overhead: ~10% storage
│   └── Example: WHERE date BETWEEN x AND y
├── Hash
│   ├── Equality only
│   ├── Faster lookup
│   └── Example: WHERE id = 123
├── GIN (PostgreSQL)
│   ├── Full-text, JSONB, arrays
│   ├── Higher write cost
│   └── Example: WHERE tags @> '{redis}'
├── BRIN (Block Range)
│   ├── Large sorted tables
│   ├── Minimal storage
│   └── Example: Time-series data
└── Composite
    ├── Multi-column index
    ├── Order matters (leftmost prefix)
    └── Example: (user_id, created_at DESC)

Index Selection Rules:
├── Query frequency > 10% of total workload
├── High selectivity: query matches <10% of rows (filters out 90%)
├── Write overhead acceptable
└── Index size < table size
```
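The leftmost-prefix rule yields a simple column-ordering heuristic: equality predicates first, range and sort columns last. A minimal sketch of that heuristic, with deliberately naive query parsing; `order_index_columns` is a hypothetical helper illustrating the same idea the skill's `optimize_index_order` (exercised in the unit tests below) applies:

```python
import re

def order_index_columns(queries: list[str]) -> list[str]:
    """Order composite index columns: equality predicates first, sort columns last."""
    equality, trailing = [], []
    for q in queries:
        # Equality predicates can serve as leftmost-prefix columns.
        for col in re.findall(r"(\w+)\s*=\s*\?", q):
            if col not in equality:
                equality.append(col)
        # ORDER BY columns must come after every equality column.
        for col in re.findall(r"ORDER BY\s+(\w+)", q, re.IGNORECASE):
            if col not in trailing:
                trailing.append(col)
    return equality + [c for c in trailing if c not in equality]

print(order_index_columns([
    "WHERE user_id = ? AND status = ?",
    "WHERE user_id = ? ORDER BY created_at",
]))  # ['user_id', 'status', 'created_at']
```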
## Retry Logic

### Connection & Query Retry

```yaml
retry_config:
  connection:
    max_attempts: 3
    initial_delay_ms: 100
    max_delay_ms: 5000
    multiplier: 2.0
  query:
    max_attempts: 2
    timeout_ms: 30000
    retry_on:
      - CONNECTION_LOST
      - DEADLOCK
      - LOCK_TIMEOUT
    abort_on:
      - SYNTAX_ERROR
      - CONSTRAINT_VIOLATION
      - PERMISSION_DENIED
  transaction:
    retry_on_deadlock: true
    max_deadlock_retries: 3
    isolation_level: READ_COMMITTED
```
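The connection policy above maps directly onto exponential backoff with a delay cap. A minimal sketch; `DatabaseError` and its `code` field are stand-ins for whatever exception type the actual driver raises:

```python
import random
import time

# Retryable codes from retry_config above; everything else (including the
# abort_on codes) re-raises immediately.
RETRY_ON = {"CONNECTION_LOST", "DEADLOCK", "LOCK_TIMEOUT"}

class DatabaseError(Exception):
    def __init__(self, code: str):
        super().__init__(code)
        self.code = code

def with_retry(fn, max_attempts=3, initial_delay_ms=100,
               max_delay_ms=5000, multiplier=2.0):
    delay_ms = initial_delay_ms
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except DatabaseError as exc:
            if exc.code not in RETRY_ON or attempt == max_attempts:
                raise  # non-retryable, or retry budget exhausted
            # Full jitter keeps concurrent retries from synchronizing.
            time.sleep(random.uniform(0, min(delay_ms, max_delay_ms)) / 1000)
            delay_ms *= multiplier
```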
## Logging & Observability

### Log Format

```yaml
log_schema:
  level: { type: string }
  timestamp: { type: string, format: ISO8601 }
  skill: { type: string, value: "database-design" }
  event:
    type: string
    enum:
      - schema_analyzed
      - index_recommended
      - query_optimized
      - capacity_estimated
      - migration_planned
  context:
    type: object
    properties:
      table: { type: string }
      query: { type: string }
      execution_time_ms: { type: number }
      rows_affected: { type: integer }

example:
  level: INFO
  event: index_recommended
  context:
    table: orders
    columns: [user_id, status, created_at]
    type: composite
    rationale: "Covers 80% of queries"
```
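Emitting the schema as one JSON object per line keeps logs machine-parseable. A minimal stdlib-only sketch of an emitter matching `log_schema`:

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("database-design")
logging.basicConfig(level=logging.INFO, format="%(message)s")

def log_event(event: str, level: int = logging.INFO, **context) -> None:
    # One JSON object per line, matching log_schema above.
    logger.log(level, json.dumps({
        "level": logging.getLevelName(level),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "skill": "database-design",
        "event": event,
        "context": context,
    }, default=str))

log_event("index_recommended", table="orders",
          columns=["user_id", "status", "created_at"],
          type="composite", rationale="Covers 80% of queries")
```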
### Metrics

```yaml
metrics:
  - name: query_duration_seconds
    type: histogram
    labels: [query_type, table]
    buckets: [0.001, 0.01, 0.1, 1, 10]
  - name: index_usage_ratio
    type: gauge
    labels: [index_name, table]
  - name: table_size_bytes
    type: gauge
    labels: [table, database]
  - name: connection_pool_usage
    type: gauge
    labels: [pool_name]
```
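With the `prometheus_client` library (one possible backend, assumed here), the definitions above register as shown below; the port and sample values are illustrative:

```python
from prometheus_client import Histogram, Gauge, start_http_server

QUERY_DURATION = Histogram(
    "query_duration_seconds", "Query execution time",
    ["query_type", "table"], buckets=[0.001, 0.01, 0.1, 1, 10],
)
INDEX_USAGE = Gauge("index_usage_ratio", "Index scans / total scans",
                    ["index_name", "table"])
TABLE_SIZE = Gauge("table_size_bytes", "On-disk table size",
                   ["table", "database"])
POOL_USAGE = Gauge("connection_pool_usage", "Connections in use / pool size",
                   ["pool_name"])

start_http_server(9100)  # exposes /metrics for scraping
QUERY_DURATION.labels("point_lookup", "orders").observe(0.004)
```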
## Troubleshooting

### Common Issues

| Issue | Cause | Resolution |
|---|---|---|
| Slow queries | Missing index | Add appropriate index |
| Lock waits | Long transactions | Split or optimize |
| Connection exhaustion | Pool too small | Increase pool, add pooler |
| Replication lag | Heavy writes | Add replicas, optimize queries |
| Disk full | Bloat, no vacuum | VACUUM, archival |
### Debug Checklist

```
□ EXPLAIN ANALYZE run?
□ Index usage verified?
□ Connection pool monitored?
□ Lock contention checked?
□ Table statistics current?
□ Slow query log enabled?
```
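For the first two checklist items, a short script can surface sequential scans directly. A minimal sketch assuming PostgreSQL with the psycopg2 driver; the DSN and query are placeholders:

```python
import psycopg2

def explain(dsn: str, query: str, params: tuple = ()) -> list[str]:
    # Note: EXPLAIN ANALYZE actually executes the query, so only use it
    # on statements that are safe to run.
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute("EXPLAIN ANALYZE " + query, params)
        return [row[0] for row in cur.fetchall()]

plan = explain("dbname=app", "SELECT * FROM orders WHERE user_id = %s", (123,))
for line in plan:
    print(line)
if any("Seq Scan" in line for line in plan):
    print("-> sequential scan: check whether an index would help")
```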
## Unit Test Templates

### Schema Validation Tests

```python
# test_database_design.py

def test_valid_oltp_schema():
    params = {
        "database_context": {
            "use_case": "oltp",
            "data_characteristics": {
                "volume": "100GB",
                "velocity": "10K writes/s",
                "variety": "structured"
            },
            "access_patterns": [
                {"query_type": "point_lookup", "frequency": "high", "latency_sla": "10ms"}
            ]
        }
    }
    result = validate_parameters(params)
    assert result.valid is True

def test_oltp_latency_validation():
    params = {
        "database_context": {
            "use_case": "oltp",
            "access_patterns": [
                {"latency_sla": "500ms"}  # Too high for OLTP
            ]
        }
    }
    result = validate_parameters(params)
    assert "latency" in result.warnings[0]

def test_index_recommendation():
    query = "SELECT * FROM orders WHERE user_id = ? AND status = ? ORDER BY created_at DESC"
    access_pattern = {"frequency": "high", "latency_sla": "10ms"}
    result = recommend_index(query, access_pattern)
    assert result.columns == ["user_id", "status", "created_at"]
    assert result.type == "composite"

def test_capacity_estimation():
    result = estimate_capacity(
        rows=10_000_000,
        row_size_bytes=500,
        growth_rate_monthly=0.05,
        months=12
    )
    assert result.current_size == "5GB"
    assert result.projected_size == "8.98GB"  # 5GB * 1.05**12 ≈ 8.98GB
```
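`test_capacity_estimation` implies a simple compound-growth model. A minimal sketch consistent with those assertions; the dataclass shape and GB formatting are assumptions:

```python
from dataclasses import dataclass

@dataclass
class CapacityEstimate:
    current_size: str
    projected_size: str

def _format_gb(gb: float) -> str:
    return f"{gb:g}GB" if gb == int(gb) else f"{gb:.2f}GB"

def estimate_capacity(rows: int, row_size_bytes: int,
                      growth_rate_monthly: float, months: int) -> CapacityEstimate:
    current_gb = rows * row_size_bytes / 1e9  # decimal GB
    # Compound monthly growth: size * (1 + rate)^months
    projected_gb = current_gb * (1 + growth_rate_monthly) ** months
    return CapacityEstimate(_format_gb(current_gb), _format_gb(projected_gb))

print(estimate_capacity(10_000_000, 500, 0.05, 12))
# CapacityEstimate(current_size='5GB', projected_size='8.98GB')
```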
### Query Optimization Tests

```python
def test_explain_analysis():
    explain_output = """
    Seq Scan on orders (rows=1000000)
      Filter: (user_id = 123)
    """
    result = analyze_explain(explain_output)
    assert result.issues[0].type == "SEQ_SCAN"
    assert result.suggestions[0] == "Add index on orders(user_id)"

def test_composite_index_order():
    queries = [
        "WHERE user_id = ? AND status = ?",
        "WHERE user_id = ? ORDER BY created_at"
    ]
    result = optimize_index_order(queries)
    # user_id should be first (equality), created_at last (sort)
    assert result.recommended_order == ["user_id", "status", "created_at"]
```
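The `analyze_explain` behavior exercised above can start as plain pattern matching over the plan text. A minimal sketch; the `Issue` and `PlanAnalysis` shapes are illustrative stand-ins:

```python
import re
from dataclasses import dataclass

@dataclass
class Issue:
    type: str

@dataclass
class PlanAnalysis:
    issues: list
    suggestions: list

def analyze_explain(explain_output: str) -> PlanAnalysis:
    issues, suggestions = [], []
    # Flag sequential scans and, when a simple equality filter is visible,
    # suggest a matching single-column index.
    scan = re.search(r"Seq Scan on (\w+)", explain_output)
    col = re.search(r"Filter: \((\w+) =", explain_output)
    if scan:
        issues.append(Issue("SEQ_SCAN"))
        if col:
            suggestions.append(f"Add index on {scan.group(1)}({col.group(1)})")
    return PlanAnalysis(issues, suggestions)
```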
## Version History

| Version | Date | Changes |
|---|---|---|
| 2.0.0 | 2025-01 | Production-grade rewrite with optimization patterns |
| 1.0.0 | 2024-12 | Initial release |