| name | database-sharding |
| description | Database sharding for PostgreSQL/MySQL with hash/range/directory strategies. Use for horizontal scaling, multi-tenant isolation, billions of records, or when facing wrong shard keys, hotspots, cross-shard transactions, or rebalancing issues. |
| keywords | database sharding, horizontal partitioning, shard key, consistent hashing, hash sharding, range sharding, directory sharding, cross-shard queries, shard rebalancing, scatter-gather pattern, multi-tenant sharding, shard router, distributed database, database scalability, shard migration, hotspot shards, shard key selection, cross-shard aggregation, two-phase commit, saga pattern, virtual shards |
| license | MIT |
database-sharding
Comprehensive database sharding patterns for horizontal scaling with hash, range, and directory-based strategies.
Quick Start (10 Minutes)
Step 1: Choose sharding strategy from templates:
# Hash-based (even distribution)
cat templates/hash-router.ts
# Range-based (time-series data)
cat templates/range-router.ts
# Directory-based (multi-tenancy)
cat templates/directory-router.ts
Step 2: Choose a shard key that meets these criteria (validation sketch after the list):
- ✅ High cardinality (millions of unique values)
- ✅ Even distribution (no single value > 5%)
- ✅ Immutable (never changes)
- ✅ Query alignment (in 80%+ of WHERE clauses)
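A quick way to vet a candidate key against production data, as a hedged sketch (assumes node-postgres; table and column names are placeholders to adapt to your schema):

```typescript
import { Pool } from 'pg';

// Checks the two critical criteria (cardinality, distribution) on live data.
async function vetShardKey(db: Pool, table: string, key: string, shardCount: number) {
  // High cardinality: distinct values should dwarf the shard count.
  const { rows: [card] } = await db.query(
    `SELECT COUNT(DISTINCT ${key}) AS n FROM ${table}`
  );
  // Even distribution: the hottest single value should stay under ~5% of rows.
  const { rows: [top] } = await db.query(
    `SELECT ${key} AS v, COUNT(*)::float / SUM(COUNT(*)) OVER () AS share
       FROM ${table} GROUP BY ${key} ORDER BY share DESC LIMIT 1`
  );
  return Number(card.n) > shardCount * 100 && Number(top.share) < 0.05;
}
```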
Step 3: Implement router:
import { HashRouter } from './hash-router';
const router = new HashRouter([
{ id: 'shard_0', connection: { host: 'db0.example.com' } },
{ id: 'shard_1', connection: { host: 'db1.example.com' } },
{ id: 'shard_2', connection: { host: 'db2.example.com' } },
{ id: 'shard_3', connection: { host: 'db3.example.com' } },
]);
// Query single shard
const user = await router.query('user_123', 'SELECT * FROM users WHERE id = $1', ['user_123']);
Critical Rules
✓ Always Do
| Rule | Reason |
|---|---|
| Include shard key in queries | Avoid scanning all shards (100x slower) |
| Monitor shard distribution | Detect hotspots before they cause outages |
| Plan for rebalancing upfront | Adding shards later without a plan forces a full re-shard |
| Choose immutable shard key | Changing key = data migration nightmare |
| Test distribution with production data | Synthetic data hides real hotspots |
| Denormalize for data locality | Keep related data on same shard |
✗ Never Do
| Anti-Pattern | Why It's Bad |
|---|---|
| Sequential ID with range sharding | Latest shard gets all writes (hotspot) |
| Timestamp as shard key | Recent shard overwhelmed |
| Cross-shard transactions without 2PC | Data corruption, inconsistency |
| Simple modulo without consistent hashing | Cannot add shards without full re-shard |
| Nullable shard key | Special NULL handling creates hotspots |
| No shard routing layer | Hardcoded shards = cannot rebalance |
Top 7 Critical Errors
Error 1: Wrong Shard Key Choice (Hotspots)
Symptom: One shard receives 80%+ of traffic
Fix:
// ❌ Bad: Low cardinality (status field)
shard_key = order.status; // 90% are 'pending' → shard_0 overloaded
// ✅ Good: High cardinality (user_id)
shard_key = order.user_id; // Millions of users, even distribution
Error 2: Missing Shard Key in Queries
Symptom: Queries scan ALL shards (extremely slow)
Fix:
// ❌ Bad: No shard key
SELECT * FROM orders WHERE status = 'shipped'; // Scans all 100 shards!
// ✅ Good: Include shard key
SELECT * FROM orders WHERE user_id = ? AND status = 'shipped'; // Targets 1 shard
Error 3: Sequential IDs with Range Sharding
Symptom: Latest shard gets all writes
Fix:
// ❌ Bad: Range sharding with auto-increment
// Shard 0: 1-1M, Shard 1: 1M-2M, Shard 2: 2M+ → All new writes to Shard 2!
// ✅ Good: Hash-based sharding (pair with consistent hashing; see Error 4)
const shardId = hash(id) % shardCount; // Even distribution
Error 4: No Rebalancing Strategy
Symptom: Stuck with initial shard count, cannot scale
Fix:
// ❌ Bad: Simple modulo
const shardId = hash(key) % shardCount; // Adding 5th shard breaks ALL keys
// ✅ Good: Consistent hashing
const ring = new ConsistentHashRing(shards);
const shardId = ring.getNode(key); // Only ~1/N of keys move when adding a shard
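ConsistentHashRing above stands in for the template implementation; a minimal sketch of the idea, assuming Node's built-in crypto:

```typescript
import { createHash } from 'crypto';

// Minimal consistent hash ring with virtual nodes for smoother distribution.
class ConsistentHashRing {
  private ring: { point: number; node: string }[] = [];

  constructor(nodes: string[], private vnodes = 100) {
    nodes.forEach((n) => this.addNode(n));
  }

  private hash(key: string): number {
    // First 8 hex chars of MD5 → a 32-bit point on the ring.
    return parseInt(createHash('md5').update(key).digest('hex').slice(0, 8), 16);
  }

  addNode(node: string): void {
    for (let i = 0; i < this.vnodes; i++) {
      this.ring.push({ point: this.hash(`${node}#${i}`), node });
    }
    this.ring.sort((a, b) => a.point - b.point);
  }

  getNode(key: string): string {
    const p = this.hash(key);
    // First virtual node clockwise from the key's point (wrapping to the start).
    const hit = this.ring.find((v) => v.point >= p) ?? this.ring[0];
    return hit.node;
  }
}
```

A new node only claims the ring segments just before its virtual nodes, so roughly 1/N of keys relocate instead of nearly all of them.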
Error 5: Cross-Shard Transactions
Symptom: Data inconsistency, partial writes
Fix:
// ❌ Bad: Cross-shard transaction (will corrupt)
BEGIN;
UPDATE shard_1.accounts SET balance = balance - 100 WHERE id = 'A';
UPDATE shard_2.accounts SET balance = balance + 100 WHERE id = 'B';
COMMIT; // If shard_2 fails, shard_1 already committed!
// ✅ Good: Two-Phase Commit or Saga pattern
const txn = new TwoPhaseCommitTransaction();
txn.addOperation(shard_1, 'UPDATE accounts SET balance = balance - 100 WHERE id = ?', ['A']);
txn.addOperation(shard_2, 'UPDATE accounts SET balance = balance + 100 WHERE id = ?', ['B']);
await txn.execute(); // Atomic across shards
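TwoPhaseCommitTransaction is a template helper; a simplified sketch of the protocol it implements, using PostgreSQL's PREPARE TRANSACTION (requires max_prepared_transactions > 0 and assumes at most one operation per shard):

```typescript
import { Pool } from 'pg';

// Phase 1: execute and prepare on every shard; Phase 2: commit everywhere.
// A production coordinator must also durably log its commit decision to
// resolve in-doubt transactions after a crash; that is omitted here.
async function twoPhaseCommit(
  ops: { shard: Pool; sql: string; params: unknown[] }[],
  txnId: string,
): Promise<void> {
  try {
    for (const { shard, sql, params } of ops) {
      const client = await shard.connect();
      try {
        await client.query('BEGIN');
        await client.query(sql, params);
        await client.query(`PREPARE TRANSACTION '${txnId}'`); // shard votes "yes"
      } finally {
        client.release();
      }
    }
  } catch (err) {
    // A shard voted "no": roll back whatever was already prepared.
    for (const { shard } of ops) {
      await shard.query(`ROLLBACK PREPARED '${txnId}'`).catch(() => {});
    }
    throw err;
  }
  for (const { shard } of ops) {
    await shard.query(`COMMIT PREPARED '${txnId}'`);
  }
}
```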
Error 6: Mutable Shard Key
Symptom: Records move shards, causing duplicates
Fix:
// ❌ Bad: Shard by country (user relocates)
shard_key = user.country; // User moves US → CA, now in different shard!
// ✅ Good: Shard by immutable user_id
shard_key = user.id; // Never changes
Error 7: No Monitoring
Symptom: Silent hotspots, sudden performance degradation
Fix:
// ✅ Required metrics
- Per-shard record counts (should be within 20% of the mean)
- Query distribution (no shard > 40% of queries)
- Storage per shard (alert at 80%)
- Latency p99 per shard
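A sketch of the first check, record-count drift across shards, assuming each shard is reachable through a pg Pool:

```typescript
import { Pool } from 'pg';

// Alert when any shard's record count drifts more than 20% from the mean.
async function checkShardBalance(shards: Map<string, Pool>): Promise<void> {
  const counts = new Map<string, number>();
  for (const [id, pool] of shards) {
    const { rows } = await pool.query('SELECT COUNT(*) AS n FROM users');
    counts.set(id, Number(rows[0].n));
  }
  const mean = [...counts.values()].reduce((a, b) => a + b, 0) / counts.size;
  for (const [id, n] of counts) {
    if (Math.abs(n - mean) / mean > 0.2) {
      console.warn(`shard ${id} is ${((n / mean - 1) * 100).toFixed(0)}% off the mean`);
    }
  }
}
```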
Load references/error-catalog.md for all 10 errors with detailed fixes.
Sharding Strategies
| Strategy | Best For | Pros | Cons |
|---|---|---|---|
| Hash | User data, even load critical | No hotspots, predictable | Range queries scatter |
| Range | Time-series, logs, append-only | Range queries efficient, archival | Recent shard hotspot |
| Directory | Multi-tenancy, complex routing | Flexible, easy rebalancing | Lookup overhead, SPOF |
Load references/sharding-strategies.md for detailed comparisons with production examples (Instagram, Discord, Salesforce).
Shard Key Selection Criteria
| Criterion | Importance | Check Method |
|---|---|---|
| High cardinality | Critical | COUNT(DISTINCT shard_key) > shard_count × 100 |
| Even distribution | Critical | No value > 5% of total |
| Immutable | Critical | Value never changes |
| Query alignment | High | 80%+ queries include it |
| Data locality | Medium | Related records together |
Decision Tree:
- User-focused app → user_id
- Multi-tenant SaaS → tenant_id
- Time-series/logs → timestamp (range sharding)
- Product catalog → product_id
Load references/shard-key-selection.md for comprehensive decision trees and testing strategies.
Configuration Summary
Hash-Based Router
import { HashRouter } from './templates/hash-router';
const router = new HashRouter([
{ id: 'shard_0', connection: { /* PostgreSQL config */ } },
{ id: 'shard_1', connection: { /* PostgreSQL config */ } },
]);
// Automatically routes to correct shard
const user = await router.query('user_123', 'SELECT * FROM users WHERE id = $1', ['user_123']);
Range-Based Router
import { RangeRouter } from './templates/range-router';
const router = new RangeRouter(shardConfigs, [
{ start: Date.parse('2024-01-01'), end: Date.parse('2024-04-01'), shardId: 'shard_q1' },
{ start: Date.parse('2024-04-01'), end: Date.parse('2024-07-01'), shardId: 'shard_q2' },
{ start: Date.parse('2024-07-01'), end: Infinity, shardId: 'shard_q3' },
]);
// Range queries target specific shards
const janEvents = await router.queryRange(
Date.parse('2024-01-01'),
Date.parse('2024-02-01'),
'SELECT * FROM events WHERE created_at BETWEEN $1 AND $2'
);
Directory-Based Router
import { DirectoryRouter } from './templates/directory-router';
const router = new DirectoryRouter(directoryDBConfig, shardConfigs);
// Assign tenant to specific shard
await router.assignShard('tenant_acme', 'shard_enterprise');
// Route automatically
const users = await router.query('tenant_acme', 'SELECT * FROM users');
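Under the hood, the directory is a small lookup table; a hedged sketch of the lookup DirectoryRouter might perform (the schema and caching policy are assumptions), with an in-process cache to offset the extra hop:

```typescript
import { Pool } from 'pg';

const directory = new Pool({ /* directory DB config */ });
const cache = new Map<string, string>(); // tenant_id → shard_id

// Assumed schema: shard_directory(tenant_id TEXT PRIMARY KEY, shard_id TEXT)
async function lookupShard(tenantId: string): Promise<string> {
  const cached = cache.get(tenantId);
  if (cached) return cached;
  const { rows } = await directory.query(
    'SELECT shard_id FROM shard_directory WHERE tenant_id = $1',
    [tenantId],
  );
  if (rows.length === 0) throw new Error(`no shard assigned for ${tenantId}`);
  cache.set(tenantId, rows[0].shard_id);
  return rows[0].shard_id;
}
```

Invalidate the cache when a tenant is reassigned, or the router keeps sending traffic to the old shard.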
When to Load References
Choosing Strategy
Load references/sharding-strategies.md when:
- Deciding between hash, range, directory
- Need production examples (Instagram, Discord)
- Planning hybrid approaches
Selecting Shard Key
Load references/shard-key-selection.md when:
- Choosing shard key for new project
- Evaluating existing shard key
- Testing distribution with production data
Implementation
Load references/implementation-patterns.md when:
- Building shard router from scratch
- Implementing consistent hashing
- Need transaction handling (2PC, Saga)
- Setting up monitoring/metrics
Cross-Shard Operations
Load references/cross-shard-queries.md when:
- Need to aggregate across shards (COUNT, SUM, AVG; see the sketch after this list)
- Implementing cross-shard joins
- Building pagination across shards
- Optimizing scatter-gather patterns
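For example, a global average cannot be computed by averaging per-shard averages; gather SUM and COUNT from each shard and combine. A sketch, assuming a pg Pool per shard and a hypothetical orders table:

```typescript
import { Pool } from 'pg';

// Scatter-gather AVG: fan out partial aggregates, combine at the router.
async function globalAvgOrderValue(shards: Pool[]): Promise<number> {
  const partials = await Promise.all(
    shards.map((s) => s.query('SELECT SUM(amount) AS sum, COUNT(*) AS n FROM orders')),
  );
  let sum = 0;
  let n = 0;
  for (const { rows: [r] } of partials) {
    sum += Number(r.sum ?? 0); // SUM is NULL on an empty shard
    n += Number(r.n);
  }
  return n === 0 ? 0 : sum / n;
}
```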
Rebalancing
Load references/rebalancing-guide.md when:
- Adding new shards
- Migrating data between shards
- Planning zero-downtime migrations
- Balancing uneven load
Error Prevention
Load references/error-catalog.md when:
- Troubleshooting performance issues
- Reviewing shard architecture
- Needing the complete list of all 10 documented errors with fixes
Complete Setup Checklist
Before Sharding:
- Tested shard key distribution with production data
- Shard key in 80%+ of queries
- Monitoring infrastructure ready
- Rebalancing strategy planned
Router Implementation:
- Shard routing layer (not hardcoded shards)
- Connection pooling per shard
- Error handling and retries
- Metrics collection (queries/shard, latency)
Shard Configuration:
- 4-8 shards initially (room to grow)
- Consistent hashing or virtual shards (sketch after this checklist)
- Replicas per shard (HA)
- Backup strategy per shard
Application Changes:
- All queries include shard key
- Cross-shard joins eliminated (denormalized)
- Transaction boundaries respected
- Connection pooling configured
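The virtual-shard item above deserves a sketch: hash into a fixed number of logical shards and map those to physical databases, so rebalancing becomes editing the map rather than re-hashing every key (shard counts and names here are assumptions):

```typescript
import { createHash } from 'crypto';

const LOGICAL_SHARDS = 256;

// Initial layout: 256 logical shards spread across 4 physical databases.
const logicalToPhysical: string[] = Array.from(
  { length: LOGICAL_SHARDS },
  (_, i) => `shard_${i % 4}`,
);

function physicalShardFor(key: string): string {
  // First 8 hex chars of MD5 → a stable 32-bit hash of the key.
  const h = parseInt(createHash('md5').update(key).digest('hex').slice(0, 8), 16);
  return logicalToPhysical[h % LOGICAL_SHARDS];
}

// Adding a 5th database later means moving a few logical shards, e.g.
// logicalToPhysical[200] = 'shard_4'; affected keys re-route, no full re-shard.
```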
Production Example
Before (Single database overwhelmed):
// Single PostgreSQL instance
const db = new Pool({ host: 'db.example.com' });
// All 10M users on one server
const users = await db.query('SELECT * FROM users WHERE status = $1', ['active']);
// Query time: 5000ms (slow!)
// DB CPU: 95%
// Disk: 500GB, growing
After (Sharded across 8 servers):
// Hash-based sharding with 8 shards
const router = new HashRouter([
{ id: 'shard_0', connection: { host: 'db0.example.com' } },
{ id: 'shard_1', connection: { host: 'db1.example.com' } },
// ... 6 more shards
]);
// Query single user (targets 1 shard)
const user = await router.query('user_123', 'SELECT * FROM users WHERE id = $1', ['user_123']);
// Query time: 10ms (500x faster!)
// Query all shards (scatter-gather)
const allActive = await router.queryAll('SELECT * FROM users WHERE status = $1', ['active']);
// Query time: 800ms (parallelized across 8 shards, 6x faster than single)
// Result: Each shard handles ~1.25M users
// DB CPU per shard: 20%
// Disk per shard: 65GB
// Can scale to 16 shards easily (consistent hashing)
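queryAll above is assumed to fan the statement out to every shard in parallel and merge the rows; a minimal sketch:

```typescript
import { Pool } from 'pg';

// Scatter-gather row merge: run on all shards concurrently, concatenate results.
async function queryAll<T>(shards: Pool[], sql: string, params: unknown[]): Promise<T[]> {
  const results = await Promise.all(shards.map((s) => s.query(sql, params)));
  return results.flatMap((r) => r.rows as T[]);
}
```

Cross-shard ORDER BY or LIMIT needs an extra merge step at the router; see references/cross-shard-queries.md.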
Known Issues Prevention
All 10 documented errors prevented:
- ✅ Wrong shard key (hotspots) → Test distribution first
- ✅ Missing shard key in queries → Code review, linting
- ✅ Cross-shard transactions → Use 2PC or Saga pattern
- ✅ Sequential ID hotspots → Use hash-based sharding
- ✅ No rebalancing strategy → Consistent hashing from day 1
- ✅ Timestamp sharding hotspots → Hybrid hash+range approach
- ✅ Mutable shard key → Choose immutable keys (user_id)
- ✅ No routing layer → Abstract with router from start
- ✅ No monitoring → Track per-shard metrics
- ✅ Weak hash function → Use MD5, MurmurHash3, xxHash
See: references/error-catalog.md for detailed fixes
Resources
Templates:
- templates/hash-router.ts - Hash-based sharding
- templates/range-router.ts - Range-based sharding
- templates/directory-router.ts - Directory-based sharding
- templates/cross-shard-aggregation.ts - Aggregation patterns
References:
- references/sharding-strategies.md - Strategy comparison
- references/shard-key-selection.md - Key selection guide
- references/implementation-patterns.md - Router implementations
- references/cross-shard-queries.md - Query patterns
- references/rebalancing-guide.md - Migration strategies
- references/error-catalog.md - All 10 errors documented
Production Examples:
- Instagram: Range sharding for media
- Discord: Hash sharding for messages
- Salesforce: Directory sharding for orgs
Production-tested | 10 errors prevented | MIT License