---
name: estimation-techniques
description: Back-of-envelope calculations for system design. Use when estimating QPS, storage, bandwidth, or latency for capacity planning. Includes latency numbers every programmer should know and common estimation patterns.
allowed-tools: Read, Glob, Grep
---
# Estimation Techniques
This skill provides frameworks for back-of-envelope calculations essential for system design and capacity planning.
## When to Use This Skill
Keywords: back-of-envelope, estimation, QPS, storage calculation, bandwidth, latency, capacity planning, scale estimation
Use this skill when:
- Estimating system capacity requirements
- Calculating storage needs for a feature
- Determining bandwidth requirements
- Sizing infrastructure for expected load
- Justifying architectural decisions with numbers
- Preparing for system design interviews
## Core Principle

Estimation is not about precision; it's about order of magnitude.

Getting within 10x is usually good enough for architectural decisions. The goal is to identify whether you need:
- 1 server or 100 servers
- 1 GB or 1 TB of storage
- 10 ms or 1 second latency
## Essential Numbers to Know

### Powers of 2

| Power | Value | Approximate |
|---|---|---|
| 2^10 | 1,024 | ~1 Thousand (KB) |
| 2^20 | 1,048,576 | ~1 Million (MB) |
| 2^30 | 1,073,741,824 | ~1 Billion (GB) |
| 2^40 | 1,099,511,627,776 | ~1 Trillion (TB) |
### Time Conversions

| Unit | Seconds | Useful For |
|---|---|---|
| 1 minute | 60 | Short operations |
| 1 hour | 3,600 | Batch jobs |
| 1 day | 86,400 (~100K) | Daily aggregations |
| 1 month | 2,592,000 (~2.5M) | Monthly calculations |
| 1 year | 31,536,000 (~30M) | Annual projections |
### Availability Targets

| Availability | Downtime/Year | Downtime/Month | Downtime/Day |
|---|---|---|---|
| 99% (two 9s) | 3.65 days | 7.31 hours | 14.4 min |
| 99.9% (three 9s) | 8.76 hours | 43.8 min | 1.44 min |
| 99.99% (four 9s) | 52.6 min | 4.38 min | 8.64 sec |
| 99.999% (five 9s) | 5.26 min | 26.3 sec | 864 ms |
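Each cell above is just the period length multiplied by (1 - availability). A minimal sketch of that arithmetic in Python, assuming a 365-day year and a 30-day month:

```python
def downtime_seconds(availability: float) -> dict[str, float]:
    """Allowed downtime per period for a given availability target."""
    unavailable = 1.0 - availability
    periods = {"year": 365 * 86_400, "month": 30 * 86_400, "day": 86_400}
    # Downtime = period length in seconds * fraction of time unavailable
    return {name: secs * unavailable for name, secs in periods.items()}

for target in (0.99, 0.999, 0.9999, 0.99999):
    d = downtime_seconds(target)
    print(f"{target:.3%}: {d['year'] / 3_600:7.2f} h/year, {d['day']:7.2f} s/day")
# 99.000%: 87.60 h/year, 864.00 s/day ... 99.999%: 0.09 h/year, 0.86 s/day
```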
## Latency Numbers Every Programmer Should Know

See full reference: references/latency-numbers.md

Quick reference:

| Operation | Latency | Relative |
|---|---|---|
| L1 cache reference | 0.5 ns | 1x |
| L2 cache reference | 7 ns | 14x |
| Main memory reference | 100 ns | 200x |
| SSD random read | 16 us | 32,000x |
| Round trip same datacenter | 0.5 ms | 1,000,000x |
| HDD seek | 2 ms | 4,000,000x |
| Round trip CA to Netherlands | 150 ms | 300,000,000x |
## Estimation Patterns

### Pattern 1: QPS (Queries Per Second)

#### Formula

```
QPS = (Number of Users) x (Actions per User per Day) / (Seconds per Day)
```
Example: Twitter-like service

Given:
- 300 million monthly active users
- 50% are daily active = 150M DAU
- Average user reads 20 tweets/day

```
QPS = 150M * 20 / 86,400
    ≈ 3 billion / 100,000
    = 30,000 QPS
```

Peak load (typically 2-3x average):

```
Peak QPS = 30,000 * 3 = 90,000 QPS
```
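The same arithmetic as a quick Python sketch; the inputs are the example's assumptions, and the exact result (~35K QPS) shows how little the 86,400 → 100,000 rounding costs:

```python
SECONDS_PER_DAY = 86_400

def average_qps(daily_active_users: float, actions_per_day: float) -> float:
    """Average queries per second given DAU and per-user daily actions."""
    return daily_active_users * actions_per_day / SECONDS_PER_DAY

dau = 300e6 * 0.50           # 150M daily active users
avg = average_qps(dau, 20)   # 20 tweet reads per user per day
print(f"Average: {avg:,.0f} QPS, peak (3x): {3 * avg:,.0f} QPS")
# Average: 34,722 QPS, peak (3x): 104,167 QPS
```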
### Pattern 2: Storage Estimation

#### Formula

```
Storage = (Number of Items) x (Size per Item) x (Replication Factor) x (Time Period)
```
Example: Photo storage service

Given:
- 100 million users
- 10% upload daily = 10M uploads/day
- Average photo size = 2 MB
- Keep 5 years of data
- Replication factor = 3

```
Daily storage    = 10M * 2 MB = 20 TB
Yearly storage   = 20 TB * 365 = 7.3 PB
5-year storage   = 7.3 PB * 5 = 36.5 PB
With replication = 36.5 PB * 3 ≈ 110 PB
```
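A sketch of the same storage math; decimal units (1 PB = 10^15 bytes) are assumed throughout:

```python
MB, PB = 10**6, 10**15

uploads_per_day = 100e6 * 0.10             # 10% of 100M users upload daily
raw = uploads_per_day * 2 * MB * 365 * 5   # 2 MB/photo, 5 years retained
print(f"Raw: {raw / PB:.1f} PB, with 3x replication: {raw * 3 / PB:.0f} PB")
# Raw: 36.5 PB, with 3x replication: 110 PB
```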
### Pattern 3: Bandwidth Estimation

#### Formula

```
Bandwidth = (QPS) x (Request Size or Response Size)
```
Example: Video streaming service

Given:
- 1 million concurrent viewers
- Average bitrate = 5 Mbps
- Peak hours: 8 PM - 11 PM

```
Bandwidth = 1M * 5 Mbps = 5 Tbps
With 20% overhead: ~6 Tbps
```

CDN egress cost (rough):

```
6 Tbps / 8 bits per byte = 750 GB/s
750 GB/s * 3 hours * 3,600 sec/hour ≈ 8.1 PB per peak window
8.1M GB * $0.02/GB ≈ $160K per night
```

Egress at that scale is a major line item, which is why Netflix built its own CDN (Open Connect).
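The egress arithmetic as a runnable sketch; the $0.02/GB rate is the same rough assumption as above, not a quoted price:

```python
Tbps, GB = 10**12, 10**9

bytes_per_sec = 6 * Tbps / 8          # 6 Tbps peak, bits -> bytes
window_seconds = 3 * 3_600            # 3-hour nightly peak
transferred_gb = bytes_per_sec * window_seconds / GB
print(f"{transferred_gb / 1e6:.1f} PB per window, ~${transferred_gb * 0.02:,.0f}/night")
# 8.1 PB per window, ~$162,000/night
```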
### Pattern 4: Cache Size Estimation

#### Formula

```
Cache Size = (QPS) x (Cache TTL) x (Response Size) x (Unique Ratio)
```
Example: API response cache

Given:
- 10,000 QPS
- Cache TTL = 5 minutes = 300 seconds
- Average response = 10 KB
- 20% of requests are unique

```
Cache entries = 10,000 * 300 * 0.20 = 600,000 entries
Cache size    = 600,000 * 10 KB = 6 GB
With overhead (keys, metadata): ~10 GB
```
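As a sketch; the 1.7x overhead multiplier for keys and metadata is an assumption you should replace with measurements from your own cache:

```python
KB, GB = 10**3, 10**9

qps, ttl_seconds, unique_ratio = 10_000, 300, 0.20
entries = qps * ttl_seconds * unique_ratio   # distinct cached responses
raw = entries * 10 * KB                      # 10 KB average response
print(f"{entries:,.0f} entries, {raw / GB:.0f} GB raw, ~{raw * 1.7 / GB:.0f} GB with overhead")
# 600,000 entries, 6 GB raw, ~10 GB with overhead
```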
### Pattern 5: Database Sizing

#### Formula

```
DB Size = (Number of Rows) x (Row Size) x (Index Overhead) x (Replication)
```
Example: User profile database

Given:
- 500 million users
- Average profile = 1 KB (name, email, settings, etc.)
- Index overhead = 30%
- Primary + 2 replicas = 3x

```
Data size        = 500M * 1 KB = 500 GB
With indexes     = 500 GB * 1.3 = 650 GB
With replication = 650 GB * 3 ≈ 2 TB
Memory for hot data (20% of total): ~400 GB
```
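The same sizing as a sketch:

```python
KB, GB, TB = 10**3, 10**9, 10**12

data = 500e6 * 1 * KB    # 500M rows at ~1 KB each
total = data * 1.3 * 3   # 30% index overhead, 3 copies
print(f"Data: {data / GB:.0f} GB, total: {total / TB:.2f} TB, hot 20%: {total * 0.2 / GB:.0f} GB")
# Data: 500 GB, total: 1.95 TB, hot 20%: 390 GB
```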
## Common Estimation Scenarios

### Scenario 1: URL Shortener

Requirements:
- 100M new URLs/month
- 10:1 read:write ratio
```
Writes:
  100M / (30 * 24 * 3,600) ≈ 40 writes/second
  Peak: ~100 writes/second

Reads:
  40 * 10 = 400 reads/second
  Peak: ~1,000 reads/second

Storage (5 years):
  100M URLs/month * 60 months = 6 billion URLs
  Average record = 100 bytes (short URL) + 500 bytes (long URL) = 600 bytes
  6B * 600 bytes = 3.6 TB
  With indexes and overhead: ~5 TB
```
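A sketch tying the three estimates together:

```python
SECONDS_PER_MONTH = 30 * 24 * 3_600

writes = 100e6 / SECONDS_PER_MONTH       # new URLs per second
reads = writes * 10                      # 10:1 read:write ratio
storage_tb = 100e6 * 60 * 600 / 10**12   # 6B records * 600 bytes each
print(f"~{writes:.0f} writes/s, ~{reads:.0f} reads/s, ~{storage_tb:.1f} TB raw over 5 years")
# ~39 writes/s, ~386 reads/s, ~3.6 TB raw over 5 years
```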
### Scenario 2: Chat Application

Requirements:
- 10M daily active users
- Average 50 messages sent/day
- Average 200 messages received/day
```
Message throughput:
  Sends: 10M * 50 / 86,400 ≈ 6,000 messages/second
  Peak: ~20,000 messages/second

Connections:
  Each user maintains 1-3 connections (phone, laptop, tablet)
  Peak concurrent: 10M * 0.1 (10% online) * 2 = 2M connections

Storage (1 year):
  10M users * 50 msgs/day * 365 days ≈ 182B messages/year
  Average message = 200 bytes
  182B * 200 bytes = 36.4 TB/year
```
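As a sketch (the 10%-online and 2-devices-per-user figures are the scenario's assumptions):

```python
dau = 10e6
sends = dau * 50 / 86_400                     # messages sent per second, on average
connections = dau * 0.10 * 2                  # 10% online at peak, ~2 devices each
storage_tb = dau * 50 * 365 * 200 / 10**12    # 200 bytes per stored message
print(f"~{sends:,.0f} sends/s, {connections / 1e6:.0f}M connections, {storage_tb:.1f} TB/year")
# ~5,787 sends/s, 2M connections, 36.5 TB/year
```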
### Scenario 3: Video Streaming

Requirements:
- 100M monthly active users
- 30% watch daily = 30M DAU
- Average 1 hour/day viewing
```
Concurrent viewers (peak):
  30M DAU * (1 hour / 24 hours) * 3 (peak factor) ≈ 4M concurrent

Bandwidth:
  Average stream: 5 Mbps
  4M * 5 Mbps = 20 Tbps peak bandwidth

Storage (library of 10K titles):
  Average video = 2 hours
  Multiple qualities: 480p (1 GB), 720p (3 GB), 1080p (5 GB), 4K (20 GB)
  Per title: ~30 GB
  Library: 10K * 30 GB = 300 TB
```
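The same numbers as a sketch; note that the prose rounds 3.75M concurrent up to 4M, which is exactly the kind of aggressive rounding the tips below recommend:

```python
Mbps, Tbps, GB, TB = 10**6, 10**12, 10**9, 10**12

concurrent = 100e6 * 0.30 * (1 / 24) * 3   # 30% of MAU, 1 h/day viewing, 3x peak
bandwidth = concurrent * 5 * Mbps          # 5 Mbps average stream
library = 10_000 * 30 * GB                 # ~30 GB per title across qualities
print(f"{concurrent / 1e6:.2f}M concurrent, {bandwidth / Tbps:.1f} Tbps, {library / TB:.0f} TB")
# 3.75M concurrent, 18.8 Tbps, 300 TB
```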
## Estimation Tips

### Round Aggressively

| Instead of | Use |
|---|---|
| 86,400 seconds | ~100,000 (10^5) |
| 2.5 million | ~3 million |
| 7.3 petabytes | ~10 petabytes |
### Use Orders of Magnitude

Think in powers of 10:
- Thousands (10^3)
- Millions (10^6)
- Billions (10^9)
- Trillions (10^12)
### State Your Assumptions

Always verbalize:
- "I'm assuming 10% of users are active at peak"
- "I'm estimating average message size at 200 bytes"
- "I'm using a 3x replication factor"
### Sanity Check Results

After calculating, ask:
- "Does this make sense?"
- "Is this in the right order of magnitude?"
- "What would change if my assumption is off by 10x?"
## Common Mistakes

### Mistake 1: Ignoring Peak vs Average

Problem: sizing for average load.

```
Average QPS: 10,000
Peak QPS: 30,000 (often 2-3x average)
```

If you size for 10,000, you'll fail at peak.
### Mistake 2: Forgetting Replication

Problem: calculating raw storage without copies.

```
Data: 1 TB
With 3 replicas: 3 TB
With backups: 4-5 TB
```
### Mistake 3: Not Accounting for Growth

Problem: sizing for current load, not future load.

```
Current users: 10M
Expected growth: 50%/year
Year 3: 10M * 1.5^3 ≈ 34M users
```

Size for at least 2x current load to avoid near-term issues.
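Compound growth as a short sketch:

```python
users, growth = 10e6, 0.50
for year in range(1, 4):
    # Each year multiplies the previous total by (1 + growth)
    print(f"Year {year}: {users * (1 + growth) ** year / 1e6:.0f}M users")
# Year 1: 15M users / Year 2: 22M users / Year 3: 34M users
```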
### Mistake 4: Over-Precision

Problem: calculating to 3 decimal places.

Bad: "We need exactly 3,456,789 IOPS"
Good: "We need roughly 3-4 million IOPS"
## Quick Reference Calculations

| Need | Formula |
|---|---|
| QPS | users * actions/day / 86400 |
| Storage/day | items/day * size/item |
| Bandwidth | QPS * response_size |
| Cache hit rate | 1 - (DB_QPS / total_QPS) |
| Servers needed | QPS / QPS_per_server |
| Shards needed | data_size / max_shard_size |
## Related Skills

- design-interview-methodology - Overall interview framework
- quality-attributes-taxonomy - NFR definitions (scalability, performance)
- database-scaling - Database capacity planning (Phase 3)
- caching-strategies - Cache sizing and hit rates (Phase 3)
## Related Commands

- /sd:estimate <scenario> - Calculate capacity interactively

## Related Agents

- capacity-planner - Guided estimation with calculations

## References

- references/latency-numbers.md - Complete latency reference table
## Version History

- v1.0.0 (2025-12-26): Initial release

## Last Updated

Date: 2025-12-26
Model: claude-opus-4-5-20251101