| name | performance-profiling |
| description | Measurement approaches, profiling tools, optimization patterns, and capacity planning. Use when diagnosing performance issues, establishing baselines, identifying bottlenecks, or planning for scale. Always measure before optimizing. |
Performance Profiling
When to Use
- Establishing performance baselines before optimization
- Diagnosing slow response times, high CPU, or memory issues
- Identifying bottlenecks in application, database, or infrastructure
- Planning capacity for expected load increases
- Validating performance improvements after optimization
- Creating performance budgets for new features
Core Methodology
The Golden Rule: Measure First
Never optimize based on assumptions. Follow this order:
- Measure - Establish baseline metrics
- Identify - Find the actual bottleneck
- Hypothesize - Form a theory about the cause
- Fix - Implement targeted optimization
- Validate - Measure again to confirm improvement
- Document - Record findings and decisions
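A minimal sketch of the Measure step, assuming a Python service and using only the standard library; `handle_request` is a hypothetical stand-in for whatever code path is under investigation:

```python
# Capture a latency baseline before touching any code.
import statistics
import time

def measure_baseline(fn, runs=100):
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)  # milliseconds
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[int(len(samples) * 0.95) - 1],
        "mean_ms": statistics.mean(samples),
    }

# baseline = measure_baseline(handle_request)  # record this before optimizing
```

Recording the baseline as data (not a feeling) is what later makes the Validate step meaningful.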
Profiling Hierarchy
Profile at the right level to find the actual bottleneck:
Application Level
|-- Request/Response timing
|-- Function/Method profiling
|-- Memory allocation tracking
|
System Level
|-- CPU utilization per process
|-- Memory usage patterns
|-- I/O wait times
|-- Network latency
|
Infrastructure Level
|-- Database query performance
|-- Cache hit rates
|-- External service latency
|-- Resource saturation
Profiling Patterns
CPU Profiling
Identify what code consumes CPU time:
- Sampling profilers - Low overhead, statistically approximate results
- Instrumentation profilers - Exact counts, higher overhead
- Flame graphs - Visual representation of call stacks
Key metrics:
- Self time (time in function itself)
- Total time (self time + time in called functions)
- Call count and frequency
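As a concrete example, Python's built-in cProfile is an instrumentation profiler whose report maps directly onto these metrics: `tottime` is self time, `cumtime` is total time, and `ncalls` is the call count. A minimal sketch with a hypothetical workload function:

```python
import cProfile
import pstats

def workload():
    # Stand-in for the code path being profiled.
    total = 0
    for i in range(100_000):
        total += sum(range(i % 100))
    return total

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Top 10 functions ranked by total (cumulative) time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
```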
Memory Profiling
Track allocation patterns and detect leaks:
- Heap snapshots - Point-in-time memory state
- Allocation tracking - What allocates memory and when
- Garbage collection analysis - GC frequency and duration
Key metrics:
- Heap size over time
- Object retention
- Allocation rate
- GC pause times
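A minimal heap-snapshot sketch using Python's built-in tracemalloc, with a stand-in function for the code path under suspicion:

```python
import tracemalloc

def suspect_operation():
    # Hypothetical stand-in for the code being investigated; allocates noticeably.
    return [str(i) * 10 for i in range(50_000)]

tracemalloc.start()
before = tracemalloc.take_snapshot()
data = suspect_operation()
after = tracemalloc.take_snapshot()

# Top allocation sites ranked by growth between the two snapshots.
for stat in after.compare_to(before, "lineno")[:10]:
    print(stat)
```

Diffing two snapshots is the same idea as allocation tracking: it shows what allocated, and where, between two points in time.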
I/O Profiling
Measure disk and network operations:
- Disk I/O - Read/write latency, throughput, IOPS
- Network I/O - Latency, bandwidth, connection count
- Database I/O - Query time, connection pool usage
Key metrics:
- Latency percentiles (p50, p95, p99)
- Throughput (ops/sec, MB/sec)
- Queue depth and wait times
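A small sketch of the percentile reporting itself, assuming latency samples (in milliseconds) are already being collected from a driver hook or access log:

```python
# Nearest-rank percentiles over recorded latency samples.
def percentiles(samples_ms, points=(50, 95, 99)):
    ordered = sorted(samples_ms)
    return {
        f"p{p}": ordered[min(len(ordered) - 1, int(len(ordered) * p / 100))]
        for p in points
    }

# Example: a handful of query latencies; p50 stays low while p95/p99 expose the tail.
latencies = [4.1, 5.0, 4.8, 210.0, 5.2, 4.9, 6.3, 5.5, 4.7, 95.0]
print(percentiles(latencies))
```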
Bottleneck Identification
The USE Method
For each resource, check:
- Utilization - Percentage of time resource is busy
- Saturation - Degree of queued work
- Errors - Error count for the resource
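A rough USE snapshot for the CPU and network, assuming the third-party psutil package (pip install psutil) and a Unix host for the load average:

```python
import os
import psutil

utilization = psutil.cpu_percent(interval=1)       # U: % of time the CPU was busy
load_1min, _, _ = os.getloadavg()                   # S: runnable tasks queued (Unix only)
net = psutil.net_io_counters()
errors = net.errin + net.errout + net.dropin + net.dropout  # E: NIC errors and drops

print(f"CPU utilization {utilization:.0f}%, load {load_1min:.2f}, net errors {errors}")
```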
The RED Method
For services, measure:
- Rate - Requests per second
- Errors - Failed requests per second
- Duration - Distribution of request latencies
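A minimal in-process sketch of the three RED series; a real deployment would export them to a metrics system, but the shape of the data is the same:

```python
import time

class RedMetrics:
    def __init__(self):
        self.requests = 0
        self.errors = 0
        self.durations_ms = []

    def observe(self, handler, *args, **kwargs):
        start = time.perf_counter()
        self.requests += 1                      # Rate: count per scrape interval
        try:
            return handler(*args, **kwargs)
        except Exception:
            self.errors += 1                    # Errors: failed requests
            raise
        finally:
            # Duration: one sample per request, for percentile reporting later.
            self.durations_ms.append((time.perf_counter() - start) * 1000)

metrics = RedMetrics()
# metrics.observe(handle_request, request)  # wrap each request handler call
```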
Common Bottleneck Patterns
| Pattern | Symptoms | Typical Causes |
|---|---|---|
| CPU-bound | High CPU, low I/O wait | Inefficient algorithms, tight loops |
| Memory-bound | High memory, GC pressure | Memory leaks, large allocations |
| I/O-bound | Low CPU, high I/O wait | Slow queries, network latency |
| Lock contention | Low CPU, high wait time | Synchronization, connection pools |
| N+1 queries | Many small DB queries | Missing joins, lazy loading |
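The N+1 row is worth a concrete illustration; a sketch with the standard-library sqlite3 module and hypothetical authors/books tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
""")

# N+1: one query for the authors, then one query per author for their books.
authors = conn.execute("SELECT id, name FROM authors").fetchall()
for author_id, _ in authors:
    conn.execute("SELECT title FROM books WHERE author_id = ?", (author_id,)).fetchall()

# Fixed: a single join fetches the same data in one round-trip.
rows = conn.execute(
    "SELECT a.name, b.title FROM authors a JOIN books b ON b.author_id = a.id"
).fetchall()
```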
Amdahl's Law
Optimization impact is limited by the fraction of time affected:
If 90% of time is in function A and 10% in function B:
- Optimizing A by 50% = 45% total improvement
- Optimizing B by 50% = 5% total improvement
Focus on the biggest contributors first.
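The arithmetic behind the example, written out as a small sketch of Amdahl's law:

```python
# Overall improvement is bounded by the fraction of time the optimization touches.
def overall_improvement(fraction, local_speedup):
    new_time = (1 - fraction) + fraction / local_speedup  # new total as a fraction of old
    return 1 - new_time                                    # fraction of total time saved

print(overall_improvement(0.9, 2))   # 0.45 -> 45% saved by halving function A
print(overall_improvement(0.1, 2))   # 0.05 ->  5% saved by halving function B
```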
Capacity Planning
Baseline Establishment
Measure current capacity under production load:
- Peak load metrics - Maximum concurrent users, requests/sec
- Resource headroom - How close to limits at peak
- Scaling patterns - Whether throughput grows linearly, sub-linearly, or super-linearly as resources are added
Load Testing Approach
- Establish baseline - Current performance at normal load
- Ramp testing - Gradually increase load to find limits
- Stress testing - Push beyond limits to understand failure modes
- Soak testing - Sustained load to find memory leaks, degradation
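A minimal ramp-test sketch; real load tests would normally use a dedicated tool (k6, Locust, etc.), and `hit_endpoint` here is a hypothetical function that issues one request and returns its latency in milliseconds:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def ramp_test(hit_endpoint, steps=(5, 10, 20, 40), requests_per_step=200):
    # Step up concurrency and report throughput and tail latency at each step.
    for workers in steps:
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=workers) as pool:
            latencies = list(pool.map(lambda _: hit_endpoint(), range(requests_per_step)))
        elapsed = time.perf_counter() - start
        print(f"{workers:>3} workers: "
              f"{requests_per_step / elapsed:6.1f} req/s, "
              f"p95 {sorted(latencies)[int(len(latencies) * 0.95)]:.1f} ms")
```

The point at which throughput stops rising while latency keeps climbing marks the saturation limit.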
Capacity Metrics
| Metric | What It Tells You |
|---|---|
| Throughput at saturation | Maximum system capacity |
| Latency at 80% load | Performance before degradation |
| Error rate under stress | Failure patterns |
| Recovery time | How quickly system returns to normal |
Growth Planning
Required Capacity = (Current Load x Growth Factor) x (1 + Safety Margin)
Example:
- Current: 1000 req/sec
- Expected growth: 50% per year
- Safety margin: 30%
Year 1 need = (1000 x 1.5) x 1.3 = 1950 req/sec
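The same example as a direct calculation:

```python
current_load = 1000        # req/sec today
growth_factor = 1.5        # +50% expected over the year
safety_margin = 0.30       # 30% headroom on top of the projected load

required = current_load * growth_factor * (1 + safety_margin)
print(required)  # 1950.0 req/sec to provision for year 1
```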
Optimization Patterns
Quick Wins
- Enable caching - Application, CDN, database query cache
- Add indexes - For slow queries identified in profiling
- Compression - Gzip/Brotli for responses
- Connection pooling - Reduce connection overhead
- Batch operations - Reduce round-trips
Algorithmic Improvements
- Reduce complexity - O(n^2) to O(n log n)
- Lazy evaluation - Defer work until needed
- Memoization - Cache computed results
- Pagination - Limit data processed at once
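As an example of the memoization item above, the standard-library functools.lru_cache caches computed results keyed by arguments; `expensive_lookup` is a hypothetical stand-in for a slow pure computation:

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def expensive_lookup(key: str) -> str:
    # Stand-in for a slow, deterministic computation.
    return key.upper() * 1000

expensive_lookup("report")   # computed once
expensive_lookup("report")   # served from the cache
print(expensive_lookup.cache_info())  # hits, misses, and current cache size
```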
Architectural Changes
- Horizontal scaling - Add more instances
- Async processing - Queue background work
- Read replicas - Distribute read load
- Caching layers - Redis, Memcached
- CDN - Edge caching for static content
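A cache-aside sketch for an external caching layer, assuming the third-party redis client (pip install redis) and a hypothetical load_from_database helper; the TTL doubles as a minimal invalidation strategy (see the anti-patterns below):

```python
import json
import redis

cache = redis.Redis(host="localhost", port=6379)

def get_report(report_id: str, ttl_seconds: int = 300):
    key = f"report:{report_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)            # cache hit: skip the database entirely
    report = load_from_database(report_id)   # hypothetical slow path
    cache.setex(key, ttl_seconds, json.dumps(report))  # populate with a TTL
    return report
```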
Best Practices
- Profile in production-like environments; development hardware, data volumes, and concurrency rarely match production behavior
- Use percentiles (p95, p99) not averages for latency
- Monitor continuously, not just during incidents
- Set performance budgets and enforce them in CI
- Document baseline metrics before making changes
- Keep profiling overhead low in production
- Correlate metrics across layers (application, database, infrastructure)
- Understand the difference between latency and throughput
Anti-Patterns
- Optimizing without measurement
- Using averages for latency metrics
- Profiling only in development
- Ignoring tail latencies (p99, p99.9)
- Premature optimization of non-bottleneck code
- Over-engineering for hypothetical scale
- Caching without invalidation strategy
References
- Profiling Tools Reference - Tools by language and platform