Optimizing Performance
Strategies for identifying, analyzing, and resolving performance bottlenecks.
When to Use This Skill
- Application is running slowly
- High resource consumption (CPU, memory)
- Database queries are slow
- API response times are high
- Need to scale for more users
- Preparing for load testing
Performance Optimization Philosophy
The Golden Rules
- Measure first - Never optimize without data
- Optimize the right thing - Find the actual bottleneck
- Keep it simple - Complexity often hurts performance
- Test after - Verify the optimization worked
- Document trade-offs - Performance often costs readability
The 80/20 Rule
80% of performance problems come from 20% of the code.
Focus on:
├── Hot paths (frequently executed code)
├── I/O operations (database, network, disk)
├── Memory allocation patterns
└── Algorithm complexity
Profiling Techniques
Types of Profiling
| Type | What It Measures | Tools |
|------|------------------|-------|
| CPU Profiling | Time spent in functions | pprof, py-spy, Chrome DevTools |
| Memory Profiling | Allocation patterns, leaks | Valgrind, memory_profiler, Chrome |
| I/O Profiling | Disk/network operations | strace, perf, Wireshark |
| Database Profiling | Query performance | EXPLAIN, slow query log, APM |
Profiling Workflow
1. Establish baseline
└─ Measure current performance with realistic load
2. Identify hotspots
└─ Profile to find where time/resources are spent
3. Form hypothesis
└─ Why is this slow? What would make it faster?
4. Implement fix
└─ Make ONE change at a time
5. Measure again
└─ Did it help? By how much?
6. Repeat
└─ Until performance goals are met
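To make step 1 concrete, here is a minimal baseline harness in Python (a sketch: the URL and request count are placeholders to adapt to your service and realistic load):

import statistics
import time
import urllib.request

URL = "http://localhost:8000/health"  # placeholder endpoint

samples_ms = []
for _ in range(50):  # size the run to your realistic load
    start = time.perf_counter()
    urllib.request.urlopen(URL).read()
    samples_ms.append((time.perf_counter() - start) * 1000)

print(f"median={statistics.median(samples_ms):.1f}ms max={max(samples_ms):.1f}ms")

Keep the numbers; after each change in step 4, rerun and compare.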
Common Profiling Commands
# Node.js
node --prof app.js
node --prof-process isolate-*.log > profile.txt
# Python
python -m cProfile -s cumtime app.py
py-spy record -o profile.svg -- python app.py
# Go
go test -cpuprofile cpu.prof -memprofile mem.prof -bench .
go tool pprof cpu.prof
# Database (PostgreSQL)
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com';
Common Bottleneck Patterns
N+1 Query Problem
BAD (N+1 queries):
SELECT * FROM posts; -- 1 query
SELECT * FROM users WHERE id=1; -- N queries
SELECT * FROM users WHERE id=2;
...
GOOD (2 queries):
SELECT * FROM posts;
SELECT * FROM users WHERE id IN (1, 2, 3, ...);
Detection: High query count relative to data returned
Fix: Eager loading, batch fetching, JOINs
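A sketch of the batch-fetching fix in Python (db.query is a hypothetical helper using PostgreSQL-style placeholders; rows are assumed dict-like):

def load_posts_with_authors(db):
    posts = db.query("SELECT * FROM posts")  # 1 query
    user_ids = list({p["user_id"] for p in posts})  # de-duplicated author ids
    users = db.query("SELECT * FROM users WHERE id = ANY(%s)", [user_ids])  # 1 query, not N
    users_by_id = {u["id"]: u for u in users}
    for post in posts:
        post["author"] = users_by_id[post["user_id"]]
    return posts

Most ORMs provide the same thing built in via eager-loading options.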
Unbounded Operations
BAD:
SELECT * FROM logs; -- Returns millions of rows
GOOD:
SELECT * FROM logs
WHERE created_at > NOW() - INTERVAL '1 day'
LIMIT 100;
Detection: Memory spikes, timeouts
Fix: Pagination, limits, streaming
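Keyset pagination is one way to apply the fix; a sketch with the same hypothetical db.query helper (each page resumes after the last id seen, avoiding a growing OFFSET):

def iter_logs(db, page_size=100):
    last_id = 0
    while True:
        rows = db.query(
            "SELECT * FROM logs WHERE id > %s ORDER BY id LIMIT %s",
            [last_id, page_size],
        )
        if not rows:
            return
        yield from rows  # stream each page to the caller
        last_id = rows[-1]["id"]  # resume after the last row seen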
Synchronous Blocking
BAD (blocking):
result1 = await fetch_api_1()  // Wait 200ms
result2 = await fetch_api_2()  // Wait 200ms
return combine(result1, result2)  // Total: ~400ms
GOOD (parallel):
[result1, result2] = await Promise.all([
  fetch_api_1(),
  fetch_api_2(),
])  // Total: ~200ms
Detection: Sequential I/O in traces
Fix: Parallel execution, async/await
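The same idea as a runnable Python sketch, using asyncio.gather (the sleeps stand in for 200ms network calls):

import asyncio

async def fetch_api_1():
    await asyncio.sleep(0.2)  # stand-in for a 200ms network call
    return "a"

async def fetch_api_2():
    await asyncio.sleep(0.2)
    return "b"

async def combined():
    # Both coroutines run concurrently, so the total wait is ~200ms, not 400ms.
    result1, result2 = await asyncio.gather(fetch_api_1(), fetch_api_2())
    return result1, result2

print(asyncio.run(combined()))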
Excessive Allocation
BAD (allocates in loop):
for item in large_list:
    result = []  # New list allocated (and discarded) every iteration
    result.append(transform(item))
GOOD (allocate once):
result = []
for item in large_list:
    result.append(transform(item))
BEST (generator):
def transform_all(items):
    for item in items:
        yield transform(item)
Detection: GC pressure, memory profiling
Fix: Object pooling, pre-allocation, generators
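A tiny object-pool sketch in Python (illustrative only; a production pool would add locking and a size cap):

class BufferPool:
    def __init__(self, size=8, buf_len=4096):
        self._buf_len = buf_len
        self._free = [bytearray(buf_len) for _ in range(size)]

    def acquire(self):
        # Reuse a pooled buffer; fall back to a fresh one if the pool is empty.
        return self._free.pop() if self._free else bytearray(self._buf_len)

    def release(self, buf):
        self._free.append(buf)  # return the buffer for later reuse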
Optimization Techniques
Database Optimization
| Technique | When to Use | Impact |
|-----------|-------------|--------|
| Indexing | Slow WHERE/JOIN queries | High |
| Query optimization | Complex queries | High |
| Connection pooling | Many short connections | Medium |
| Read replicas | Read-heavy workloads | High |
| Caching | Repeated queries | Very High |
| Denormalization | Complex JOINs | Medium |
Index Guidelines
-- Create index for frequently queried columns
CREATE INDEX idx_users_email ON users(email);
-- Composite index for multiple column queries
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);
-- Check if index is used
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com';
Caching Strategies
| Strategy | Use Case | Invalidation |
|----------|----------|--------------|
| Cache-aside | General purpose | Manual or TTL |
| Write-through | Strong consistency | On write |
| Write-behind | Write-heavy | Async batched |
| Read-through | Read-heavy | On miss |
Cache-aside pattern:
1. Check cache
2. If miss, query database
3. Store in cache
4. Return result
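A minimal in-memory version of those four steps in Python (a sketch; production systems usually put the cache in Redis or memcached, and db.load_user is a hypothetical database call):

import time

_cache = {}  # key -> (value, expires_at)
TTL_SECONDS = 60

def get_user(db, user_id):
    key = f"user:{user_id}"
    entry = _cache.get(key)
    if entry and entry[1] > time.monotonic():  # 1. check cache
        return entry[0]
    user = db.load_user(user_id)  # 2. on miss, query database
    _cache[key] = (user, time.monotonic() + TTL_SECONDS)  # 3. store with a TTL
    return user  # 4. return result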
Memory Optimization
| Technique | When to Use |
|-----------|-------------|
| Object pooling | Frequent allocation of same type |
| Lazy loading | Large objects not always needed |
| Streaming | Processing large datasets |
| Weak references | Cache that can be evicted |
| Data structure choice | Right structure for access pattern |
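To make the streaming row concrete: iterating a file object in Python yields lines lazily, so memory stays flat however large the file is (a sketch; f.read() would instead load everything at once):

def count_errors(path):
    errors = 0
    with open(path) as f:
        for line in f:  # lines are streamed, never the whole file
            if "ERROR" in line:
                errors += 1
    return errors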
Frontend Performance
Core Web Vitals
| Metric | Target | What It Measures |
|--------|--------|------------------|
| LCP (Largest Contentful Paint) | < 2.5s | Load performance |
| INP (Interaction to Next Paint) | < 200ms | Interactivity |
| CLS (Cumulative Layout Shift) | < 0.1 | Visual stability |
Frontend Optimization Checklist
Loading Performance:
☐ Code splitting (lazy load routes/components)
☐ Tree shaking (remove unused code)
☐ Minification (JS, CSS)
☐ Compression (gzip, brotli)
☐ Image optimization (WebP, srcset, lazy loading)
☐ CDN for static assets
Runtime Performance:
☐ Virtualized lists for large data
☐ Debounce/throttle event handlers
☐ Memoization of expensive computations
☐ Avoid layout thrashing (batch DOM reads/writes)
☐ Use CSS transforms for animations
☐ Web Workers for heavy computation
Bundle Optimization
# Analyze bundle size
npx webpack-bundle-analyzer stats.json
npx source-map-explorer bundle.js
# Identify large dependencies
npx depcheck
API Performance
Response Time Targets
| Percentile | Target | User Experience |
|------------|--------|-----------------|
| p50 | < 100ms | Fast |
| p95 | < 500ms | Acceptable |
| p99 | < 1s | Tolerable |
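Percentiles are computed from recorded latencies, not averages; a sketch with Python's standard library (latencies_ms stands in for data from your logs or a load test):

import statistics

latencies_ms = [87, 92, 110, 95, 480, 102, 99, 950, 105, 93]  # sample data

q = statistics.quantiles(latencies_ms, n=100)  # 1st..99th percentile cut points
p50, p95, p99 = q[49], q[94], q[98]
print(f"p50={p50:.0f}ms p95={p95:.0f}ms p99={p99:.0f}ms")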
API Optimization Techniques
| Technique | Benefit |
|-----------|---------|
| Response compression | Reduce transfer size |
| Pagination | Limit response size |
| Field selection | Return only needed data |
| ETags/Caching headers | Reduce redundant requests |
| Connection keep-alive | Reduce handshake overhead |
| HTTP/2 | Multiplexing, header compression |
Batch Endpoints
BAD (multiple requests):
GET /users/1
GET /users/2
GET /users/3
GOOD (batch):
POST /users/batch
{ "ids": [1, 2, 3] }
Monitoring and Alerting
Key Metrics to Track
| Category | Metrics |
|----------|---------|
| Latency | p50, p95, p99 response times |
| Throughput | Requests per second |
| Errors | Error rate, error types |
| Saturation | CPU, memory, connections |
Alerting Thresholds
Critical (page immediately):
- Error rate > 5%
- p99 latency > 5s
- Service down
Warning (notify during business hours):
- Error rate > 1%
- p95 latency > 2s
- Resource utilization > 80%
Logging for Performance
# Log slow operations
import functools
import logging
import time

def timed_operation(func):
    @functools.wraps(func)  # preserve the wrapped function's name for logging
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # monotonic clock, suited to measuring intervals
        result = func(*args, **kwargs)
        duration = time.perf_counter() - start
        if duration > 1.0:  # log only operations slower than 1 second
            logging.warning(f"{func.__name__} took {duration:.2f}s")
        return result
    return wrapper
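Usage (generate_report is a hypothetical example):

@timed_operation
def generate_report():
    ...  # any operation worth timing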
Performance Testing
Load Testing Tools
| Tool | Use Case |
|------|----------|
| k6 | Modern, scriptable load testing |
| JMeter | Complex scenarios, GUI |
| Locust | Python-based, distributed |
| Artillery | YAML config, easy to start |
| wrk | Simple HTTP benchmarking |
Load Test Example (k6)
import http from 'k6/http';
import { check, sleep } from 'k6';
export const options = {
stages: [
{ duration: '1m', target: 50 }, // Ramp up
{ duration: '5m', target: 50 }, // Stay at 50 users
{ duration: '1m', target: 0 }, // Ramp down
],
thresholds: {
http_req_duration: ['p(95)<500'], // 95% under 500ms
http_req_failed: ['rate<0.01'], // Error rate < 1%
},
};
export default function () {
const res = http.get('https://api.example.com/users');
check(res, { 'status is 200': (r) => r.status === 200 });
sleep(1);
}
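Run it with k6 run script.js, where script.js is whatever filename you saved the script under.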
Anti-Patterns to Avoid
- Premature optimization - Optimize only proven bottlenecks
- Optimizing without measuring - Guessing wastes time
- Over-caching - Cache invalidation is hard
- Ignoring database - Often the real bottleneck
- Complex micro-optimizations - Usually not worth it
- Not testing under load - Production behavior differs
- Ignoring cold starts - First request matters too
- Over-engineering - Simpler is often faster
Quick Reference
PROFILING FLOW:
Measure → Identify → Hypothesize → Fix → Measure → Repeat
COMMON BOTTLENECKS:
N+1 queries → Eager loading
Unbounded data → Pagination
Blocking I/O → Parallelization
Excessive allocation → Object pooling
DATABASE:
Index frequently queried columns
Use EXPLAIN ANALYZE
Add caching layer
CACHING:
Cache-aside for general use
TTL for time-based invalidation
Invalidate on write for consistency
TARGETS:
p50 < 100ms
p95 < 500ms
p99 < 1s
TOOLS:
CPU: pprof, py-spy
Memory: valgrind, memory_profiler
Load: k6, locust
DB: EXPLAIN, slow query log