performance-testing-fundamentals

@tachyon-beep/skillpacks

Install Skill

  1. Download skill
  2. Enable skills in Claude: open claude.ai/settings/capabilities and find the "Skills" section
  3. Upload to Claude: click "Upload skill" and select the downloaded ZIP file

Note: Please verify the skill by going through its instructions before using it.

SKILL.md

name: performance-testing-fundamentals
description: Use when starting performance testing, choosing load testing tools, interpreting performance metrics, debugging slow applications, or establishing performance baselines - provides decision frameworks and anti-patterns for load, stress, spike, and soak testing

Performance Testing Fundamentals

Overview

Core principle: Diagnose first, test second. Performance testing without understanding your bottlenecks wastes time.

Rule: Define SLAs before testing. You can't judge "good" performance without requirements.

When NOT to Performance Test

Performance test only AFTER:

  • ✅ Defining performance SLAs (latency, throughput, error rate targets)
  • ✅ Profiling current bottlenecks (APM, database logs, profiling)
  • ✅ Fixing obvious issues (missing indexes, N+1 queries, inefficient algorithms)

Don't performance test to find problems - use profiling/APM for that. Performance test to verify fixes and validate capacity.

Tool Selection Decision Tree

| Your Constraint | Choose | Why |
|---|---|---|
| CI/CD integration, JavaScript team | k6 | Modern, code-as-config, easy CI integration |
| Complex scenarios, enterprise, mature ecosystem | JMeter | GUI, plugins, every protocol |
| High throughput (10k+ RPS), Scala team | Gatling | Built for scale, excellent reports |
| Quick HTTP benchmark, no complex scenarios | Apache Bench (ab) or wrk | Command-line, no setup |
| Cloud-based, don't want infrastructure | BlazeMeter, Loader.io | SaaS, pay-per-use |
| Realistic browser testing (JS rendering) | Playwright + k6 | Hybrid: Playwright for UX, k6 for load |

For most teams: k6 (modern, scriptable) or JMeter (mature, GUI)
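
To make "code-as-config" concrete, here is a minimal k6 load test - a sketch only, assuming a hypothetical staging endpoint and the latency/error targets used later in this document:

```javascript
// load-test.js - minimal k6 script
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 50,          // 50 concurrent virtual users
  duration: '15m',  // sustain load for the full test window
  thresholds: {
    http_req_duration: ['p(95)<300', 'p(99)<500'], // latency SLA in ms
    http_req_failed: ['rate<0.001'],               // error rate < 0.1%
  },
};

export default function () {
  // Hypothetical endpoint - point this at your staging environment
  const res = http.get('https://staging.example.com/api/items');
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1); // think time between requests, like a real user
}
```

Run it with `k6 run load-test.js`; k6 exits non-zero when a threshold fails, which is what makes CI integration straightforward.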

Test Type Quick Reference

| Test Type | Purpose | Duration | Load Pattern | Use When |
|---|---|---|---|---|
| Load Test | Verify normal operations under expected load | 15-30 min | Steady (ramp to target, sustain) | Baseline validation, regression testing |
| Stress Test | Find breaking point | 5-15 min | Increasing (ramp until failure) | Capacity planning, finding limits |
| Spike Test | Test sudden traffic surge | 2-5 min | Instant jump (0 → peak) | Black Friday prep, auto-scaling validation |
| Soak Test | Find memory leaks, connection pool exhaustion | 2-8 hours | Steady sustained load | Pre-production validation, stability check |

Start with Load Test (validates baseline), then Stress/Spike (finds limits), finally Soak (validates stability).
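
The load patterns in the table map directly onto k6 `stages`. A sketch of each pattern follows - durations and targets are illustrative, not prescriptive:

```javascript
// Illustrative k6 stage configs. In a real script, export exactly ONE
// of these objects as `options`.

// Load test: ramp to expected load, sustain, ramp down
export const loadTest = {
  stages: [
    { duration: '2m', target: 100 },  // warm-up ramp (caches, autoscaling)
    { duration: '20m', target: 100 }, // steady state at expected load
    { duration: '2m', target: 0 },
  ],
};

// Stress test: keep ramping until something breaks
export const stressTest = {
  stages: [
    { duration: '2m', target: 100 },
    { duration: '2m', target: 300 },
    { duration: '2m', target: 600 },
    { duration: '2m', target: 1000 }, // well past expected peak
  ],
};

// Spike test: near-instant jump from zero to peak
export const spikeTest = {
  stages: [
    { duration: '10s', target: 1000 }, // 0 → peak almost instantly
    { duration: '3m', target: 1000 },
    { duration: '10s', target: 0 },
  ],
};

// Soak test: moderate load held for hours to surface leaks
export const soakTest = {
  stages: [
    { duration: '5m', target: 100 },
    { duration: '4h', target: 100 }, // watch memory and connection counts
    { duration: '5m', target: 0 },
  ],
};
```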

Anti-Patterns Catalog

❌ Premature Load Testing

Symptom: "App is slow, let's load test it"

Why bad: Load testing reveals "it's slow under load" but not WHY or WHERE

Fix: Profile first (APM, database slow query logs, profiler), fix obvious bottlenecks, THEN load test to validate


❌ Testing Without SLAs

Symptom: "My API handles 100 RPS with 200ms average latency. Is that good?"

Why bad: Can't judge "good" without requirements. A gaming API needs <50ms; batch processing tolerates 2s.

Fix: Define SLAs first (the sketch after this list shows them as k6 thresholds):

  • Target latency: P95 < 300ms, P99 < 500ms
  • Target throughput: 500 RPS at peak
  • Max error rate: < 0.1%
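
In k6, these SLAs translate directly into thresholds, so the run itself fails when any target is violated - a sketch using the numbers above:

```javascript
export const options = {
  thresholds: {
    http_req_duration: ['p(95)<300', 'p(99)<500'], // latency targets in ms
    http_req_failed: ['rate<0.001'],               // max error rate 0.1%
    http_reqs: ['rate>500'],                       // throughput: >500 RPS sustained
  },
};
```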

❌ Unrealistic SLAs

Symptom: "Our database-backed CRUD API with complex joins must have P95 < 10ms"

Why bad: Sets impossible targets. Database round-trip alone is often 5-20ms. Forces wasted optimization or architectural rewrites.

Fix: Compare against the Performance Benchmarks table (see below). If your target is 10x better than the benchmark, profile current performance first, then negotiate a realistic SLA based on what's achievable versus the cost of optimization.


❌ Vanity Metrics

Symptom: Reporting only average response time

Why bad: Average hides tail latency. 99% of requests at 100ms + 1% at 10s = "average 200ms" looks fine, but users experience 10s delays.

Fix: Always report percentiles (a k6 option for this follows the list):

  • P50 (median) - typical user experience
  • P95 - most users
  • P99 - worst case for a significant minority
  • Max - outliers
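
If you are using k6, the end-of-test summary can be told to report exactly these statistics instead of the default set - `summaryTrendStats` controls which trend statistics are printed:

```javascript
export const options = {
  // Print median and tail percentiles instead of leaning on the average
  summaryTrendStats: ['med', 'p(95)', 'p(99)', 'max'],
};
```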

❌ Load Testing in Production First

Symptom: "Let's test capacity by running load tests against production"

Why bad: Risks outages, contaminates real metrics, can trigger alerts/costs

Fix: Test in staging environment that mirrors production (same DB size, network latency, resource limits)


❌ Single-User "Load" Tests

Symptom: Running one user hitting the API as fast as possible

Why bad: Doesn't simulate realistic concurrency, misses resource contention (database connections, thread pools)

Fix: Simulate realistic concurrent users with realistic think time between requests
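
The difference is easy to see in a k6 sketch (hypothetical endpoint; numbers are illustrative):

```javascript
import http from 'k6/http';
import { sleep } from 'k6';

// ❌ One VU in a tight loop measures raw round-trip speed, not capacity:
// export const options = { vus: 1, duration: '5m' };

// ✅ Many concurrent VUs with think time exercise connection pools,
//    thread pools, and other shared resources the way real traffic does
export const options = { vus: 100, duration: '15m' };

export default function () {
  http.get('https://staging.example.com/api/items'); // hypothetical endpoint
  sleep(1 + Math.random() * 3); // 1-4s of think time between requests
}
```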

Metrics Glossary

| Metric | Definition | Good Threshold (typical web API) |
|---|---|---|
| RPS (Requests/Second) | Throughput - how many requests processed per second | Varies by app; know your peak |
| Latency | Time from request to response | P95 < 300ms, P99 < 500ms |
| P50 (Median) | 50% of requests faster than this | P50 < 100ms |
| P95 | 95% of requests faster than this | P95 < 300ms |
| P99 | 99% of requests faster than this | P99 < 500ms |
| Error Rate | % of 4xx/5xx responses | < 0.1% |
| Throughput | Data transferred per second (MB/s) | Depends on payload size |
| Concurrent Users | Active users at the same time | Calculate from traffic patterns |

Focus on P95/P99, not average. Tail latency kills user experience.

Diagnostic-First Workflow

Before load testing slow applications, follow this workflow:

Step 1: Measure Current State

  • Install APM (Datadog, New Relic, Grafana) or logging
  • Identify slowest 10 endpoints/operations
  • Check database slow query logs

Step 2: Common Quick Wins (90% of performance issues)

  • Missing database indexes
  • N+1 query problem (see the sketch after this list)
  • Unoptimized images/assets
  • Missing caching (Redis, CDN)
  • Synchronous operations that should be async
  • Inefficient serialization (JSON parsing bottlenecks)
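
Of these, the N+1 query problem is the easiest to miss in code review. A sketch in plain JavaScript, assuming a generic `db.query(sql, params)` helper (hypothetical - substitute your driver or ORM):

```javascript
// ❌ N+1: one query for the orders, then one more query per order
async function getOrdersSlow(db, userId) {
  const orders = await db.query('SELECT * FROM orders WHERE user_id = ?', [userId]);
  for (const order of orders) {
    // 1 + N round-trips: this loop is what collapses under load
    order.items = await db.query(
      'SELECT * FROM order_items WHERE order_id = ?', [order.id]
    );
  }
  return orders;
}

// ✅ Two queries total: fetch all items in one round-trip, group in memory
async function getOrdersFast(db, userId) {
  const orders = await db.query('SELECT * FROM orders WHERE user_id = ?', [userId]);
  const items = await db.query(
    'SELECT * FROM order_items WHERE order_id IN (?)',
    [orders.map((o) => o.id)]
  );
  const byOrder = {};
  for (const item of items) {
    (byOrder[item.order_id] ??= []).push(item);
  }
  for (const order of orders) {
    order.items = byOrder[order.id] ?? [];
  }
  return orders;
}
```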

Step 3: Profile Specific Bottleneck

  • Use profiler to see CPU/memory hotspots
  • Trace requests to find where time is spent (DB? external API? computation?)
  • Check for resource limits (max connections, thread pool exhaustion)

Step 4: Fix and Measure

  • Apply fix (add index, cache layer, async processing)
  • Measure improvement in production
  • Document before/after metrics

Step 5: THEN Load Test (if needed)

  • Validate fixes handle expected load
  • Find new capacity limits
  • Establish regression baseline

Anti-pattern to avoid: Skipping to Step 5 without Steps 1-4.

Performance Benchmarks (Reference)

What "good" looks like by application type:

| Application Type | Typical P95 Latency | Typical Throughput | Notes |
|---|---|---|---|
| REST API (CRUD) | < 200ms | 500-2000 RPS | Database-backed, simple queries |
| Search API | < 500ms | 100-500 RPS | Complex queries, ranking algorithms |
| Payment Gateway | < 1s | 50-200 RPS | External service calls, strict consistency |
| Real-time Gaming | < 50ms | 1000-10000 RPS | Low latency critical |
| Batch Processing | 2-10s/job | 10-100 jobs/min | Throughput > latency |
| Static CDN | < 100ms | 10000+ RPS | Edge-cached, minimal computation |

Use as rough guide, not absolute targets. Your SLAs depend on user needs.

Results Interpretation Framework

After running a load test:

Pass Criteria:

  • ✅ All requests meet latency SLA (e.g., P95 < 300ms)
  • ✅ Error rate under threshold (< 0.1%)
  • ✅ No resource exhaustion (CPU < 80%, memory stable, no connection pool saturation)
  • ✅ Sustained load for test duration without degradation

Fail Criteria:

  • ❌ Latency exceeds SLA
  • ❌ Error rate spikes
  • ❌ Gradual degradation over time (memory leak, connection leak)
  • ❌ Resource exhaustion (CPU pegged, OOM errors)

Next Steps:

  • If passing: Establish this as regression baseline, run periodically in CI
  • If failing: Profile to find bottleneck, optimize, re-test
  • If borderline: Test at higher load (stress test) to find safety margin
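
In k6, these pass/fail criteria can be encoded so the run itself fails in CI, and optionally aborts early - a sketch under the SLAs assumed throughout this document:

```javascript
export const options = {
  thresholds: {
    // A failed threshold fails the run (non-zero exit code), failing CI
    http_req_duration: [
      // abortOnFail stops the test once the SLA is clearly blown;
      // delayAbortEval gives warm-up traffic a grace period
      { threshold: 'p(95)<300', abortOnFail: true, delayAbortEval: '1m' },
      'p(99)<500',
    ],
    http_req_failed: ['rate<0.001'], // error rate < 0.1%
  },
};
```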

Common Mistakes

❌ Not Ramping Load Gradually

Symptom: Instant 0 → 1000 users, everything fails

Fix: Ramp over 2-5 minutes to let auto-scaling/caching warm up (except spike tests, where instant jump is the point)


❌ Testing With Empty Database

Symptom: Tests pass with 100 records, fail with 1M records in production

Fix: Seed staging database with production-scale data


❌ Ignoring External Dependencies

Symptom: Your API is fast, but third-party payment gateway times out under load

Fix: Include external service latency in SLAs, or mock them for isolated API testing

Quick Reference

Getting Started Checklist:

  1. Define SLAs (latency P95/P99, throughput, error rate)
  2. Choose tool (k6 or JMeter for most cases)
  3. Start with Load Test (baseline validation)
  4. Run Stress Test (find capacity limits)
  5. Establish regression baseline
  6. Run in CI on major changes

When Debugging Slow App:

  1. Profile first (APM, database logs)
  2. Fix obvious issues (indexes, N+1, caching)
  3. Measure improvement
  4. THEN load test to validate

Interpreting Results:

  • Report P95/P99, not just average
  • Compare against SLAs
  • Check for resource exhaustion
  • Look for degradation over time (soak tests)

Bottom Line

Performance testing validates capacity and catches regressions.

Profiling finds bottlenecks.

Don't confuse the two - diagnose first, test second.