SKILL.md

name: load-testing-patterns
description: Use when designing load tests, choosing tools (k6, JMeter, Gatling), calculating concurrent users from DAU, interpreting latency degradation, identifying bottlenecks, or running spike/soak/stress tests - provides test patterns, anti-patterns, and load calculation frameworks

Load Testing Patterns

Overview

Core principle: Test realistic load patterns, not constant artificial load. Find limits before users do.

Rule: Load testing reveals system behavior under stress. Without it, production is your load test.

Tool Selection Decision Tree

| Your Need | Protocol | Team Skills | Use | Why |
|---|---|---|---|---|
| Modern API testing | HTTP/REST/GraphQL | JavaScript | k6 | Best dev experience, CI/CD friendly |
| Enterprise/complex protocols | HTTP/SOAP/JMS/JDBC | Java/GUI comfort | JMeter | Mature, comprehensive protocols |
| Python team | HTTP/WebSocket | Python | Locust | Pythonic, easy scripting |
| High performance/complex scenarios | HTTP/gRPC | Scala/Java | Gatling | Best reports, high throughput |
| Cloud-native at scale | HTTP/WebSocket | Any (SaaS) | Artillery, Flood.io | Managed, distributed |

First choice: k6 (modern, scriptable, excellent CI/CD integration)

Why not ApacheBench/wrk: Too simple for realistic scenarios, no complex user flows

Test Pattern Library

| Pattern | Purpose | Duration | When to Use |
|---|---|---|---|
| Smoke Test | Verify test works | 1-2 min | Before every test run |
| Load Test | Normal/peak capacity | 10-30 min | Regular capacity validation |
| Stress Test | Find breaking point | 20-60 min | Understand limits |
| Spike Test | Sudden traffic surge | 5-15 min | Black Friday, launch events |
| Soak Test | Memory leaks, stability | 1-8 hours | Pre-release validation |
| Capacity Test | Max sustainable load | Variable | Capacity planning |

Smoke Test

Goal: Verify test script works with minimal load

// k6 smoke test
export let options = {
  vus: 1,
  duration: '1m',
  thresholds: {
    http_req_duration: ['p(95)<500'],  // 95% < 500ms
    http_req_failed: ['rate<0.01'],     // <1% errors
  }
}

Purpose: Catch test script bugs before running expensive full tests
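
The options blocks in this section only define the load shape. To actually run, each k6 script also needs a default function that issues the requests; a minimal sketch (the endpoint URL is illustrative):

import http from 'k6/http'
import { check, sleep } from 'k6'

export default function () {
  // Each VU repeatedly runs this function for the configured VUs/stages
  const res = http.get('https://api.example.com/health')  // illustrative endpoint
  check(res, { 'status is 200': (r) => r.status === 200 })
  sleep(1)
}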

Load Test (Ramp-Up Pattern)

Goal: Test normal and peak expected load

// k6 load test with ramp-up
export let options = {
  stages: [
    { duration: '5m', target: 100 },   // Ramp to normal load
    { duration: '10m', target: 100 },  // Hold at normal
    { duration: '5m', target: 200 },   // Ramp to peak
    { duration: '10m', target: 200 },  // Hold at peak
    { duration: '5m', target: 0 },     // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500', 'p(99)<1000'],
    http_req_failed: ['rate<0.05'],
  }
}

Pattern: Gradual ramp-up → sustain → ramp down. Never start at peak.

Stress Test (Breaking Point)

Goal: Find system limits

// k6 stress test
export let options = {
  stages: [
    { duration: '5m', target: 100 },   // Normal
    { duration: '5m', target: 300 },   // Above peak
    { duration: '5m', target: 600 },   // 2x peak
    { duration: '5m', target: 900 },   // 3x peak (expect failure)
    { duration: '10m', target: 0 },    // Recovery
  ]
}

Success: Identify the load at which the system starts to degrade (it doesn't have to break outright)

Spike Test (Sudden Surge)

Goal: Test sudden traffic bursts (viral post, email campaign)

// k6 spike test
export let options = {
  stages: [
    { duration: '1m', target: 100 },   // Normal
    { duration: '30s', target: 1000 }, // SPIKE to 10x
    { duration: '5m', target: 1000 },  // Hold spike
    { duration: '2m', target: 100 },   // Back to normal
    { duration: '5m', target: 100 },   // Recovery check
  ]
}

Tests: Auto-scaling, circuit breakers, rate limiting

Soak Test (Endurance)

Goal: Find memory leaks, resource exhaustion over time

// k6 soak test
export let options = {
  stages: [
    { duration: '5m', target: 100 },   // Ramp
    { duration: '4h', target: 100 },   // Soak (sustained load)
    { duration: '5m', target: 0 },     // Ramp down
  ]
}

Monitor: Memory growth, connection leaks, disk space, file descriptors

Duration: Minimum 1 hour, ideally 4-8 hours

Load Calculation Framework

Problem: Convert "10,000 daily active users" to concurrent load

Step 1: DAU to Concurrent Users

Concurrent Users = DAU × Concurrency Ratio × Peak Multiplier

Concurrency Ratios by App Type:
- Web apps: 5-10%
- Social media: 10-20%
- Business apps: 20-30% (work hours)
- Gaming: 15-25%

Peak Multiplier: 1.5-2x for safety margin

Example:

DAU = 10,000
Concurrency = 10% (web app)
Peak Multiplier = 1.5

Concurrent Users = 10,000 × 0.10 × 1.5 = 1,500 concurrent users

Step 2: Concurrent Users to Requests/Second

RPS = (Concurrent Users × Requests per Session) / (Session Duration × Think Time Ratio)

Think Time Ratio:
- Active browsing: 0.3-0.5 (30-50% time clicking/typing)
- Reading-heavy: 0.1-0.2 (10-20% active)
- API clients: 0.8-1.0 (80-100% active)

Example:

Concurrent Users = 1,500
Requests per Session = 20
Session Duration = 10 minutes = 600 seconds
Think Time Ratio = 0.3 (web browsing)

RPS = (1,500 × 20) / (600 × 0.3) = 30,000 / 180 = 167 RPS
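
If you prefer to drive load by request rate rather than VU count, the computed RPS can be plugged into a k6 constant-arrival-rate scenario; a sketch using the illustrative numbers above:

// Open-model load: ~167 requests/s regardless of response times
export let options = {
  scenarios: {
    target_rps: {
      executor: 'constant-arrival-rate',
      rate: 167,              // RPS from the calculation above
      timeUnit: '1s',
      duration: '30m',
      preAllocatedVUs: 200,   // enough VUs to sustain the rate
      maxVUs: 500,
    }
  }
}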

Step 3: Model Realistic Patterns

Don't use constant load. Use realistic traffic patterns:

// Realistic daily pattern
export let options = {
  stages: [
    // Morning ramp
    { duration: '2h', target: 500 },    // 08:00-10:00
    { duration: '2h', target: 1000 },   // 10:00-12:00 (peak)
    // Lunch dip
    { duration: '1h', target: 600 },    // 12:00-13:00
    // Afternoon peak
    { duration: '2h', target: 1200 },   // 13:00-15:00 (peak)
    { duration: '2h', target: 800 },    // 15:00-17:00
    // Evening drop
    { duration: '2h', target: 300 },    // 17:00-19:00
  ]
}

Anti-Patterns Catalog

❌ Coordinated Omission

Symptom: Per-iteration pacing (request, then fixed sleep) waits for slow responses, so the load generator silently sends fewer requests and underestimates latency

Why bad: Hides real latency impact when system slows down

Fix: Use arrival rate (requests/sec) not iteration rate

// ❌ Bad - coordinated omission
import http from 'k6/http'
import { sleep } from 'k6'

export default function() {
  http.get('https://api.example.com')
  sleep(1)  // Wait 1s between requests - slow responses silently reduce the load sent
}

// ✅ Good - arrival rate pacing
export let options = {
  scenarios: {
    constant_arrival_rate: {
      executor: 'constant-arrival-rate',
      rate: 100,  // 100 RPS regardless of response time
      timeUnit: '1s',
      duration: '10m',
      preAllocatedVUs: 50,
      maxVUs: 200,
    }
  }
}

❌ Cold Start Testing

Symptom: Running load test immediately after deployment without warm-up

Why bad: JIT compilation, cache warming, connection pooling haven't stabilized

Fix: Warm-up phase before measurement

// ✅ Good - warm-up phase
export let options = {
  stages: [
    { duration: '2m', target: 50 },    // Warm-up (ignore these results when analyzing)
    { duration: '10m', target: 100 },  // Actual test
  ]
}

❌ Unrealistic Test Data

Symptom: Using same user ID, same query parameters for all virtual users

Why bad: Caches give unrealistic performance, doesn't test real database load

Fix: Parameterized, realistic data

// ❌ Bad - same data
http.get('https://api.example.com/users/123')

// ✅ Good - parameterized data
import http from 'k6/http'
import { SharedArray } from 'k6/data'
import papaparse from 'https://jslib.k6.io/papaparse/5.1.1/index.js'

const csvData = new SharedArray('users', function () {
  return papaparse.parse(open('./users.csv'), { header: true }).data
})

export default function() {
  const user = csvData[__VU % csvData.length]
  http.get(`https://api.example.com/users/${user.id}`)
}

❌ Constant Load Pattern

Symptom: Running with constant VUs instead of realistic traffic pattern

Why bad: Real traffic has peaks and valleys, not a flat line

Fix: Use realistic daily/hourly patterns, as sketched below
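
A minimal sketch of the fix (stage targets are illustrative; Step 3 above shows a fuller daily pattern):

// ✅ Good - peaks and valleys instead of a flat line of VUs
export let options = {
  stages: [
    { duration: '30m', target: 500 },   // Ramp toward peak
    { duration: '1h', target: 1000 },   // Peak hours
    { duration: '30m', target: 300 },   // Off-peak lull
    { duration: '30m', target: 0 },     // Wind down
  ],
}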


❌ Ignoring Think Time

Symptom: No delays between requests, hammering API as fast as possible

Why bad: Unrealistic user behavior, overestimates load

Fix: Add realistic think time based on user behavior

// ✅ Good - realistic think time
import http from 'k6/http'
import { sleep } from 'k6'

export default function() {
  http.get('https://api.example.com/products')
  sleep(Math.random() * 3 + 2)  // 2-5 seconds browsing

  http.post('https://api.example.com/cart', { /* cart payload */ })
  sleep(Math.random() * 5 + 5)  // 5-10 seconds deciding

  http.post('https://api.example.com/checkout', { /* checkout payload */ })
}

Result Interpretation Guide

Latency Degradation Patterns

| Pattern | Cause | What to Check |
|---|---|---|
| Linear growth (2x users → 2x latency) | CPU-bound | Thread pool, CPU usage |
| Exponential growth (2x users → 10x latency) | Resource saturation | Connection pools, locks, queues |
| Sudden cliff (works until X, then fails) | Hard limit hit | Max connections, memory, file descriptors |
| Gradual degradation (slow increase over time) | Memory leak, cache pollution | Memory trends, GC activity |

Bottleneck Classification

Symptom: p95 latency 10x at 2x load → Resource saturation (database connection pool, thread pool, queue)

Symptom: Errors increase with load → Hard limit (connection limit, rate limiting, timeout)

Symptom: Latency grows over time at constant load → Memory leak or cache pollution

Symptom: High variance (p50 good, p99 terrible) → GC pauses, lock contention, or slow queries
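
To see which endpoint degrades first, tag requests by name and set per-tag thresholds; a sketch (URLs and limits are illustrative):

import http from 'k6/http'

export let options = {
  thresholds: {
    'http_req_duration{name:search}': ['p(95)<300'],
    'http_req_duration{name:checkout}': ['p(95)<800'],
  }
}

export default function() {
  // The `name` tag groups latency metrics per endpoint in results and thresholds
  http.get('https://api.example.com/search?q=shoes', { tags: { name: 'search' } })
  http.post('https://api.example.com/checkout', { /* payload */ }, { tags: { name: 'checkout' } })
}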

What to Monitor

| Layer | Metrics to Track |
|---|---|
| Application | Request rate, error rate, p50/p95/p99 latency, active requests |
| Runtime | GC pauses (JVM, .NET), thread pool usage, heap/memory |
| Database | Connection pool usage, query latency, lock waits, slow queries |
| Infrastructure | CPU %, memory %, disk I/O, network throughput |
| External | Third-party API latency, rate limit hits |

Capacity Planning Formula

Safe Capacity = (Breaking Point × Degradation Factor) × Safety Margin

Breaking Point = VUs where p95 latency > threshold
Degradation Factor = 0.7 (start degradation before break)
Safety Margin = 0.5-0.7 (handle traffic spikes)

Example:
- System breaks at 1000 VUs (p95 > 1s)
- Degradation starts around 700 VUs (1000 × 0.7 degradation factor)
- Safe capacity: 700 × 0.7 (safety margin) = 490 VUs

Authentication and Session Management

Problem: Real APIs require authentication. Can't use same token for all virtual users.

Token Strategy Decision Framework

| Scenario | Strategy | Why |
|---|---|---|
| Short test (<10 min) | Pre-generate tokens | Fast, simple, no login load |
| Long test (soak) | Login during test + refresh | Realistic, tests auth system |
| Testing auth system | Simulate login flow | Auth is part of load |
| Read-only testing | Shared token (single user) | Simplest, adequate for API-only tests |

Default: Pre-generate tokens for load tests, simulate login for auth system tests

Pre-Generated Tokens Pattern

Best for: API testing where auth system isn't being tested

// k6 with pre-generated JWT tokens
import http from 'k6/http'
import { SharedArray } from 'k6/data'

// Load tokens from file (generated externally)
const tokens = new SharedArray('auth tokens', function () {
  return JSON.parse(open('./tokens.json'))
})

export default function() {
  const token = tokens[__VU % tokens.length]

  const headers = {
    'Authorization': `Bearer ${token}`
  }

  http.get('https://api.example.com/protected', { headers })
}

Generate tokens externally:

# Script to generate 1000 tokens as a JSON array (the format tokens.json above expects)
for i in {1..1000}; do
  curl -s -X POST https://api.example.com/login \
    -d "username=loadtest_user_$i&password=test" \
    | jq -r '.token'
done | jq -R . | jq -s . > tokens.json

Pros: No login load, fast test setup
Cons: Tokens may expire during long tests, not testing auth flow


Login Flow Simulation Pattern

Best for: Testing auth system, soak tests where tokens expire

// k6 with login simulation
import http from 'k6/http'
import { SharedArray } from 'k6/data'

const users = new SharedArray('users', function () {
  return JSON.parse(open('./users.json'))  // [{username, password}, ...]
})

export default function() {
  const user = users[__VU % users.length]

  // Login to get token
  const loginRes = http.post('https://api.example.com/login', {
    username: user.username,
    password: user.password
  })

  const token = loginRes.json('token')

  // Use token for subsequent requests
  const headers = { 'Authorization': `Bearer ${token}` }

  http.get('https://api.example.com/protected', { headers })
  http.post('https://api.example.com/data', {}, { headers })
}

Token refresh for long tests:

// k6 with token refresh
import http from 'k6/http'
import { sleep } from 'k6'

// Module-scope variables are initialized once per VU and persist across that VU's iterations
let token = null
let tokenExpiry = 0

export default function() {
  const now = Date.now() / 1000

  // Refresh token if expired or about to expire
  if (!token || now > tokenExpiry - 300) {  // Refresh 5 min before expiry
    const loginRes = http.post('https://api.example.com/login', { /* credentials */ })
    token = loginRes.json('token')
    tokenExpiry = loginRes.json('expires_at')
  }

  http.get('https://api.example.com/protected', {
    headers: { 'Authorization': `Bearer ${token}` }
  })

  sleep(1)
}

Session Cookie Management

For cookie-based auth:

// k6 with session cookies
import http from 'k6/http'

export default function() {
  // k6 keeps a per-VU cookie jar and sends cookies automatically;
  // grab it explicitly only if you need to inspect or modify cookies
  const jar = http.cookieJar()

  // Login (sets session cookie)
  http.post('https://example.com/login', {
    username: 'user',
    password: 'pass'
  })

  // Subsequent requests use session cookie automatically
  http.get('https://example.com/dashboard')
  http.get('https://example.com/profile')
}

Rate Limiting Detection

Pattern: Detect when hitting rate limits during load test

// k6 rate limit detection
import http from 'k6/http'
import { check } from 'k6'
import { Rate } from 'k6/metrics'

const rateLimited = new Rate('rate_limited')  // fraction of responses that were 429

export default function() {
  const res = http.get('https://api.example.com/data')

  check(res, {
    'not rate limited': (r) => r.status !== 429
  })
  rateLimited.add(res.status === 429)

  if (res.status === 429) {
    console.warn(`Rate limited at VU ${__VU}, iteration ${__ITER}`)
    const retryAfter = res.headers['Retry-After']
    console.warn(`Retry-After: ${retryAfter} seconds`)
  }
}

Thresholds for rate limiting:

export let options = {
  thresholds: {
    rate_limited: ['rate<0.01']  // <1% of requests rate limited
  }
}

Third-Party Dependency Handling

Problem: APIs call external services (payment, email, third-party APIs). Should you mock them?

Mock vs Real Decision Framework

| External Service | Mock or Real? | Why |
|---|---|---|
| Payment gateway | Real (sandbox) | Need to test integration, has sandbox mode |
| Email provider | Mock | Cost ($0.001/email × 1000 VUs adds up fast), no value in load testing it |
| Third-party API (has staging) | Real (staging) | Test integration, realistic latency |
| Third-party API (no staging) | Mock | Can't load test production, rate limits |
| Internal microservices | Real | Testing real integration points |
| Analytics/tracking | Mock | High volume, no functional impact |

Rule: Use real services if they have sandbox/staging. Mock if expensive, rate-limited, or no test environment.


Service Virtualization with WireMock

Best for: Mocking HTTP APIs with realistic responses

// k6 test pointing to WireMock
import http from 'k6/http'
import { check } from 'k6'

export default function() {
  // WireMock running on localhost:8080 mocks the external API
  // POST matches the stub definition below
  const res = http.post('http://localhost:8080/api/payment/process', '{}')

  check(res, {
    'payment mock responds': (r) => r.status === 200
  })
}

WireMock stub setup:

{
  "request": {
    "method": "POST",
    "url": "/api/payment/process"
  },
  "response": {
    "status": 200,
    "jsonBody": {
      "transaction_id": "{{randomValue type='UUID'}}",
      "status": "approved"
    },
    "headers": {
      "Content-Type": "application/json"
    },
    "transformers": ["response-template"],
    "fixedDelayMilliseconds": 200
  }
}

Why WireMock: Realistic latency simulation, dynamic responses, stateful mocking


Partial Mocking Pattern

Pattern: Mock some services, use real for others

// k6 with partial mocking
import http from 'k6/http'

export default function() {
  // Real API (points to staging)
  const productRes = http.get('https://staging-api.example.com/products')

  // Mock email service (points to WireMock)
  http.post('http://localhost:8080/mock/email/send', {
    to: 'user@example.com',
    subject: 'Order confirmation'
  })

  // Real payment sandbox
  http.post('https://sandbox-payment.stripe.com/charge', {
    amount: 1000,
    currency: 'usd',
    source: 'tok_visa'
  })
}

Decision criteria:

  • Real: Services with sandbox, need integration validation, low cost
  • Mock: No sandbox, expensive, rate-limited, testing failure scenarios
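
One way to keep a single script and flip individual dependencies between mock and real is to drive base URLs from environment variables; a sketch (variable names are illustrative):

import http from 'k6/http'

// Defaults point at local mocks; override per run, e.g.
//   k6 run -e API_BASE=https://staging-api.example.com script.js
const API_BASE = __ENV.API_BASE || 'http://localhost:8080'
const EMAIL_BASE = __ENV.EMAIL_BASE || 'http://localhost:8080/mock'

export default function() {
  http.get(`${API_BASE}/products`)
  http.post(`${EMAIL_BASE}/email/send`, { to: 'user@example.com' })
}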

Testing External Service Failures

Use mocks to simulate failures:

// WireMock stub for failure scenarios
{
  "request": {
    "method": "POST",
    "url": "/api/payment/process"
  },
  "response": {
    "status": 503,
    "jsonBody": {
      "error": "Service temporarily unavailable"
    },
    "fixedDelayMilliseconds": 5000  // Slow failure
  }
}

k6 test for resilience:

import http from 'k6/http'
import { check } from 'k6'

export default function() {
  const res = http.post('http://localhost:8080/api/payment/process', {})

  // Verify app handles payment failures gracefully
  check(res, {
    'handles payment failure': (r) => r.status === 503,
    'returns within timeout': (r) => r.timings.duration < 6000
  })
}

Cost and Compliance Guardrails

Before testing with real external services:

| Check | Why |
|---|---|
| Sandbox mode exists? | Avoid production costs/rate limits |
| Cost per request? | 1000 VUs × 10 req/s × 600s = 6M requests |
| Rate limits? | Will you hit external service limits? |
| Terms of service? | Does load testing violate TOS? |
| Data privacy? | Using real user emails/PII? |

Example cost calculation:

Email service: $0.001/email
Load test: 100 VUs × ~600 sessions each (one per second over 10 minutes) × 5 emails/session = 300,000 emails
Cost: 300,000 × $0.001 = $300

Decision: Mock email service, use real payment sandbox (free)

Compliance:

  • Don't use real user data in load tests (GDPR, privacy)
  • Check third-party TOS (some prohibit load testing)
  • Use synthetic test data only

Your First Load Test

Goal: Basic load test in one day

Hour 1-2: Install tool and write smoke test

# Install k6
brew install k6  # macOS
# or snap install k6  # Linux

# Create test.js
cat > test.js <<'EOF'
import http from 'k6/http'
import { check, sleep } from 'k6'

export let options = {
  vus: 1,
  duration: '30s'
}

export default function() {
  let res = http.get('https://your-api.com/health')
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response < 500ms': (r) => r.timings.duration < 500
  })
  sleep(1)
}
EOF

# Run smoke test
k6 run test.js

Hour 3-4: Calculate target load

Your DAU: 10,000
Concurrency: 10%
Peak multiplier: 1.5
Target: 10,000 × 0.10 × 1.5 = 1,500 VUs

Hour 5-6: Write load test with ramp-up

export let options = {
  stages: [
    { duration: '5m', target: 750 },   // Ramp to normal (50%)
    { duration: '10m', target: 750 },  // Hold normal
    { duration: '5m', target: 1500 },  // Ramp to peak
    { duration: '10m', target: 1500 }, // Hold peak
    { duration: '5m', target: 0 },     // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500', 'p(99)<1000'],
    http_req_failed: ['rate<0.05']  // < 5% errors
  }
}

Hour 7-8: Run test and analyze

# Run load test
k6 run --out json=results.json test.js

# Check summary output for:
# - p95/p99 latency trends
# - Error rates
# - When degradation started

If test fails: Check thresholds, adjust targets, investigate bottlenecks

Common Mistakes

❌ Testing Production Without Safeguards

Fix: Use feature flags, test environment, or controlled percentage
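
One common safeguard is to label synthetic traffic so production systems can identify, throttle, or exclude it from analytics; a sketch (the header name is illustrative, not a standard):

import http from 'k6/http'

export default function() {
  // Mark every request as load-test traffic so WAF rules, sampling,
  // and dashboards can treat it separately from real users
  http.get('https://api.example.com/products', {
    headers: { 'X-Load-Test': 'true' }
  })
}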


❌ No Baseline Performance Metrics

Fix: Run smoke test first to establish baseline before load testing


❌ Using Iteration Duration Instead of Arrival Rate

Fix: Use constant-arrival-rate executor in k6


❌ Not Warming Up Caches/JIT

Fix: 2-5 minute warm-up phase before measurement

Quick Reference

Tool Selection:

  • Modern API: k6
  • Enterprise: JMeter
  • Python team: Locust

Test Patterns:

  • Smoke: 1 VU, 1 min
  • Load: Ramp-up → peak → ramp-down
  • Stress: Increase until break
  • Spike: Sudden 10x surge
  • Soak: 4-8 hours constant

Load Calculation:

Concurrent = DAU × 0.10 × 1.5
RPS = (Concurrent × Requests/Session) / (Duration × Think Time)

Anti-Patterns:

  • Coordinated omission (use arrival rate)
  • Cold start (warm-up first)
  • Unrealistic data (parameterize)
  • Constant load (use realistic patterns)

Result Interpretation:

  • Linear growth → CPU-bound
  • Exponential growth → Resource saturation
  • Sudden cliff → Hard limit
  • Gradual degradation → Memory leak

Authentication:

  • Short tests: Pre-generate tokens
  • Long tests: Login + refresh
  • Testing auth: Simulate login flow

Third-Party Dependencies:

  • Has sandbox: Use real (staging/sandbox)
  • Expensive/rate-limited: Mock (WireMock)
  • No sandbox: Mock

Bottom Line

Start with smoke test (1 VU). Calculate realistic load from DAU. Use ramp-up pattern (never start at peak). Monitor p95/p99 latency. Find breaking point before users do.

Test realistic scenarios with think time, not hammer tests.