SKILL.md

---
name: performance-testing
description: Test application performance, scalability, and resilience. Use when planning load testing, stress testing, or optimizing system performance.
version: 1.0.0
category: testing
tags: performance, load-testing, stress-testing, scalability, optimization, monitoring
difficulty: intermediate
estimated_time: 45 minutes
author: user
---

Performance Testing

Core Principle

Performance is a feature, not an afterthought.

Test performance like you test functionality: continuously, automatically, and with clear acceptance criteria.

Why Performance Testing Matters

User Impact

  • 100ms delay = 1% drop in conversions (Amazon)
  • 53% of mobile users abandon sites taking > 3 seconds (Google)
  • Slow = Broken (in users' eyes)

Business Impact

  • Lost revenue from abandoned transactions
  • Increased infrastructure costs
  • Degraded user experience
  • Reputation damage

Technical Impact

  • Scalability limits
  • Infrastructure bottlenecks
  • Hidden architectural problems

Types of Performance Testing

1. Load Testing

What: System behavior under expected load

Goal: Verify the system handles typical usage

Example:

  • E-commerce site handling 1,000 concurrent users
  • API serving 10,000 requests/minute
  • Database processing 500 transactions/second

When: Before every major release

Tools: k6, JMeter, Gatling, Artillery

2. Stress Testing

What: System behavior under extreme load (beyond capacity)

Goal: Find breaking point, see how system fails

Example:

  • Ramping up from 1,000 to 10,000 concurrent users
  • Pushing API until response time degrades
  • Filling database until queries slow

When: Before scaling infrastructure, quarterly at minimum

What to look for: Graceful degradation, not catastrophic failure
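
As a sketch, a k6 stress profile might ramp well past expected capacity; the targets and endpoint here are illustrative, not prescriptive:

// stress-test.js - ramp well beyond expected capacity to find the breaking point
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  stages: [
    { duration: '5m', target: 1000 },   // expected peak load
    { duration: '5m', target: 5000 },   // well beyond capacity
    { duration: '5m', target: 10000 },  // keep pushing until something breaks
    { duration: '5m', target: 0 },      // ramp down: does the system recover?
  ],
};

export default function () {
  http.get('https://api.example.com/products'); // illustrative endpoint
  sleep(1);
}

The knee where latency and errors start climbing is your capacity limit; behavior past it tells you whether degradation is graceful.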

3. Spike Testing

What: System behavior under sudden load increase

Goal: Test auto-scaling, handling unexpected traffic

Example:

  • Black Friday sale announcement
  • Viral social media post
  • Marketing campaign launch

When: Before major events, after infrastructure changes

Pattern: Instant ramp from normal to 5-10x load
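
A hedged k6 sketch of that spike pattern (the 10x jump, hold time, and endpoint are illustrative):

// spike-test.js - near-instant jump from normal load to 10x
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 100 },    // normal load
    { duration: '10s', target: 1000 },  // sudden 10x spike
    { duration: '3m', target: 1000 },   // hold: does auto-scaling catch up in time?
    { duration: '10s', target: 100 },   // traffic subsides
    { duration: '2m', target: 100 },    // verify recovery at normal load
  ],
};

export default function () {
  http.get('https://api.example.com/products'); // illustrative endpoint
  sleep(1);
}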

4. Endurance/Soak Testing

What: System behavior over extended time

Goal: Find memory leaks, resource exhaustion, gradual degradation

Example:

  • Run at normal load for 24-72 hours
  • Monitor memory, connections, file handles
  • Check for resource leaks

When: After significant code changes, quarterly

What to look for: Stable resource usage over time
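
A soak profile is the same idea with a long, flat hold; a minimal sketch (VU count, duration, and endpoint are illustrative):

// soak-test.js - constant, realistic load held for a day or more
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  stages: [
    { duration: '10m', target: 500 },  // ramp to normal load
    { duration: '24h', target: 500 },  // hold; watch memory and connections for drift
    { duration: '10m', target: 0 },
  ],
};

export default function () {
  http.get('https://api.example.com/products'); // illustrative endpoint
  sleep(1);
}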

5. Scalability Testing

What: How system performs as load increases

Goal: Validate horizontal/vertical scaling

Example:

  • Add servers, measure throughput improvement
  • Test auto-scaling triggers
  • Find scaling limits

When: Before capacity planning, infrastructure changes

Performance Testing Strategy

Start with Requirements

Bad: "The system should be fast"

Good: "95th percentile response time < 200ms under 1,000 concurrent users"

Define SLOs (Service Level Objectives):

  • Response Time: 95th percentile < 200ms
  • Throughput: 10,000 requests/minute minimum
  • Error Rate: < 0.1% under load
  • Resource Usage: CPU < 70%, Memory < 80%
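
The first three SLOs can be enforced directly as k6 thresholds; a minimal sketch (the resource-usage SLO is checked in your monitoring stack, not the load tool, and the endpoint is illustrative):

// slo-thresholds.js - thresholds mirroring the example SLOs above
import http from 'k6/http';

export const options = {
  vus: 1000,
  duration: '10m',
  thresholds: {
    http_req_duration: ['p(95)<200'],  // p95 response time < 200ms
    http_reqs: ['rate>166'],           // throughput: ~10,000 requests/minute
    http_req_failed: ['rate<0.001'],   // error rate < 0.1%
  },
};

export default function () {
  http.get('https://api.example.com/'); // illustrative endpoint
}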

Identify Critical Paths

Don't test everything equally. Focus on:

  • Revenue-generating flows (checkout, payment)
  • High-traffic pages (homepage, product pages)
  • Critical APIs (authentication, data access)
  • Resource-intensive operations (search, reports)

Realistic Scenarios

Bad: Every user hits the homepage repeatedly

Good:

  • 40% browse products
  • 30% search
  • 20% view product details
  • 10% checkout

Include:

  • Think time (users don't click instantly)
  • Varied data (different products, users, queries)
  • Realistic workflows (browse → search → add to cart → checkout)
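
One way to express that mix in k6 is one scenario per journey, with VU counts in the same 40/30/20/10 proportions; a sketch assuming hypothetical shop URLs:

// mixed-workload.js - weighted traffic mix with think time and varied data
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  scenarios: {
    browse:   { executor: 'constant-vus', vus: 40, duration: '10m', exec: 'browse' },
    search:   { executor: 'constant-vus', vus: 30, duration: '10m', exec: 'search' },
    details:  { executor: 'constant-vus', vus: 20, duration: '10m', exec: 'details' },
    checkout: { executor: 'constant-vus', vus: 10, duration: '10m', exec: 'checkout' },
  },
};

const think = () => sleep(1 + Math.random() * 3); // users don't click instantly

export function browse()   { http.get('https://shop.example.com/products'); think(); }
export function search()   { http.get(`https://shop.example.com/search?q=${['shoes', 'bags', 'hats'][Math.floor(Math.random() * 3)]}`); think(); }
export function details()  { http.get(`https://shop.example.com/products/${Math.ceil(Math.random() * 100000)}`); think(); }
export function checkout() { http.post('https://shop.example.com/checkout', JSON.stringify({ cart: 'demo' })); think(); }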

Setting Up Performance Tests

Test Environment

Ideal: Production-like infrastructure

  • Same server specs
  • Same database size
  • Same network topology
  • Same third-party integrations (or mocks)

Reality: Often a scaled-down version

  • Document differences
  • Extrapolate results carefully
  • Validate with production monitoring

Test Data

Requirements:

  • Realistic volume (don't test with 100 users when you have 10M)
  • Varied data (avoid cache hits skewing results)
  • Production-like distribution (80/20 rule applies)

Example:

Products: 100,000 (matching production)
Users: 50,000 test accounts
Orders: 1M historical orders
Search queries: Real query distribution

Monitoring During Tests

Essential metrics:

  • Response time (avg, 50th, 95th, 99th percentile)
  • Throughput (requests/second)
  • Error rate
  • CPU, memory, disk I/O
  • Database query time
  • Network latency

Tools:

  • Application: New Relic, Datadog, Dynatrace
  • Infrastructure: Prometheus, Grafana
  • Database: Query analyzers, slow query logs

Common Performance Bottlenecks

1. Database

Symptoms:

  • Slow queries under load
  • Connection pool exhaustion
  • Lock contention

Solutions:

  • Add indexes on filtered columns
  • Optimize N+1 queries
  • Increase connection pool size
  • Add read replicas
  • Implement caching

2. N+1 Queries

Problem:

// Load 100 orders
const orders = await Order.findAll();

// For each order, load customer (100 queries!)
for (const order of orders) {
  const customer = await Customer.findById(order.customerId);
}

Fix:

// Load orders with customers in one query
const orders = await Order.findAll({
  include: [Customer]
});

3. Synchronous Processing

Problem: Blocking operations in request path

Example: Sending email during checkout

Fix:

  • Use message queues (RabbitMQ, SQS)
  • Process asynchronously
  • Return response immediately
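
A minimal Node.js sketch of the fix; queue.send and createOrder are hypothetical stand-ins for your message-queue client and order logic:

// checkout-handler.js - respond first, push slow work off the request path
// (queue and createOrder are stand-ins for your queue client and order logic)
async function handleCheckout(req, res) {
  const order = await createOrder(req.body);     // fast, must happen before responding

  // Before: await sendConfirmationEmail(order); // slow call blocking every checkout

  // After: enqueue a job; a separate worker sends the email.
  await queue.send('email-jobs', { type: 'order-confirmation', orderId: order.id });

  res.status(201).json({ orderId: order.id });   // user gets a response immediately
}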

4. Memory Leaks

Symptoms:

  • Memory usage grows over time
  • Performance degrades gradually
  • Eventually crashes

Detection:

  • Endurance testing
  • Memory profiling (heap dumps)
  • Monitor garbage collection

Common causes:

  • Event listeners not cleaned up
  • Caches without eviction
  • Circular references
  • Global state accumulation
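
The first cause is the easiest to reproduce; a small Node.js sketch, assuming res is an HTTP response and the emitter is long-lived:

// listener-leak.js - a per-request listener that never gets removed
const { EventEmitter } = require('events');

const priceFeed = new EventEmitter();  // long-lived object shared across requests

function handleRequest(res) {          // res: an HTTP response (assumed)
  const onTick = (price) => res.write(`price: ${price}\n`);
  priceFeed.on('tick', onTick);        // leak: one extra listener per request, forever

  // Fix: detach the listener when the response closes
  res.on('close', () => priceFeed.off('tick', onTick));
}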

5. Inadequate Caching

Problem: Recalculating same results repeatedly

Strategy:

  • Cache expensive operations
  • Use CDN for static assets
  • Implement application-level caching (Redis)
  • Browser caching (Cache-Control headers)

Balance: Cache hit rate vs. memory usage
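
A cache-aside sketch using the ioredis client; the key name, TTL, and runExpensiveQuery helper are illustrative:

// cache-aside.js - cache an expensive query with an explicit TTL
const Redis = require('ioredis');
const redis = new Redis();  // assumes Redis on localhost:6379

async function getTopProducts() {
  const cached = await redis.get('top-products');
  if (cached) return JSON.parse(cached);       // hit: skip the expensive work

  const products = await runExpensiveQuery();  // stand-in for your real query
  await redis.set('top-products', JSON.stringify(products), 'EX', 60);  // expire after 60s
  return products;
}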

6. External Dependencies

Problem: Third-party APIs slow or unavailable

Solutions:

  • Set aggressive timeouts
  • Implement circuit breakers
  • Cache responses when possible
  • Degrade gracefully if unavailable
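
A sketch of the first and last points for Node 18+; the rates API URL and fallback values are assumptions, and a circuit breaker would wrap this same call:

// shipping-rates.js - aggressive timeout plus graceful fallback
const FLAT_RATE_FALLBACK = { carrier: 'flat', amount: 7.99 };  // assumed default rate

async function getShippingRates(order) {
  try {
    const res = await fetch('https://rates.example.com/quote', {  // illustrative third-party API
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify(order),
      signal: AbortSignal.timeout(500),  // fail fast after 500ms instead of hanging
    });
    if (!res.ok) throw new Error(`rates API returned ${res.status}`);
    return await res.json();
  } catch {
    return FLAT_RATE_FALLBACK;  // degrade gracefully; don't block the whole checkout
  }
}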

Performance Testing in CI/CD

Continuous Performance Testing

Approach 1: Smoke Tests

  • Run small load test on every commit
  • 10 concurrent users for 1 minute
  • Catch major regressions quickly

Approach 2: Nightly Tests

  • Full load test overnight
  • More comprehensive scenarios
  • Trend analysis over time

Approach 3: Pre-Production Gate

  • Load test before production deploy
  • Automated pass/fail criteria
  • Block deployment if performance degrades

Example: k6 in CI/CD

// performance-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '1m', target: 50 },   // Ramp up
    { duration: '3m', target: 50 },   // Stay at 50 users
    { duration: '1m', target: 0 },    // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<200'],  // 95% of requests < 200ms
    http_req_failed: ['rate<0.01'],    // < 1% failures
  },
};

export default function () {
  const res = http.get('https://api.example.com/products');
  
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 200ms': (r) => r.timings.duration < 200,
  });
  
  sleep(1);
}

# .github/workflows/performance.yml
name: Performance Tests

on:
  pull_request:
    branches: [main]

jobs:
  performance:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run k6 test
        uses: grafana/k6-action@v0.3.0
        with:
          filename: performance-test.js
          flags: --out json=results.json  # write results so the upload step has something to publish

      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: k6-results
          path: results.json

Analyzing Performance Test Results

Key Metrics

Response Time Distribution:

  • Average: Misleading (outliers skew it)
  • Median (50th percentile): Typical user experience
  • 95th percentile: "Slow but acceptable"
  • 99th percentile: Worst user experience

Throughput:

  • Requests/second sustained
  • How it changes with load
  • Where it plateaus (capacity)

Error Rate:

  • Should stay flat as load increases
  • Spike indicates breaking point

Interpreting Results

Good:

Load: 1,000 users
Response time p95: 180ms
Throughput: 5,000 req/s
Error rate: 0.05%
CPU: 65%, Memory: 70%

Problems:

Load: 1,000 users
Response time p95: 3,500ms ❌ (too slow)
Throughput: 500 req/s ❌ (way below target)
Error rate: 5% ❌ (too many errors)
CPU: 95%, Memory: 90% ❌ (maxed out)

Root Cause Analysis

  1. Correlate metrics: When response time spikes, what else changes?
  2. Check logs: Errors, warnings, slow queries
  3. Profile code: Where is time spent?
  4. Monitor resources: CPU, memory, network, disk
  5. Trace requests: End-to-end request flow

Production Performance Monitoring

Synthetic Monitoring

What: Automated tests hitting production

Example:

  • Every 5 minutes, test critical flows
  • Measure response time from multiple locations
  • Alert if degradation detected

Tools: Pingdom, Datadog Synthetics, New Relic Synthetics

Real User Monitoring (RUM)

What: Measure actual user experience

Metrics:

  • Page load time
  • Time to interactive
  • API response times
  • Error rates

Tools: Google Analytics, New Relic Browser, Datadog RUM

Alerting

Set alerts on:

  • p95 response time > threshold
  • Error rate > 1%
  • Throughput drops suddenly
  • Queue depth growing

Don't alert on:

  • Average response time (too noisy)
  • Single slow request (outliers happen)

Performance Testing Anti-Patterns

❌ Testing Too Late

Problem: Find performance issues in production

Fix: Test early and often, catch issues before release

❌ Unrealistic Scenarios

Problem: Test doesn't match real usage

Example: All users hitting same endpoint simultaneously

Fix: Model realistic user journeys, think time, data distribution

❌ Ignoring Ramp-Up

Problem: 0 to 1,000 users instantly

Real world: Traffic grows gradually (usually)

Fix: Ramp up over time, see how system adapts

❌ Testing Without Monitoring

Problem: Can't see what's happening during test

Fix: Monitor everything during tests

❌ No Baseline

Problem: Don't know if results are good or bad

Fix: Establish baseline, track trends over time

❌ One-Time Testing

Problem: Test once before launch, never again

Fix: Continuous performance testing, trend monitoring

Tools Overview

Load Testing Tools

k6: Modern, developer-friendly, JavaScript-based

  • Good for: CI/CD integration, API testing
  • Learning curve: Low

JMeter: Mature, feature-rich, GUI-based

  • Good for: Complex scenarios, extensive protocols
  • Learning curve: Medium

Gatling: Scala-based, great reporting

  • Good for: High load, detailed metrics
  • Learning curve: Medium

Artillery: Node.js, simple YAML configs

  • Good for: Quick tests, simple scenarios
  • Learning curve: Very low

Locust: Python-based, distributed testing

  • Good for: Custom user behavior, Python ecosystems
  • Learning curve: Low-Medium

APM (Application Performance Monitoring)

  • New Relic: Comprehensive, expensive
  • Datadog: Infrastructure + APM combined
  • Dynatrace: AI-powered root cause analysis
  • AppDynamics: Enterprise-focused

Database Profiling

  • pg_stat_statements (PostgreSQL)
  • MySQL slow query log
  • MongoDB profiler
  • Redis SLOWLOG

Real-World Example

Scenario: E-Commerce Checkout Slow

Problem: Checkout taking 5+ seconds under load

Investigation:

  1. Load test: Reproduce issue
  2. Monitor: Database CPU at 95%
  3. Profile: Slow query on order creation
  4. Root cause: Missing index on orders.user_id

Fix:

CREATE INDEX idx_orders_user_id ON orders(user_id);

Result:

  • Checkout time: 5s → 300ms
  • Database CPU: 95% → 40%
  • Throughput: 5x improvement

When NOT to Performance Test

  • MVPs/Prototypes: Focus on validating idea first
  • Internal tools: With < 10 users, performance rarely matters
  • One-time scripts: Not worth the effort
  • As the first step of optimization: Profile first to find hotspots, optimize second, then load test to verify the gains

Checklist: Before Going to Production

  • Load test passed (expected traffic)
  • Stress test passed (2-3x expected traffic)
  • Spike test passed (sudden traffic surge)
  • Endurance test passed (24+ hours)
  • Database indexes in place
  • Caching configured
  • Monitoring and alerting set up
  • Auto-scaling configured (if applicable)
  • Performance baseline established

Remember

  • Performance is a feature: Test it like functionality
  • Test continuously: Not just before launch
  • Monitor production: Synthetic + real user monitoring
  • Set realistic goals: Based on business requirements
  • Fix what matters: Focus on user-impacting bottlenecks
  • Trend over time: Performance degrades gradually, catch it early

Using with QE Agents

Automated Performance Testing

qe-performance-tester orchestrates load testing:

// Agent runs comprehensive load test
const perfTest = await agent.runLoadTest({
  target: 'https://api.example.com',
  scenarios: {
    checkout: { vus: 100, duration: '5m' },
    search: { vus: 200, duration: '5m' },
    browse: { vus: 500, duration: '5m' }
  },
  thresholds: {
    'http_req_duration': ['p(95)<200'],
    'http_req_failed': ['rate<0.01'],
    'http_reqs': ['rate>100']
  }
});

// Returns detailed performance report

Bottleneck Analysis

// Agent identifies performance bottlenecks
// (qePerformanceTester: a JS handle to the qe-performance-tester agent)
const analysis = await qePerformanceTester.analyzeBottlenecks({
  testResults: perfTest,
  metrics: ['cpu', 'memory', 'db_queries', 'network', 'cache_hits']
});

// Returns:
// {
//   bottlenecks: [
//     { component: 'database', severity: 'high', suggestion: 'Add index on orders.created_at' },
//     { component: 'api', severity: 'medium', suggestion: 'Enable response caching' }
//   ]
// }

Continuous Performance Monitoring

// Agent integrates performance testing in CI/CD
const ciPerformance = await qePerformanceTester.ciIntegration({
  mode: 'smoke',  // Quick test on every commit
  duration: '1m',
  vus: 10,
  failOn: {
    'p95_response_time': 300,  // ms
    'error_rate': 0.01         // 1%
  }
});

Fleet Coordination for Performance

const performanceFleet = await FleetManager.coordinate({
  strategy: 'performance-testing',
  agents: [
    'qe-performance-tester',       // Run load tests
    'qe-quality-analyzer',         // Analyze results
    'qe-production-intelligence',  // Compare to production
    'qe-deployment-readiness'      // Go/no-go decision
  ],
  topology: 'sequential'
});

Resources

  • k6 Documentation: k6.io/docs
  • Google Web Fundamentals: Performance optimization guides
  • "Release It!" by Michael Nygard: Production-ready patterns
  • "High Performance Browser Networking" by Ilya Grigorik

Performance testing isn't optional—it's how you ensure your system works when it matters most.