| name | performance-testing |
| description | Profiles application performance under load using k6, Artillery, or JMeter to measure latency, throughput, and error rates. Use when planning load tests, stress tests, soak tests, benchmarking APIs, or identifying performance bottlenecks. |
| category | specialized-testing |
| priority | high |
| tokenEstimate | 1100 |
| agents | qe-performance-tester, qe-quality-analyzer, qe-production-intelligence |
| implementation_status | optimized |
| optimization_version | 1 |
| last_optimized | 2025-12-02 |
| dependencies | |
| quick_reference_card | true |
| tags | performance, load-testing, stress-testing, scalability, k6, bottlenecks |
| trust_tier | 3 |
| validation | |
Performance Testing
Quick Test Type Selection:
- Expected load validation → Load testing
- Find breaking point → Stress testing
- Sudden traffic spike → Spike testing
- Memory leaks, resource exhaustion → Endurance/soak testing
- Horizontal/vertical scaling → Scalability testing
Critical Success Factors:
- Performance is a feature, not an afterthought
- Test early and often, not just before release
- Focus on user-impacting bottlenecks
Quick Reference Card
When to Use
- Before major releases
- After infrastructure changes
- Before scaling events (Black Friday)
- When setting SLAs/SLOs
Test Types
| Type | Purpose | When |
|---|---|---|
| Load | Expected traffic | Every release |
| Stress | Beyond capacity | Quarterly |
| Spike | Sudden surge | Before events |
| Endurance | Memory leaks | After code changes |
| Scalability | Scaling validation | Infrastructure changes |
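The types differ mainly in their load profile. As a minimal sketch, a k6 stress profile (assuming roughly 200 VUs is the expected load) ramps well past capacity to find the breaking point:
// Illustrative stress-test stage shape; 200 VUs assumed as expected load
export const options = {
  stages: [
    { duration: '2m', target: 200 }, // ramp to expected load
    { duration: '5m', target: 400 }, // 2x expected
    { duration: '5m', target: 600 }, // 3x expected: hunting for the breaking point
    { duration: '2m', target: 0 },   // ramp down and observe recovery
  ],
};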
Key Metrics
| Metric | Target | Why |
|---|---|---|
| p95 response | < 200ms | User experience |
| Throughput | 10k req/min | Capacity |
| Error rate | < 0.1% | Reliability |
| CPU | < 70% | Headroom |
| Memory | < 80% | Stability |
Tools
- k6: Modern, JS-based, CI/CD friendly
- JMeter: Enterprise, feature-rich
- Artillery: Simple YAML configs
- Gatling: Scala, great reporting
Agent Coordination
- qe-performance-tester: Load test orchestration
- qe-quality-analyzer: Results analysis
- qe-production-intelligence: Production comparison
Defining SLOs
Bad: "The system should be fast" Good: "p95 response time < 200ms under 1,000 concurrent users"
export const options = {
thresholds: {
http_req_duration: ['p(95)<200'], // 95% < 200ms
http_req_failed: ['rate<0.01'], // < 1% failures
},
};
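Thresholds also act as pass/fail gates: k6 exits non-zero when one fails, which is what lets the same script serve as a CI gate. To surface tail latency beyond p95 in the summary (see Gotchas), widen the trend stats:
k6 run --summary-trend-stats="avg,p(95),p(99)" performance-test.js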
Realistic Scenarios
Bad: Every user hits homepage repeatedly
Good: Model actual user behavior
import { sleep } from 'k6';
import { randomIntBetween } from 'https://jslib.k6.io/k6-utils/1.2.0/index.js';
// Realistic distribution
// 40% browse, 30% search, 20% details, 10% checkout
export default function () {
  const action = Math.random();
  if (action < 0.4) browse();           // user-journey helpers defined elsewhere in the script
  else if (action < 0.7) search();
  else if (action < 0.9) viewProduct();
  else checkout();
  sleep(randomIntBetween(1, 5)); // Think time between actions
}
Common Bottlenecks
Database
Symptoms: Slow queries under load, connection pool exhaustion
Fixes: Add indexes, optimize N+1 queries, increase pool size, read replicas
N+1 Queries
// BAD: 100 orders = 101 queries
const orders = await Order.findAll();
for (const order of orders) {
const customer = await Customer.findById(order.customerId);
}
// GOOD: 1 query
const orders = await Order.findAll({ include: [Customer] });
Synchronous Processing
Problem: Blocking operations in request path (sending email during checkout)
Fix: Use message queues, process async, return immediately
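A minimal sketch of that fix; emailQueue, order, and res are illustrative (in the style of BullMQ or a similar queue client):
// BAD: checkout blocks on the email provider
await sendConfirmationEmail(order);
// GOOD: enqueue the work and return immediately; a worker sends the email async
await emailQueue.add('order-confirmation', { orderId: order.id });
res.status(202).json({ orderId: order.id }); // accepted, processing continues async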
Memory Leaks
Detection: Endurance testing, memory profiling
Common causes: Event listeners not cleaned up, caches without eviction
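A common leak shape, sketched with an illustrative Express-style app and event bus: a listener is added per request on a process-lifetime emitter and never removed:
const { EventEmitter } = require('events');
const eventBus = new EventEmitter(); // lives for the whole process

// BAD: one listener per request accumulates and is never removed
app.get('/orders/stream', (req, res) => {
  eventBus.on('order-updated', (o) => res.write(JSON.stringify(o)));
});

// GOOD: keep a reference and detach when the connection closes
app.get('/orders/stream', (req, res) => {
  const handler = (o) => res.write(JSON.stringify(o));
  eventBus.on('order-updated', handler);
  req.on('close', () => eventBus.off('order-updated', handler));
});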
External Dependencies
Solutions: Aggressive timeouts, circuit breakers, caching, graceful degradation
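A sketch of the timeout-plus-degradation pattern using Node's built-in fetch and AbortSignal.timeout (Node 18+; the recommendations service and fallback shape are illustrative):
// Fail fast instead of holding a request open while a dependency stalls
async function getRecommendations(userId) {
  try {
    const res = await fetch(`https://recs.internal/users/${userId}`, {
      signal: AbortSignal.timeout(250), // tight budget for a non-critical call
    });
    return await res.json();
  } catch {
    return { items: [], degraded: true }; // graceful degradation: empty but functional
  }
}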
k6 CI/CD Example
// performance-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';
export const options = {
stages: [
{ duration: '1m', target: 50 }, // Ramp up
{ duration: '3m', target: 50 }, // Steady
{ duration: '1m', target: 0 }, // Ramp down
],
thresholds: {
http_req_duration: ['p(95)<200'],
http_req_failed: ['rate<0.01'],
},
};
export default function () {
const res = http.get('https://api.example.com/products');
check(res, {
'status is 200': (r) => r.status === 200,
'response time < 200ms': (r) => r.timings.duration < 200,
});
sleep(1);
}
# GitHub Actions
- name: Run k6 test
uses: grafana/k6-action@v0.3.0
with:
filename: performance-test.js
Analyzing Results
Good Results
Load: 1,000 users | p95: 180ms | Throughput: 5,000 req/s
Error rate: 0.05% | CPU: 65% | Memory: 70%
Problems
Load: 1,000 users | p95: 3,500ms ❌ | Throughput: 500 req/s ❌
Error rate: 5% ❌ | CPU: 95% ❌ | Memory: 90% ❌
Root Cause Analysis
- Correlate metrics: When response time spikes, what changes?
- Check logs: Errors, warnings, slow queries
- Profile code: Where is time spent?
- Monitor resources: CPU, memory, disk
- Trace requests: End-to-end flow
Anti-Patterns
| ❌ Anti-Pattern | ✅ Better |
|---|---|
| Testing too late | Test early and often |
| Unrealistic scenarios | Model real user behavior |
| 0 to 1000 users instantly | Ramp up gradually |
| No monitoring during tests | Monitor everything |
| No baseline | Establish and track trends |
| One-time testing | Continuous performance testing |
Agent-Assisted Performance Testing
// Comprehensive load test
await Task("Load Test", {
target: 'https://api.example.com',
scenarios: {
checkout: { vus: 100, duration: '5m' },
search: { vus: 200, duration: '5m' },
browse: { vus: 500, duration: '5m' }
},
thresholds: {
'http_req_duration': ['p(95)<200'],
'http_req_failed': ['rate<0.01']
}
}, "qe-performance-tester");
// Bottleneck analysis
await Task("Analyze Bottlenecks", {
testResults: perfTest,
metrics: ['cpu', 'memory', 'db_queries', 'network']
}, "qe-performance-tester");
// CI integration
await Task("CI Performance Gate", {
mode: 'smoke',
duration: '1m',
vus: 10,
failOn: { 'p95_response_time': 300, 'error_rate': 0.01 }
}, "qe-performance-tester");
Agent Coordination Hints
Memory Namespace
aqe/performance/
├── results/* - Test execution results
├── baselines/* - Performance baselines
├── bottlenecks/* - Identified bottlenecks
└── trends/* - Historical trends
Fleet Coordination
const perfFleet = await FleetManager.coordinate({
strategy: 'performance-testing',
agents: [
'qe-performance-tester',
'qe-quality-analyzer',
'qe-production-intelligence',
'qe-deployment-readiness'
],
topology: 'sequential'
});
Pre-Production Checklist
- Load test passed (expected traffic)
- Stress test passed (2-3x expected)
- Spike test passed (sudden surge)
- Endurance test passed (24+ hours)
- Database indexes in place
- Caching configured
- Monitoring and alerting set up
- Performance baseline established
Related Skills
- agentic-quality-engineering - Agent coordination
- api-testing-patterns - API performance
- chaos-engineering-resilience - Resilience testing
Remember
- Performance is a feature: Test it like functionality
- Test continuously: Not just before launch
- Monitor production: Synthetic + real user monitoring
- Fix what matters: Focus on user-impacting bottlenecks
- Trend over time: Catch degradation early
With Agents: Agents automate load testing, analyze bottlenecks, and compare with production. Use agents to maintain performance at scale.
Run History
After each performance test run, append results to run-history.json in this skill directory:
node -e "
const fs = require('fs');
const h = JSON.parse(fs.readFileSync('.claude/skills/performance-testing/run-history.json'));
h.runs.push({date: new Date().toISOString().split('T')[0], scenario: 'load', p95_ms: P95, throughput_rps: RPS, error_rate_pct: ERR});
fs.writeFileSync('.claude/skills/performance-testing/run-history.json', JSON.stringify(h, null, 2));
"
Read run-history.json before each run and compare with baselines. Alert if p95 increases more than 20% from the baseline.
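A sketch of that comparison, assuming run-history.json also stores a baselines object keyed by scenario (shape illustrative):
// compare-baseline.js: exit non-zero when p95 regresses >20% vs. the stored baseline
const fs = require('fs');
const h = JSON.parse(fs.readFileSync('.claude/skills/performance-testing/run-history.json'));
const latest = h.runs[h.runs.length - 1];
const baseline = h.baselines && h.baselines[latest.scenario];
if (baseline && latest.p95_ms > baseline.p95_ms * 1.2) {
  console.error(`p95 regression: ${latest.p95_ms}ms vs baseline ${baseline.p95_ms}ms`);
  process.exit(1);
}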
Gotchas
- Agent-generated k6 scripts often hardcode base URLs; use environment variables for portability (see the sketch after this list)
- Load tests in containers hit resource limits before app limits; give the load-generator container roughly 2x the resources of the target
- Agents often omit think time between requests; without it, load is unrealistically bursty
- p95 vs. p99 matters; agents default to averages, which hide tail latency problems
- Baseline comparison requires a consistent environment; CI runner variance alone can cause 20%+ noise
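For the first gotcha, k6 exposes environment variables through __ENV, so the base URL can be injected at run time:
// k6 run -e BASE_URL=https://staging.example.com performance-test.js
import http from 'k6/http';
const BASE_URL = __ENV.BASE_URL || 'http://localhost:3000';
export default function () {
  http.get(`${BASE_URL}/products`);
}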