| name | quality-metrics |
| description | Measure quality effectively with actionable metrics. Use when establishing quality dashboards, defining KPIs, or evaluating test effectiveness. |
| version | 1.0.0 |
| category | quality-engineering |
| tags | metrics, kpis, quality-dashboards, dora-metrics, measurement, continuous-improvement |
| difficulty | intermediate |
| estimated_time | 30-45 minutes |
| author | user |
Quality Metrics
Core Principle
Measure what matters, not what's easy to measure.
Metrics should drive better decisions, not just prettier dashboards. If a metric doesn't change behavior or inform action, stop tracking it.
The Vanity Metrics Problem
Vanity Metrics (Stop Measuring These)
Test Count
- "We have 5,000 tests!"
- So what? Are they finding bugs? Are they maintainable? Do they give confidence?
Code Coverage Percentage
- "We achieved 85% coverage!"
- Useless without context. 85% of what? Critical paths? Or just getters/setters?
Test Cases Executed
- "Ran 10,000 test cases today!"
- How many found problems? How many are redundant?
Bugs Found
- "QA found 200 bugs this sprint!"
- Is that good or bad? Are they trivial or critical? Should they have been found earlier?
Story Points Completed
- "We completed 50 points of testing work!"
- Points are relative and gameable. What actually got better?
Why Vanity Metrics Fail
- Easily gamed: People optimize for the metric, not the goal
- No context: Numbers without meaning
- No action: What do you do differently based on this number?
- False confidence: High numbers that mean nothing
Meaningful Metrics
1. Defect Escape Rate
What: Percentage of bugs that reach production vs. caught before release
Why it matters: Measures effectiveness of your quality process
How to measure:
Defect Escape Rate = (Production Bugs / Total Bugs Found) × 100
Good: < 5% escape rate
Needs work: > 15% escape rate
Actions:
- High escape rate → Shift testing left, improve risk assessment
- Low escape rate but slow releases → Maybe over-testing, reduce friction
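To make this concrete, here is a minimal sketch of computing the rate from a bug-tracker export. The `BugRecord` shape and the `foundInProduction` flag are illustrative assumptions, not any particular tool's API:

interface BugRecord {
  id: string;
  foundInProduction: boolean; // true if the bug was first seen in production
}

// Defect Escape Rate = (Production Bugs / Total Bugs Found) × 100
function defectEscapeRate(bugs: BugRecord[]): number {
  if (bugs.length === 0) return 0;
  const escaped = bugs.filter(b => b.foundInProduction).length;
  return (escaped / bugs.length) * 100;
}

// Example: 4 escaped out of 50 total bugs found → 8% escape rate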
2. Mean Time to Detect (MTTD)
What: How long from bug introduction to discovery
Why it matters: Faster detection = cheaper fixes
How to measure:
MTTD = Time bug found - Time bug introduced
Good: < 1 day for critical paths
Needs work: > 1 week
Actions:
- High MTTD → Add monitoring, improve test coverage on critical paths
- Very low MTTD → Your fast feedback loops are working
3. Mean Time to Resolution (MTTR)
What: Time from bug discovery to fix deployed
Why it matters: Indicates team efficiency and process friction
How to measure:
MTTR = Time fix deployed - Time bug discovered
Good: < 24 hours for critical bugs, < 1 week for minor
Needs work: > 1 week for critical bugs
Actions:
- High MTTR → Investigate bottlenecks (test env access? deployment pipeline? handoffs?)
- Very low MTTR but high escape rate → Rushing fixes, need better verification
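Both MTTD and MTTR reduce to timestamp differences averaged over bugs. The sketch below assumes hypothetical `introducedAt`, `detectedAt`, and `resolvedAt` fields; whatever your tracker calls them, the arithmetic is the same:

interface TrackedBug {
  introducedAt: Date; // e.g. the offending commit's timestamp
  detectedAt: Date;   // when the bug was reported or an alert fired
  resolvedAt?: Date;  // when the fix reached production
}

const hours = (ms: number): number => ms / 3_600_000;

// MTTD = average of (time bug found − time bug introduced)
function meanTimeToDetect(bugs: TrackedBug[]): number {
  if (bugs.length === 0) return 0;
  const total = bugs.reduce(
    (sum, b) => sum + hours(b.detectedAt.getTime() - b.introducedAt.getTime()), 0);
  return total / bugs.length;
}

// MTTR = average of (time fix deployed − time bug found), over resolved bugs only
function meanTimeToResolve(bugs: TrackedBug[]): number {
  const resolved = bugs.filter(b => b.resolvedAt !== undefined);
  if (resolved.length === 0) return 0;
  const total = resolved.reduce(
    (sum, b) => sum + hours(b.resolvedAt!.getTime() - b.detectedAt.getTime()), 0);
  return total / resolved.length;
}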
4. Deployment Frequency
What: How often you deploy to production
Why it matters: Proxy for team confidence and process maturity
How to measure:
Deployments per week (or day)
Good: Multiple per day
Decent: Multiple per week
Needs work: Less than weekly
Actions:
- Low frequency → Reduce batch size, improve automation, build confidence
- High frequency with high defect rate → Need better automated checks
5. Change Failure Rate
What: Percentage of deployments that cause production issues
Why it matters: Measures release quality
How to measure:
Change Failure Rate = (Failed Deployments / Total Deployments) × 100
Good: < 5%
Needs work: > 15%
Actions:
- High failure rate → Improve pre-production validation, add canary deployments
- Very low but slow releases → Maybe you can deploy more frequently
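Deployment frequency and change failure rate both fall out of a single deployment log. The sketch below assumes a hypothetical record with a timestamp and a `causedIncident` flag, however your team defines a failed deployment (rollback, hotfix, or severity-1 incident shortly after release):

interface Deployment {
  deployedAt: Date;
  causedIncident: boolean;
}

// Deployment frequency: deployments per week over the observed window
function deploymentsPerWeek(deploys: Deployment[], windowDays: number = 30): number {
  return deploys.length / (windowDays / 7);
}

// Change Failure Rate = (Failed Deployments / Total Deployments) × 100
function changeFailureRate(deploys: Deployment[]): number {
  if (deploys.length === 0) return 0;
  const failed = deploys.filter(d => d.causedIncident).length;
  return (failed / deploys.length) * 100;
}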
6. Test Execution Time
What: How long your test suite takes to run
Why it matters: Slow tests = slow feedback = less frequent testing
How to measure:
Time from commit to test completion
Good: < 10 minutes for unit tests, < 30 minutes for full suite
Needs work: > 1 hour
Actions:
- Slow tests → Parallelize, remove redundant tests, optimize slow tests
- Fast tests but bugs escaping → Coverage gaps, need better tests
7. Flaky Test Rate
What: Percentage of tests that fail intermittently
Why it matters: Flaky tests destroy confidence
How to measure:
Flaky Test Rate = (Flaky Tests / Total Tests) × 100
Good: < 1%
Needs work: > 5%
Actions:
- High flakiness → Fix or delete flaky tests immediately (quarantine pattern)
- Low flakiness → Maintain vigilance, don't let it creep up
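One workable definition (an assumption here, not a standard): a test is flaky if it both passed and failed on the same commit. Given a per-test CI export, the rate can be computed like this:

interface TestRun {
  testName: string;
  commitSha: string;
  passed: boolean;
}

function flakyTestRate(runs: TestRun[]): number {
  // Group outcomes by test + commit, then flag tests with mixed results
  const outcomes = new Map<string, Set<boolean>>();
  for (const run of runs) {
    const key = `${run.testName}@@${run.commitSha}`;
    if (!outcomes.has(key)) outcomes.set(key, new Set());
    outcomes.get(key)!.add(run.passed);
  }
  const flakyTests = new Set<string>();
  for (const [key, results] of outcomes) {
    if (results.size > 1) flakyTests.add(key.split('@@')[0]);
  }
  const allTests = new Set(runs.map(r => r.testName));
  return allTests.size === 0 ? 0 : (flakyTests.size / allTests.size) * 100;
}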
Context-Specific Metrics
For Startups
Focus on:
- Deployment frequency (speed to market)
- Critical path coverage (protect revenue)
- MTTR (move fast, fix fast)
Skip:
- Comprehensive coverage metrics
- Detailed test documentation
- Complex traceability
For Regulated Industries
Focus on:
- Traceability (requirement → test → result)
- Test documentation completeness
- Audit trail integrity
Don't skip:
- Deployment frequency still matters
- But compliance isn't optional
For Established Products
Focus on:
- Defect escape rate (protect reputation)
- Regression detection (maintain stability)
- Test maintenance cost
Balance:
- Innovation vs. stability
- New features vs. technical debt
Leading vs. Lagging Indicators
Lagging Indicators (Rearview Mirror)
- Defect escape rate
- Production incidents
- Customer complaints
- MTTR
Use for: Understanding what happened, trending over time
Leading Indicators (Windshield)
- Code review quality
- Test coverage on new code
- Deployment frequency trend
- Team confidence surveys
Use for: Predicting problems, early intervention
Metrics for Different Audiences
For Developers
- Test execution time
- Flaky test rate
- Code review turnaround
- Build failure frequency
Language: Technical, actionable
For Product/Management
- Deployment frequency
- Change failure rate
- Feature lead time
- Customer-impacting incidents
Language: Business outcomes, not technical details
For Executive Leadership
- Defect escape rate trend
- Mean time to resolution
- Release velocity
- Customer satisfaction (related to quality)
Language: Business impact, strategic
Building a Metrics Dashboard
Essential Dashboard (Start Here)
Top Row (Health)
- Defect escape rate (last 30 days)
- Deployment frequency (last 7 days)
- Change failure rate (last 30 days)
Middle Row (Speed)
- MTTD (average, last 30 days)
- MTTR (average, last 30 days)
- Test execution time (current)
Bottom Row (Trends)
- All of the above as sparklines (3-6 months)
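One lightweight way to pin this down is to declare the panels and their thresholds as data and keep the rendering layer dumb. The shapes and threshold values below are illustrative, drawn from the guidance in this document rather than any specific dashboard tool:

interface DashboardPanel {
  metric: string;
  window: string;      // lookback window for the headline number
  goodBelow?: number;  // "green" if the value is below this...
  goodAbove?: number;  // ...or above this, depending on the metric
}

const essentialDashboard: DashboardPanel[] = [
  // Top row: health
  { metric: 'defect-escape-rate-pct',  window: '30d', goodBelow: 5 },
  { metric: 'deployments-per-week',    window: '7d',  goodAbove: 7 },
  { metric: 'change-failure-rate-pct', window: '30d', goodBelow: 5 },
  // Middle row: speed
  { metric: 'mttd-hours',              window: '30d', goodBelow: 24 },
  { metric: 'mttr-hours',              window: '30d', goodBelow: 24 },
  { metric: 'test-execution-minutes',  window: 'current', goodBelow: 30 }
];
// Bottom row: render each of the above as a 3-6 month sparkline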
Advanced Dashboard (If Needed)
Add:
- Flaky test rate
- Test coverage on critical paths (not overall %)
- Production error rate
- Customer-reported bugs vs. internally found
Anti-Patterns
❌ Metric-Driven Development
Problem: Optimizing for metrics instead of quality
Example: Writing useless tests to hit coverage targets
Fix: Focus on outcomes (can we deploy confidently?) not numbers
❌ Too Many Metrics
Problem: Dashboard overload, no clear priorities
Example: Tracking 30+ metrics that no one understands
Fix: Start with 5-7 core metrics, add only if they drive decisions
❌ Metrics Without Action
Problem: Tracking numbers but not changing behavior
Example: Watching MTTR climb for months without investigating
Fix: For every metric, define thresholds and actions
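One way to keep threshold-and-action pairs from drifting into tribal knowledge is to write them down next to the metric itself. Purely illustrative, not a specific tool's configuration format:

const metricActions = [
  {
    metric: 'mttr-hours',
    alertWhen: 'above',
    threshold: 24,
    action: 'Investigate bottlenecks: test env access, deployment pipeline, handoffs'
  },
  {
    metric: 'flaky-test-rate-pct',
    alertWhen: 'above',
    threshold: 2,
    action: 'Quarantine offending tests this sprint; fix or delete them'
  }
];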
❌ Gaming the System
Problem: People optimize for metrics, not quality
Example: Marking bugs as "won't fix" to improve resolution time
Fix: Multiple complementary metrics, qualitative reviews
❌ One-Size-Fits-All
Problem: Using same metrics for all teams/contexts
Example: Measuring startup team same as regulated medical device team
Fix: Context-driven metric selection
Metric Hygiene
Review Quarterly
- Are we still using this metric to make decisions?
- Is it being gamed?
- Does it reflect current priorities?
Adjust Thresholds
- What's "good" changes as you improve
- Don't keep celebrating the same baseline
- Raise the bar when appropriate
Kill Zombie Metrics
- If no one looks at it → Delete it
- If no one can explain what action to take → Delete it
- If it's always green or always red → Delete it
Real-World Examples
Example 1: E-Commerce Company
Before:
- Measured: Test count (5,000 tests)
- Result: Slow CI, frequent production bugs
After:
- Measured: Defect escape rate (8%), MTTD (3 days), deployment frequency (2/week)
- Actions:
  - Removed 2,000 redundant tests
  - Added monitoring for critical paths
  - Improved deployment pipeline
- Result: Escape rate down to 3%, MTTD down to 6 hours, deployments up to 5/day
Example 2: SaaS Platform
Before:
- Measured: Code coverage (85%)
- Result: False confidence, bugs in uncovered critical paths
After:
- Measured: Critical path coverage (60%), deployment frequency, change failure rate
- Actions:
  - Focused testing on payment, auth, data integrity
  - Removed tests on deprecated features
  - Added production monitoring
- Result: Fewer production incidents, faster releases
Questions to Ask About Any Metric
What decision does this inform?
- If none → Don't track it
What action do we take if it's red?
- If you don't know → Define thresholds and actions
Can this be gamed?
- If yes → Add complementary metrics
Does this reflect actual quality?
- If no → Replace it with something that does
Who needs to see this?
- If no one → Stop tracking it
Remember
Good metrics:
- Drive better decisions
- Are actionable
- Reflect actual outcomes
- Change as you mature
Bad metrics:
- Make dashboards pretty
- Are easily gamed
- Provide false confidence
- Persist long after they're useful
Start small: 5-7 metrics that matter
Review often: Quarterly at minimum
Kill ruthlessly: Remove metrics that don't drive action
Stay contextual: What matters changes with your situation
Using with QE Agents
Automated Metrics Collection
qe-quality-analyzer collects and analyzes quality metrics:
// Agent collects comprehensive metrics automatically
await agent.collectMetrics({
  scope: 'all',
  timeframe: '30d',
  categories: [
    'deployment-frequency',
    'defect-escape-rate',
    'test-execution-time',
    'flaky-test-rate',
    'coverage-trends'
  ]
});

// Returns real-time dashboard data
// No manual tracking required
Intelligent Metric Analysis
qe-quality-analyzer identifies trends and anomalies:
// Agent detects metric anomalies
const analysis = await agent.analyzeTrends({
  metric: 'defect-escape-rate',
  timeframe: '90d',
  alertThreshold: 0.15
});

// Returns:
// {
//   trend: 'increasing',
//   currentValue: 0.18,
//   avgValue: 0.08,
//   anomaly: true,
//   recommendation: 'Increase pre-release testing focus',
//   relatedMetrics: ['test-coverage: decreasing', 'MTTR: increasing']
// }
Actionable Insights from Metrics
qe-quality-gate uses metrics for decision-making:
// Agent makes GO/NO-GO decisions based on metrics
const decision = await agent.evaluateMetrics({
  release: 'v3.2',
  thresholds: {
    defectEscapeRate: '<5%',
    changeFailureRate: '<10%',
    testExecutionTime: '<15min',
    flakyTestRate: '<2%'
  }
});

// Returns:
// {
//   decision: 'NO-GO',
//   blockers: [
//     'Flaky test rate: 4.2% (threshold: 2%)'
//   ],
//   recommendations: [
//     'Run qe-flaky-test-hunter to stabilize tests'
//   ]
// }
Real-Time Metrics Dashboard
qe-quality-analyzer generates live dashboards:
// Agent creates context-specific dashboards
await agent.createDashboard({
  audience: 'executive', // or 'developer', 'product'
  focus: 'release-readiness',
  updateFrequency: 'real-time'
});

// Executive Dashboard:
// - Defect escape rate: 3.2% ✅
// - Deployment frequency: 5/day ✅
// - Change failure rate: 7% ✅
// - Customer-impacting incidents: 1 (down from 3)
Metric-Driven Test Optimization
qe-regression-risk-analyzer uses metrics to optimize testing:
// Agent identifies which tests provide the most value
const optimization = await agent.optimizeTestSuite({
  metrics: {
    executionTime: 'per-test',
    defectDetectionRate: 'per-test',
    maintenanceCost: 'per-test'
  },
  goal: 'maximize-value-per-minute'
});

// Recommends:
// - Remove 50 tests with 0% defect detection (save 15 min)
// - Keep top 200 tests (95% defect detection)
// - Result: 40% faster suite, 5% defect detection loss
Fleet Coordination for Metrics
// Multiple agents collaborate on metrics collection and analysis
const metricsFleet = await FleetManager.coordinate({
  strategy: 'quality-metrics',
  agents: [
    'qe-test-executor',           // Collect execution metrics
    'qe-coverage-analyzer',       // Collect coverage metrics
    'qe-production-intelligence', // Collect production metrics
    'qe-quality-analyzer',        // Analyze and visualize
    'qe-quality-gate'             // Make decisions
  ],
  topology: 'hierarchical'
});

// Continuous metrics pipeline
await metricsFleet.execute({
  schedule: 'continuous',
  aggregationInterval: '5min'
});
Context-Aware Metric Selection
// qe-quality-analyzer recommends metrics based on context
const recommendation = await agent.recommendMetrics({
  context: 'startup',
  stage: 'early',
  team: 'small',
  compliance: 'none'
});

// Recommends:
// - deployment-frequency (speed to market)
// - critical-path-coverage (protect revenue)
// - MTTR (move fast, fix fast)
//
// Skip:
// - comprehensive coverage %
// - detailed traceability
// - process compliance metrics
Related Skills
Core Quality Practices:
- agentic-quality-engineering - Metrics-driven agent coordination
- holistic-testing-pact - Metrics across test quadrants
Testing Approaches:
- risk-based-testing - Risk-based metric selection
- test-automation-strategy - Automation effectiveness metrics
- exploratory-testing-advanced - Exploratory session metrics
Development Practices:
- xp-practices - XP success metrics (velocity, lead time)
Resources
- Accelerate by Forsgren, Humble, Kim (DORA metrics)
- How to Measure Anything by Douglas Hubbard (measuring intangibles)
- Your own retrospectives (which metrics helped? Which didn't?)
Metrics are tools for better decisions, not scorecards for performance reviews. Use them wisely.
With Agents: Agents automate metrics collection, detect trends and anomalies, and provide context-aware recommendations. Use agents to make metrics actionable and avoid vanity metrics. Agents continuously analyze what drives quality outcomes in your specific context.