| name | testing-quality |
| version | 2.1.0 |
| description | Testing quality assessment for test suite health, coverage analysis, and test effectiveness. Use when auditing test quality, improving test coverage, or assessing test reliability and maintainability. |
Testing Quality
Assessment and improvement of test suite quality, coverage, and effectiveness.
Phase 0: Expertise Loading
expertise_check:
domain: testing-quality
file: .claude/expertise/testing-quality.yaml
if_exists:
- Load quality metrics
- Load coverage thresholds
- Apply assessment criteria
if_not_exists:
- Flag discovery mode
- Document patterns learned
When to Use This Skill
Use testing-quality when:
- Auditing test suite health
- Analyzing test coverage gaps
- Assessing test reliability
- Improving test maintainability
- Identifying flaky tests
Quality Dimensions
| Dimension | Metrics |
|---|---|
| Coverage | Line, branch, function |
| Reliability | Flake rate, consistency |
| Speed | Execution time, parallelization |
| Maintainability | Complexity, duplication |
Quality Metrics
Coverage Analysis
metrics:
line_coverage: ">= 80%"
branch_coverage: ">= 75%"
function_coverage: ">= 80%"
critical_path_coverage: "100%"
Test Health
metrics:
flaky_test_rate: "< 1%"
test_execution_time: "< 5 min"
test_to_code_ratio: ">= 1:1"
assertion_density: ">= 2 per test"
Anti-Patterns
anti_patterns:
- Flaky tests (non-deterministic)
- Test interdependence
- Over-mocking
- Missing assertions
- Slow tests in CI
- Commented-out tests
MCP Requirements
- claude-flow: For orchestration
- Bash: For coverage tools
Recursive Improvement Integration (v2.1)
Eval Harness Integration
benchmark: testing-quality-benchmark-v1
tests:
- tq-001: Coverage analysis accuracy
- tq-002: Anti-pattern detection
minimum_scores:
analysis_accuracy: 0.90
detection_rate: 0.85
Memory Namespace
namespaces:
- testing-quality/audits/{id}: Quality audits
- testing-quality/metrics: Health metrics
- improvement/audits/testing-quality: Skill audits
Uncertainty Handling
confidence_check:
if confidence >= 0.8:
- Proceed with assessment
if confidence 0.5-0.8:
- Confirm scope
if confidence < 0.5:
- Ask for test suite details
Cross-Skill Coordination
Works with: testing, code-review-assistant, functionality-audit
!! SKILL COMPLETION VERIFICATION (MANDATORY) !!
- Agent Spawning: Spawned agent via Task()
- Agent Registry Validation: Agent from registry
- TodoWrite Called: Called with 5+ todos
- Work Delegation: Delegated to agents
Remember: Skill() -> Task() -> TodoWrite() - ALWAYS
Core Principles
Testing Quality operates on 3 fundamental principles:
Principle 1: Comprehensive Coverage Across Multiple Dimensions
Test quality is not just about line coverage percentage but about effectiveness across coverage, reliability, speed, and maintainability dimensions.
In practice:
- Line coverage measures which code is executed, but branch coverage ensures all decision paths are tested
- Function coverage validates all callable units are invoked, while critical path coverage ensures 100% validation of essential workflows
- Coverage metrics combined with assertion density ensure tests actually validate behavior, not just execute code
- Test-to-code ratio (>= 1:1) ensures adequate test investment relative to implementation complexity
Principle 2: Test Reliability Through Determinism
Flaky tests that pass or fail non-deterministically erode confidence in the entire test suite and mask real failures.
In practice:
- Flaky test rate must be <1% with any non-deterministic tests immediately investigated and fixed
- Test isolation prevents interdependence where one test's side effects affect another's results
- Reproducible test environments (seeded random data, controlled timing) eliminate environmental variability
- Test execution monitoring tracks consistency over multiple runs to identify emerging flakiness
Principle 3: Fast Feedback Through Optimized Execution
Test suites that take too long to run get skipped, defeating their purpose of catching bugs early.
In practice:
- Test execution time target <5 minutes enables running full suite before every commit
- Parallelization distributes tests across multiple cores to reduce wall-clock time
- Test prioritization runs fast unit tests before slow integration tests for quick failure detection
- Slow test identification and optimization ensures test suite speed improves over time rather than degrading
Common Anti-Patterns
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Coverage Theater | High coverage percentage from tests that execute code but don't validate behavior (missing assertions) | Measure assertion density (>= 2 per test); review tests for meaningful validations; require tests to fail when code is broken |
| Flaky Test Tolerance | Accepting non-deterministic tests and re-running until they pass rather than fixing root causes | Enforce <1% flaky rate; investigate any flakiness immediately; isolate tests to eliminate interdependence |
| Over-Mocking | Tests that mock everything and validate only that mocks are called, not actual behavior | Use mocks sparingly for external dependencies only; test real integrations in sandboxes; validate outputs not just call counts |
| Slow Test Accumulation | Test suite execution time growing unbounded as tests are added without optimization | Set execution time budgets (<5 min); parallelize tests; identify and optimize slow tests; separate unit from integration |
| Test Interdependence | Tests that must run in specific order or share state, making them fragile and hard to debug | Ensure tests can run independently in any order; use test fixtures for setup; clean up state after each test |
| Commented-Out Tests | Disabling failing tests rather than fixing them or the code they validate | Never allow commented tests in codebase; either fix test/code or delete test entirely; track test deletions with justification |
Conclusion
Testing Quality assessment provides systematic evaluation of test suite health beyond simple coverage metrics, examining reliability, speed, maintainability, and effectiveness. By measuring test quality across multiple dimensions and identifying anti-patterns like flakiness, over-mocking, and slow execution, this skill helps teams build test suites that actually catch bugs and provide confidence in code correctness rather than creating false security through coverage theater.
Use this skill when auditing test suite health as part of quality initiatives, when analyzing why tests are being skipped or ignored due to flakiness or slow execution, when test coverage numbers are high but bugs still reach production, or when improving test maintainability to reduce technical debt. The skill is essential for establishing trust in automated testing and ensuring that test investments deliver real value through early bug detection rather than just checking a coverage box.
The multi-dimensional assessment - coverage analysis, reliability metrics, speed optimization, and maintainability evaluation - creates a comprehensive view of test suite quality that guides targeted improvements. The identification of specific anti-patterns (flaky tests, over-mocking, interdependence) with concrete metrics enables prioritized remediation rather than vague calls for "better testing." When combined with functionality-audit for execution verification and verification-quality for quality gates, testing-quality ensures that test suites are not just comprehensive but genuinely effective at preventing defects from reaching production.
Core Principles
Multi-Dimensional Assessment - Test quality cannot be measured by coverage alone. Evaluate across coverage, reliability, speed, and maintainability dimensions for comprehensive quality understanding.
Evidence-Based Metrics - Use quantitative thresholds (80% line coverage, <1% flake rate, <5 min execution) validated through static and dynamic analysis rather than subjective assessment.
Continuous Improvement Loop - Quality assessment feeds the dogfooding system for self-improvement. Patterns detected in test suite health inform better testing practices across the codebase.
Anti-Patterns
| Anti-Pattern | Why It Fails | Better Approach |
|---|---|---|
| Coverage Theater - Achieving high coverage percentage without testing meaningful scenarios | Coverage number becomes vanity metric, critical paths remain untested, false confidence in quality | Focus on critical path coverage (100%), verify assertions test actual behavior, use branch coverage not just line coverage |
| Ignoring Flaky Tests - Tolerating non-deterministic test failures | Developers ignore CI failures, real bugs get missed, trust in automation erodes | Track flake rate (<1% threshold), quarantine flaky tests, fix or delete them, never merge with flaky tests |
| Test Code Neglect - Treating test code as second-class citizen without quality standards | Unmaintainable test suite, high duplication, unclear test intent, hard to debug failures | Apply same quality standards to test code, refactor test helpers, document test patterns, review test code rigorously |
Conclusion
The testing quality skill enables systematic assessment of test suite health across multiple dimensions beyond simple coverage metrics. By evaluating coverage adequacy, test reliability, execution speed, and code maintainability, teams gain comprehensive understanding of their testing effectiveness. The integration with the recursive improvement system ensures quality assessments feed continuous self-improvement through the dogfooding cycle, creating a feedback loop that elevates testing practices organization-wide.
Effective quality assessment requires looking beyond vanity metrics to evidence-based evaluation of what tests actually validate. Avoiding anti-patterns like coverage theater or tolerating flaky tests ensures the test suite provides genuine confidence rather than false assurance. By maintaining quality standards for test code itself and continuously improving through the assessment-feedback-improvement loop, teams build test suites that catch bugs reliably, execute quickly, and remain maintainable as codebases evolve. This systematic approach transforms testing from a compliance exercise into a strategic quality advantage.