Claude Code Plugins

Community-maintained marketplace

Feedback

Testing quality assessment for test suite health, coverage analysis, and test effectiveness. Use when auditing test quality, improving test coverage, or assessing test reliability and maintainability.

Install Skill

1Download skill
2Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify skill by going through its instructions before using it.

SKILL.md

name testing-quality
version 2.1.0
description Testing quality assessment for test suite health, coverage analysis, and test effectiveness. Use when auditing test quality, improving test coverage, or assessing test reliability and maintainability.

Testing Quality

Assessment and improvement of test suite quality, coverage, and effectiveness.

Phase 0: Expertise Loading

expertise_check:
  domain: testing-quality
  file: .claude/expertise/testing-quality.yaml

  if_exists:
    - Load quality metrics
    - Load coverage thresholds
    - Apply assessment criteria

  if_not_exists:
    - Flag discovery mode
    - Document patterns learned

When to Use This Skill

Use testing-quality when:

  • Auditing test suite health
  • Analyzing test coverage gaps
  • Assessing test reliability
  • Improving test maintainability
  • Identifying flaky tests

Quality Dimensions

Dimension Metrics
Coverage Line, branch, function
Reliability Flake rate, consistency
Speed Execution time, parallelization
Maintainability Complexity, duplication

Quality Metrics

Coverage Analysis

metrics:
  line_coverage: ">= 80%"
  branch_coverage: ">= 75%"
  function_coverage: ">= 80%"
  critical_path_coverage: "100%"

Test Health

metrics:
  flaky_test_rate: "< 1%"
  test_execution_time: "< 5 min"
  test_to_code_ratio: ">= 1:1"
  assertion_density: ">= 2 per test"

Anti-Patterns

anti_patterns:
  - Flaky tests (non-deterministic)
  - Test interdependence
  - Over-mocking
  - Missing assertions
  - Slow tests in CI
  - Commented-out tests

MCP Requirements

  • claude-flow: For orchestration
  • Bash: For coverage tools

Recursive Improvement Integration (v2.1)

Eval Harness Integration

benchmark: testing-quality-benchmark-v1
  tests:
    - tq-001: Coverage analysis accuracy
    - tq-002: Anti-pattern detection
  minimum_scores:
    analysis_accuracy: 0.90
    detection_rate: 0.85

Memory Namespace

namespaces:
  - testing-quality/audits/{id}: Quality audits
  - testing-quality/metrics: Health metrics
  - improvement/audits/testing-quality: Skill audits

Uncertainty Handling

confidence_check:
  if confidence >= 0.8:
    - Proceed with assessment
  if confidence 0.5-0.8:
    - Confirm scope
  if confidence < 0.5:
    - Ask for test suite details

Cross-Skill Coordination

Works with: testing, code-review-assistant, functionality-audit


!! SKILL COMPLETION VERIFICATION (MANDATORY) !!

  • Agent Spawning: Spawned agent via Task()
  • Agent Registry Validation: Agent from registry
  • TodoWrite Called: Called with 5+ todos
  • Work Delegation: Delegated to agents

Remember: Skill() -> Task() -> TodoWrite() - ALWAYS

Core Principles

Testing Quality operates on 3 fundamental principles:

Principle 1: Comprehensive Coverage Across Multiple Dimensions

Test quality is not just about line coverage percentage but about effectiveness across coverage, reliability, speed, and maintainability dimensions.

In practice:

  • Line coverage measures which code is executed, but branch coverage ensures all decision paths are tested
  • Function coverage validates all callable units are invoked, while critical path coverage ensures 100% validation of essential workflows
  • Coverage metrics combined with assertion density ensure tests actually validate behavior, not just execute code
  • Test-to-code ratio (>= 1:1) ensures adequate test investment relative to implementation complexity

Principle 2: Test Reliability Through Determinism

Flaky tests that pass or fail non-deterministically erode confidence in the entire test suite and mask real failures.

In practice:

  • Flaky test rate must be <1% with any non-deterministic tests immediately investigated and fixed
  • Test isolation prevents interdependence where one test's side effects affect another's results
  • Reproducible test environments (seeded random data, controlled timing) eliminate environmental variability
  • Test execution monitoring tracks consistency over multiple runs to identify emerging flakiness

Principle 3: Fast Feedback Through Optimized Execution

Test suites that take too long to run get skipped, defeating their purpose of catching bugs early.

In practice:

  • Test execution time target <5 minutes enables running full suite before every commit
  • Parallelization distributes tests across multiple cores to reduce wall-clock time
  • Test prioritization runs fast unit tests before slow integration tests for quick failure detection
  • Slow test identification and optimization ensures test suite speed improves over time rather than degrading

Common Anti-Patterns

Anti-Pattern Problem Solution
Coverage Theater High coverage percentage from tests that execute code but don't validate behavior (missing assertions) Measure assertion density (>= 2 per test); review tests for meaningful validations; require tests to fail when code is broken
Flaky Test Tolerance Accepting non-deterministic tests and re-running until they pass rather than fixing root causes Enforce <1% flaky rate; investigate any flakiness immediately; isolate tests to eliminate interdependence
Over-Mocking Tests that mock everything and validate only that mocks are called, not actual behavior Use mocks sparingly for external dependencies only; test real integrations in sandboxes; validate outputs not just call counts
Slow Test Accumulation Test suite execution time growing unbounded as tests are added without optimization Set execution time budgets (<5 min); parallelize tests; identify and optimize slow tests; separate unit from integration
Test Interdependence Tests that must run in specific order or share state, making them fragile and hard to debug Ensure tests can run independently in any order; use test fixtures for setup; clean up state after each test
Commented-Out Tests Disabling failing tests rather than fixing them or the code they validate Never allow commented tests in codebase; either fix test/code or delete test entirely; track test deletions with justification

Conclusion

Testing Quality assessment provides systematic evaluation of test suite health beyond simple coverage metrics, examining reliability, speed, maintainability, and effectiveness. By measuring test quality across multiple dimensions and identifying anti-patterns like flakiness, over-mocking, and slow execution, this skill helps teams build test suites that actually catch bugs and provide confidence in code correctness rather than creating false security through coverage theater.

Use this skill when auditing test suite health as part of quality initiatives, when analyzing why tests are being skipped or ignored due to flakiness or slow execution, when test coverage numbers are high but bugs still reach production, or when improving test maintainability to reduce technical debt. The skill is essential for establishing trust in automated testing and ensuring that test investments deliver real value through early bug detection rather than just checking a coverage box.

The multi-dimensional assessment - coverage analysis, reliability metrics, speed optimization, and maintainability evaluation - creates a comprehensive view of test suite quality that guides targeted improvements. The identification of specific anti-patterns (flaky tests, over-mocking, interdependence) with concrete metrics enables prioritized remediation rather than vague calls for "better testing." When combined with functionality-audit for execution verification and verification-quality for quality gates, testing-quality ensures that test suites are not just comprehensive but genuinely effective at preventing defects from reaching production.

Core Principles

  1. Multi-Dimensional Assessment - Test quality cannot be measured by coverage alone. Evaluate across coverage, reliability, speed, and maintainability dimensions for comprehensive quality understanding.

  2. Evidence-Based Metrics - Use quantitative thresholds (80% line coverage, <1% flake rate, <5 min execution) validated through static and dynamic analysis rather than subjective assessment.

  3. Continuous Improvement Loop - Quality assessment feeds the dogfooding system for self-improvement. Patterns detected in test suite health inform better testing practices across the codebase.

Anti-Patterns

Anti-Pattern Why It Fails Better Approach
Coverage Theater - Achieving high coverage percentage without testing meaningful scenarios Coverage number becomes vanity metric, critical paths remain untested, false confidence in quality Focus on critical path coverage (100%), verify assertions test actual behavior, use branch coverage not just line coverage
Ignoring Flaky Tests - Tolerating non-deterministic test failures Developers ignore CI failures, real bugs get missed, trust in automation erodes Track flake rate (<1% threshold), quarantine flaky tests, fix or delete them, never merge with flaky tests
Test Code Neglect - Treating test code as second-class citizen without quality standards Unmaintainable test suite, high duplication, unclear test intent, hard to debug failures Apply same quality standards to test code, refactor test helpers, document test patterns, review test code rigorously

Conclusion

The testing quality skill enables systematic assessment of test suite health across multiple dimensions beyond simple coverage metrics. By evaluating coverage adequacy, test reliability, execution speed, and code maintainability, teams gain comprehensive understanding of their testing effectiveness. The integration with the recursive improvement system ensures quality assessments feed continuous self-improvement through the dogfooding cycle, creating a feedback loop that elevates testing practices organization-wide.

Effective quality assessment requires looking beyond vanity metrics to evidence-based evaluation of what tests actually validate. Avoiding anti-patterns like coverage theater or tolerating flaky tests ensures the test suite provides genuine confidence rather than false assurance. By maintaining quality standards for test code itself and continuously improving through the assessment-feedback-improvement loop, teams build test suites that catch bugs reliably, execute quickly, and remain maintainable as codebases evolve. This systematic approach transforms testing from a compliance exercise into a strategic quality advantage.