Claude Code Plugins

Community-maintained marketplace

mutation-testing

@tachyon-beep/skillpacks

Use when validating test effectiveness, measuring test quality beyond coverage, choosing mutation testing tools (Stryker, PITest, mutmut), interpreting mutation scores, or improving test suites - provides mutation operators, score interpretation, and integration patterns

SKILL.md

name: mutation-testing
description: Use when validating test effectiveness, measuring test quality beyond coverage, choosing mutation testing tools (Stryker, PITest, mutmut), interpreting mutation scores, or improving test suites - provides mutation operators, score interpretation, and integration patterns

Mutation Testing

Overview

Core principle: Mutation testing validates that your tests actually test something by introducing bugs and checking if tests catch them.

Rule: 100% code coverage doesn't mean good tests. Mutation score measures if tests detect bugs.

Code Coverage vs Mutation Score

| Metric | What It Measures | Example |
|---|---|---|
| Code Coverage | Lines executed by tests | calculate_tax(100) executes code = 100% coverage |
| Mutation Score | Bugs detected by tests | Change * to / → test still passes = poor tests |

Problem with coverage:

def calculate_tax(amount):
    return amount * 0.08

def test_calculate_tax():
    calculate_tax(100)  # 100% coverage, but asserts nothing!

Mutation testing catches this:

  1. Mutates * 0.08 to / 0.08
  2. Runs test
  3. Test still passes → Survived mutation (bad test!)
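
A minimal sketch of the fix, assuming the 8% rate from the snippet above: once the test asserts on the result, the same mutant is killed. `calculate_tax_mutant` is a hand-written stand-in for what a mutation tool would generate, not actual tool output, and the `tax_fn` parameter exists only so the same test can be pointed at either version.

```python
import math

def calculate_tax(amount):
    return amount * 0.08

def calculate_tax_mutant(amount):
    # Hand-written stand-in for the "* -> /" mutant
    return amount / 0.08

def test_calculate_tax(tax_fn=calculate_tax):
    # isclose avoids depending on exact floating-point equality
    assert math.isclose(tax_fn(100), 8.0)

test_calculate_tax()  # passes against the original

try:
    test_calculate_tax(calculate_tax_mutant)
    print("mutant survived")
except AssertionError:
    print("mutant killed")  # the mutant returns 1250.0, not 8.0
```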

How Mutation Testing Works

Process:

  1. Create mutant: Change code slightly (e.g., + → -, < → <=)
  2. Run tests: Do tests fail?
  3. Classify:
    • Killed: Test failed → Good test!
    • Survived: Test passed → Test doesn't verify this logic
    • Timeout: Test hung → Usually counted as killed
    • No coverage: Not executed → Add test

Mutation Score:

Mutation Score = (Killed Mutants / Total Mutants) × 100

Thresholds:

  • > 80%: Excellent test quality
  • 60-80%: Acceptable
  • < 60%: Tests are weak
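
As a worked example of the formula and thresholds, here is a small sketch. The function names are hypothetical, and the treatment of exactly 80% as "acceptable" is an assumption read off the ranges above.

```python
def mutation_score(killed, total):
    """Mutation Score = (Killed Mutants / Total Mutants) x 100."""
    if total == 0:
        raise ValueError("no mutants were generated")
    return 100 * killed / total

def rate(score):
    """Map a score to the threshold bands listed above."""
    if score > 80:
        return "excellent"
    if score >= 60:
        return "acceptable"
    return "weak"

score = mutation_score(killed=17, total=20)
print(score, rate(score))  # 85.0 excellent
```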

Tool Selection

| Language | Tool | Why |
|---|---|---|
| JavaScript/TypeScript | Stryker | Best JS support, framework-agnostic |
| Java | PITest | Industry standard, Maven/Gradle integration |
| Python | mutmut | Simple, fast, pytest integration |
| C# | Stryker.NET | .NET ecosystem integration |

Example: Python with mutmut

Installation

pip install mutmut

Basic Usage

# Run mutation testing
mutmut run

# View results
mutmut results

# Show survived mutants (bugs your tests missed)
mutmut show

Configuration

# setup.cfg
[mutmut]
paths_to_mutate=src/
backup=False
runner=python -m pytest -x
tests_dir=tests/

Example

# src/calculator.py
def calculate_discount(price, percent):
    if percent > 100:
        raise ValueError("Percent cannot exceed 100")
    return price * (1 - percent / 100)

# tests/test_calculator.py
def test_calculate_discount():
    result = calculate_discount(100, 20)
    assert result == 80

Run mutmut:

mutmut run

Possible mutations:

  1. percent > 100 → percent >= 100 (boundary)
  2. 1 - percent → 1 + percent (operator)
  3. percent / 100 → percent * 100 (operator)
  4. price * (...) → price / (...) (operator)

Results:

  • Mutation 1 survived (test doesn't check boundary)
  • Mutation 2, 3, 4 killed (test catches these)

Improvement:

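import pytest

def test_calculate_discount_boundary():
    # Kills mutation 1: exactly 100% is valid and must not raise
    assert calculate_discount(100, 100) == 0
    # Just past the boundary must still raise
    with pytest.raises(ValueError):
        calculate_discount(100, 101)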
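
To see which inputs can tell mutation 1 apart from the original, the sketch below compares the two side by side. `calculate_discount_mutant` is a hand-written copy of the mutant and `outcome` is a hypothetical helper; neither comes from mutmut.

```python
def calculate_discount(price, percent):
    if percent > 100:
        raise ValueError("Percent cannot exceed 100")
    return price * (1 - percent / 100)

def calculate_discount_mutant(price, percent):
    # Hand-written copy of mutation 1: ">" replaced by ">="
    if percent >= 100:
        raise ValueError("Percent cannot exceed 100")
    return price * (1 - percent / 100)

def outcome(fn, price, percent):
    """Return the result, or the string 'ValueError' if the call raises."""
    try:
        return fn(price, percent)
    except ValueError:
        return "ValueError"

# Inputs on which the original and the mutant behave differently
distinguishing = [
    p for p in (20, 99, 100, 101)
    if outcome(calculate_discount, 100, p) != outcome(calculate_discount_mutant, 100, p)
]
print(distinguishing)  # [100]
```

Only the exact boundary value separates the two versions, so a test has to exercise `percent == 100` to kill this mutant.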

Common Mutation Operators

| Operator | Original | Mutated | What It Tests |
|---|---|---|---|
| Arithmetic | a + b | a - b | Calculation logic |
| Relational | a < b | a <= b | Boundary conditions |
| Logical | a and b | a or b | Boolean logic |
| Unary | +x | -x | Sign handling |
| Constant | return 0 | return 1 | Magic numbers |
| Return | return x | return None | Return value validation |
| Statement deletion | x = 5 | (deleted) | Side effects |
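
For intuition about how a tool might apply such operators, here is a rough sketch built on Python's standard `ast` module (`ast.unparse` needs Python 3.9+). It covers only three rows of the table, and `SWAPS`, `SwapOne`, and `mutants` are hypothetical names; real tools such as mutmut are implemented differently and support many more operators.

```python
import ast

# Replacement table: original AST operator type -> mutated operator type
SWAPS = {
    ast.Add: ast.Sub,   # Arithmetic: a + b -> a - b
    ast.Lt: ast.LtE,    # Relational: a < b -> a <= b
    ast.And: ast.Or,    # Logical: a and b -> a or b
}

class SwapOne(ast.NodeTransformer):
    """Swap the swappable operator at position `target` (in visit order)."""

    def __init__(self, target):
        self.target = target
        self.seen = 0  # number of swappable operators encountered so far

    def _maybe_swap(self, op):
        replacement = SWAPS.get(type(op))
        if replacement is None:
            return op
        index = self.seen
        self.seen += 1
        return replacement() if index == self.target else op

    def visit_BinOp(self, node):
        self.generic_visit(node)
        node.op = self._maybe_swap(node.op)
        return node

    def visit_BoolOp(self, node):
        self.generic_visit(node)
        node.op = self._maybe_swap(node.op)
        return node

    def visit_Compare(self, node):
        self.generic_visit(node)
        node.ops = [self._maybe_swap(op) for op in node.ops]
        return node

def mutants(source):
    """Yield one mutated source string per swappable operator."""
    target = 0
    while True:
        transformer = SwapOne(target)
        tree = transformer.visit(ast.parse(source))
        if transformer.seen <= target:
            return  # no operator left at this position
        yield ast.unparse(tree)
        target += 1

source = "def f(a, b):\n    return a + b < 10 and a < b\n"
for mutated in mutants(source):
    print(mutated)
    print()
```

Each yielded string differs from the original in exactly one operator, which is what lets a surviving mutant point at one specific unverified piece of logic.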

Interpreting Mutation Score

High Score (> 80%)

Good tests that catch most bugs.

def add(a, b):
    return a + b

def test_add():
    assert add(2, 3) == 5
    assert add(-1, 1) == 0
    assert add(0, 0) == 0

# Mutations killed:
# - a - b (returns -1, test expects 5)
# - a * b (returns 6, test expects 5)

Low Score (< 60%)

Weak tests that don't verify logic.

def validate_email(email):
    return "@" in email and "." in email

def test_validate_email():
    validate_email("user@example.com")  # No assertion!

# Mutations survived:
# - "@" in email → "@" not in email
# - "and" → "or"
# - (All mutations survive because test asserts nothing)
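
A sketch of a stronger test for the same function: asserting on one valid and two invalid inputs is enough to kill both mutants listed above. The `validate` parameter is a hypothetical convenience so the test can be aimed at a mutant by hand.

```python
def validate_email(email):
    return "@" in email and "." in email

def test_validate_email(validate=validate_email):
    # Kills: "@" in email -> "@" not in email
    assert validate("user@example.com") is True
    # Kills: "and" -> "or" (a "." alone must not be enough)
    assert validate("userexample.com") is False
    # Guards the other side of the condition (an "@" alone must not be enough)
    assert validate("user@example") is False

test_validate_email()
```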

Survived Mutants to Investigate

Priority order:

  1. Business logic mutations (calculations, validations)
  2. Boundary conditions (< → <=, > → >=)
  3. Error handling (exception raising)

Low priority:

  4. Logging statements
  5. Constants that don't affect behavior


Integration with CI/CD

GitHub Actions (Python)

# .github/workflows/mutation-testing.yml
name: Mutation Testing

on:
  schedule:
    - cron: '0 2 * * 0'  # Weekly on Sunday 2 AM
  workflow_dispatch:  # Manual trigger

jobs:
  mutmut:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          pip install mutmut pytest

      - name: Run mutation testing
        # mutmut exits non-zero when mutants survive; don't let that skip the report steps
        run: mutmut run || true

      - name: Generate report
        run: |
          mutmut results
          mutmut html  # Generate HTML report

      - name: Upload report
        uses: actions/upload-artifact@v4
        with:
          name: mutation-report
          path: html/

Why weekly, not every PR:

  • Mutation testing is slow (10-100x slower than regular tests)
  • The test suite is rerun once for every generated mutant
  • Most changes don't need that level of scrutiny

Anti-Patterns Catalog

❌ Chasing 100% Mutation Score

Symptom: Writing tests just to kill surviving mutants

Why bad:

  • Some mutations are equivalent (don't change behavior)
  • Diminishing returns after 85%
  • Time better spent on integration tests

Fix: Target 80-85%, focus on business logic


❌ Ignoring Equivalent Mutants

Symptom: "95% mutation score, still have survived mutants"

Equivalent mutants: Changes that don't affect behavior

def is_positive(x):
    return x > 0

# Mutation: x > 0 → x >= 0
# Only equivalent if the surrounding code guarantees x is never exactly 0;
# otherwise a test with x == 0 kills it

Fix: Confirm the mutant is truly equivalent, then exclude the line from mutation

# mutmut - skip a line with a pragma comment
def is_positive(x):
    return x > 0  # pragma: no mutate
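
For contrast, here is a sketch of a mutant that is equivalent for every input, so no test can kill it. The function and the `<` → `!=` replacement are illustrative choices, not taken from a specific tool. Because `i` starts at 0 and grows by 1, it can never step past `len(items)`, so both loop conditions stop at the same point.

```python
from itertools import product

def index_of(items, target):
    i = 0
    while i < len(items):
        if items[i] == target:
            return i
        i += 1
    return -1

def index_of_mutant(items, target):
    i = 0
    while i != len(items):  # hand-written mutant: "<" replaced by "!="
        if items[i] == target:
            return i
        i += 1
    return -1

def agree_on_small_inputs(max_len=3, values=(0, 1, 2)):
    """Compare both versions on every list up to max_len over the given values."""
    for length in range(max_len + 1):
        for items in product(values, repeat=length):
            for target in values:
                if index_of(list(items), target) != index_of_mutant(list(items), target):
                    return False
    return True

print(agree_on_small_inputs())  # True
```

An exhaustive check like this is evidence rather than proof, but it is a cheap way to decide whether a surviving mutant deserves a new test or an exclusion.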

❌ Running Mutation Tests on Every Commit

Symptom: CI takes 2 hours

Why bad: Mutation testing is 10-100x slower than regular tests

Fix:

  • Run weekly or nightly
  • Run on core modules only (not entire codebase)
  • Use as quality metric, not blocker

Incremental Mutation Testing

Test only changed code:

# mutmut - test only modified files
# Assumes mutmut 2.x, where --paths-to-mutate accepts a comma-separated list
mutmut run --paths-to-mutate "$(git diff --name-only main | grep '\.py$' | paste -sd, -)"

Benefits:

  • Faster feedback (minutes instead of hours)
  • Can run on PRs
  • Focuses on new code
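
The same idea as a small Python wrapper, for projects that prefer a script over a shell pipeline. This is a sketch under assumptions: the base branch is `main`, mutmut is a 2.x release whose `--paths-to-mutate` option accepts a comma-separated list, and `changed_python_files` and `run_incremental` are hypothetical helper names.

```python
import subprocess

def changed_python_files(diff_names):
    """Turn `git diff --name-only` output into a comma-separated path list."""
    paths = [line.strip() for line in diff_names.splitlines()]
    return ",".join(p for p in paths if p.endswith(".py"))

def run_incremental(base="main"):
    """Run mutmut only on Python files that differ from the base branch."""
    diff = subprocess.run(
        ["git", "diff", "--name-only", base],
        check=True, capture_output=True, text=True,
    ).stdout
    paths = changed_python_files(diff)
    if not paths:
        return 0  # nothing to mutate
    # Assumption: mutmut 2.x accepts a comma-separated list here
    return subprocess.run(["mutmut", "run", "--paths-to-mutate", paths]).returncode

print(changed_python_files("src/a.py\nREADME.md\nsrc/b.py\n"))  # src/a.py,src/b.py
```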

Bottom Line

Mutation testing measures if your tests actually detect bugs. High code coverage doesn't mean good tests.

Usage:

  • Run weekly/nightly, not on every commit (too slow)
  • Target 80-85% mutation score for business logic
  • Use mutmut (Python), Stryker (JS), PITest (Java)
  • Review survived mutants first; they point at logic your tests don't verify
  • Exclude confirmed equivalent mutants instead of writing tests for them

If your tests have 95% coverage but a 40% mutation score, they execute most of the code but verify little of it. Fix the tests, not the coverage metric.