name	mutation-testing
description	Validates test suite quality through mutation testing. Generates intelligent code mutations, runs tests to verify they catch the changes, and identifies gaps in test coverage. Use when evaluating test effectiveness, validating newly written tests, or improving test quality for mission-critical code.

Mutation Testing

Mutation testing evaluates test quality by introducing small, deliberate bugs (mutations) into code and checking if tests catch them. This provides a behavioral measure of test effectiveness beyond simple coverage metrics.

Core Concept

Mutation testing workflow:

1. Generate mutations (small code changes)
2. Run the test suite against each mutation
3. Classify results:
   - Killed: Test fails (good - test caught the bug)
   - Survived: Test passes (bad - test missed the bug)
   - Timeout: Test hangs or exceeds time limit
4. Calculate mutation score: killed / (total - timeouts)
5. For survived mutants, generate targeted tests
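
As a rough illustration of the score formula above, the following Go sketch tallies outcomes and computes killed / (total - timeouts). The `Tally` type and its field names are assumptions made for this example, not part of the skill's scripts, and the counts in `main` are made up.

```go
package main

import "fmt"

// Tally counts mutant outcomes. The type and field names are illustrative.
type Tally struct {
	Killed, Survived, Timeout int
}

// Score returns killed / (total - timeouts).
// The second result is false when no mutant produced a usable result
// (for example, when every mutant timed out).
func (t Tally) Score() (float64, bool) {
	total := t.Killed + t.Survived + t.Timeout
	denom := total - t.Timeout
	if denom == 0 {
		return 0, false
	}
	return float64(t.Killed) / float64(denom), true
}

func main() {
	// Hypothetical run: 24 mutants, 2 of which timed out.
	t := Tally{Killed: 18, Survived: 4, Timeout: 2}
	if s, ok := t.Score(); ok {
		fmt.Printf("mutation score: %.0f%%\n", s*100) // prints "mutation score: 82%"
	}
}
```

Timeouts are excluded from the denominator, so a suite is neither rewarded nor penalized for mutants whose outcome is unknown.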

Why mutation testing matters: Tests can achieve 100% line coverage while missing critical bugs. Mutation testing reveals if tests actually verify behavior or just execute code.

When to Use Mutation Testing

Use mutation testing proactively in these situations:

After test generation: Validate that newly written tests are effective
Mission-critical code: Ensure financial, consensus, or security-critical code has thorough tests
PR reviews: Quality gate to prevent merging code with weak tests
Refactoring: Verify tests catch regressions before changing code
Complex logic: Validate tests for boundary conditions, error handling, state machines

Target mutation scores:

Mission-critical code: 90%+
Core business logic: 80-90%
General code: 70-80%
Low-risk code: 60-70%
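
One way to turn these targets into a quality gate is to compare a score against the lower bound of its range. This is only a sketch: the tier keys and the `MeetsTarget` helper are hypothetical names, and the thresholds are the lower bounds listed above.

```go
package main

import "fmt"

// minScore holds the lower bound of each target range listed above.
// The tier keys are hypothetical labels chosen for this example.
var minScore = map[string]float64{
	"mission-critical": 0.90,
	"core":             0.80,
	"general":          0.70,
	"low-risk":         0.60,
}

// MeetsTarget reports whether score reaches the lower bound for tier.
// Unknown tiers fail closed.
func MeetsTarget(tier string, score float64) bool {
	floor, ok := minScore[tier]
	return ok && score >= floor
}

func main() {
	fmt.Println(MeetsTarget("core", 0.82))             // true
	fmt.Println(MeetsTarget("mission-critical", 0.82)) // false
}
```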

AST-Based Mutation Generation

This skill uses Go's go/ast and go/parser packages to generate intelligent mutations by analyzing code structure.

Standard Go mutations:

Arithmetic operators: +, -, *, /, %
Relational operators: <, <=, >, >=, ==, !=
Logical operators: &&, ||, negation
Conditional boundaries: off-by-one errors
Statement removal: delete return, assignment, defer
Constant changes: 0, 1, true, false, nil

See mutation_operators.md for complete catalog.

Using the Scripts

All scripts are executable Go programs invoked via shell wrappers.

1. Generate Mutations

~/.claude/skills/mutation-testing/scripts/generate-mutations.sh --file wallet.go --output mutations.json

Analyzes a Go source file and generates a mutation plan with AST node locations.

2. Run Mutation Tests

~/.claude/skills/mutation-testing/scripts/run-mutation-test.sh --mutation-file mutations.json --mutation-id M0 --package ./internal/wallet --output results/M0.json

Applies a specific mutation, runs the tests, and reports whether the mutant was killed or survived.

3. Parse Results

~/.claude/skills/mutation-testing/scripts/parse-results.sh --results 'results/*.json' --output report.json

Aggregates mutation results and calculates mutation score.

Integration with Agents

mutation-tester Agent

The mutation-tester agent orchestrates the mutation testing workflow:

Analyzes target code (from git diff or specified files)
Generates intelligent mutations using AST analysis
Runs tests for each mutation in parallel when possible
Identifies surviving mutants and analyzes why tests didn't catch them
Generates targeted tests to kill survivors
Re-runs mutations to verify improvements
Produces detailed mutation report in .reviews/mutations/

test-engineer Integration

After test-engineer generates tests, use mutation testing to validate effectiveness:

User: "Generate tests for CalculateFee function"
[test-engineer creates comprehensive tests]
User: "Run mutation testing to validate"
[mutation-tester verifies test quality]

code-reviewer Integration

Include mutation scores in PR reviews:

User: "/code-review owner/repo#123"
[code-reviewer analyzes changes, invokes mutation-tester]
Review includes: "Mutation score: 82% (18/22 killed), recommend 3 additional tests"

Interpreting Results

High mutation score (>85%)

Tests are thorough and catch most bugs. Focus on surviving mutants if any are high-impact.

Medium mutation score (70-85%)

Tests cover major paths but miss edge cases. Review survivors and add boundary tests.

Low mutation score (<70%)

Significant test gaps. Tests may only verify happy paths. Add error handling, boundary, and negative tests.

Surviving mutants

For each survivor, consider:

Equivalent mutant: Mutation doesn't change behavior (can ignore)
Missing test: Need test for that code path
Weak assertion: Test runs code but doesn't verify output
Boundary condition: Need edge case test
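
The "weak assertion" case is easy to miss because coverage tools still report the line as covered. The sketch below borrows the fee function from the Example Workflow section and models a test as a function that returns pass or fail; `weakCheck`, `strongCheck`, and `Kills` are hypothetical helpers for illustration only.

```go
package main

import "fmt"

type feeFunc func(amount int64) int64

func original(amount int64) int64 {
	if amount > 1000 {
		return amount / 100
	}
	return 10
}

// mutant changes the divisor (an arithmetic mutation).
func mutant(amount int64) int64 {
	if amount > 1000 {
		return amount / 10
	}
	return 10
}

// weakCheck executes the code but only verifies the fee is non-negative.
func weakCheck(f feeFunc) bool { return f(2000) >= 0 }

// strongCheck verifies the exact expected fee.
func strongCheck(f feeFunc) bool { return f(2000) == 20 }

// Kills reports whether check passes on the original and fails on the mutant.
func Kills(check func(feeFunc) bool) bool {
	return check(original) && !check(mutant)
}

func main() {
	fmt.Println("weak check kills mutant:  ", Kills(weakCheck))   // false
	fmt.Println("strong check kills mutant:", Kills(strongCheck)) // true
}
```

Both checks execute the same lines, so they yield identical line coverage; only the one that asserts on the exact value detects the mutation.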

Best Practices

Focus on high-impact code: Run mutation testing on critical paths, not trivial getters/setters.

Interpret in context: Mutation score is a signal, not a goal. A 75% score with good tests covering critical paths may be better than 95% with superficial tests.

Handle equivalent mutants: Some mutations don't change behavior (e.g., changing i < n to i != n in a loop that starts at 0 and increments i by one). No test can kill these, so flag and ignore them.
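
As an illustration of an equivalent mutant in Go, the sketch below compares a loop using `i < len(xs)` with a mutant using `i != len(xs)`. The function names are invented for this example. Sampling inputs cannot prove equivalence; here the argument is that `i` starts at 0 and rises by exactly one, so both conditions stop at the same point.

```go
package main

import "fmt"

func sumOriginal(xs []int) int {
	total := 0
	for i := 0; i < len(xs); i++ {
		total += xs[i]
	}
	return total
}

// sumMutant replaces `<` with `!=`. Because i starts at 0 and increases by
// one, both conditions become false exactly when i == len(xs).
func sumMutant(xs []int) int {
	total := 0
	for i := 0; i != len(xs); i++ {
		total += xs[i]
	}
	return total
}

// Differs reports whether any of the inputs distinguishes the two versions.
func Differs(inputs [][]int) bool {
	for _, xs := range inputs {
		if sumOriginal(xs) != sumMutant(xs) {
			return true
		}
	}
	return false
}

func main() {
	inputs := [][]int{nil, {}, {7}, {1, 2, 3}, {-5, 5, 0, 9}}
	fmt.Println("distinguishing input found:", Differs(inputs)) // false
}
```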

Mutation testing is not fuzzing: Mutations test if existing tests catch changes. Fuzzing tests if code handles unexpected inputs. Both are valuable but different.

Iterate: Use mutation testing to guide test improvement, not as a one-time audit.

Performance: For large codebases, run mutation testing on changed files only (from git diff). Full mutation testing can be done nightly.

Example Workflow

// Original code in wallet.go
func CalculateFee(amount int64) int64 {
    if amount > 1000 {
        return amount / 100
    }
    return 10
}

// Generated mutations:
// M1: amount >= 1000 (boundary condition)
// M2: amount < 1000 (relational flip)
// M3: amount == 1000 (boundary)
// M4: return amount / 10 (arithmetic change)
// M5: return 0 in place of return 10 (constant change)

// Existing test in wallet_test.go (uses github.com/stretchr/testify/assert)
func TestCalculateFee(t *testing.T) {
    fee := CalculateFee(2000)
    assert.Equal(t, int64(20), fee) // int64 to match the return type
}

// Mutation results:
// M1: SURVIVED - test doesn't check the boundary at 1000
// M2: KILLED - 2000 takes the flat-fee branch and returns 10
// M3: KILLED - 2000 != 1000, so the mutant returns 10
// M4: KILLED - wrong calculation detected (200 instead of 20)
// M5: SURVIVED - test never reaches the flat-fee branch

// Score: 60% (3/5 killed)
// Generate tests for M1 and M5:

func TestCalculateFee_Boundary(t *testing.T) {
    // Exact boundary: the flat fee applies (kills M5, which returns 0)
    assert.Equal(t, int64(10), CalculateFee(1000))
    // Just above boundary: 1001 / 100 == 10 with integer division
    assert.Equal(t, int64(10), CalculateFee(1001))
}

// Re-run: M5 is now killed. M1 still survives, but it is an equivalent
// mutant: at 1000 it returns 1000 / 100 == 10, the same as the flat fee.
// Excluding it, the score is 100% (4/4 killed).

Troubleshooting

"No mutations generated": Check that file contains mutatable code (not just type definitions or constants).

"All mutants timeout": Tests may be running too slowly or hanging. Check test implementation.

"Mutation score very low": Tests may only check happy paths. Add error cases, boundary tests, and assertions on actual behavior.

"Cannot compile mutated code": Mutation may have violated type constraints. This is a bug in mutation generation - report it.
