---
name: mutation-testing
description: Use when validating test effectiveness, measuring test quality beyond coverage, choosing mutation testing tools (Stryker, PITest, mutmut), interpreting mutation scores, or improving test suites - provides mutation operators, score interpretation, and integration patterns
---
# Mutation Testing

## Overview
**Core principle:** Mutation testing validates that your tests actually test something by introducing small bugs (mutants) into the code and checking whether the tests catch them.

**Rule:** 100% code coverage doesn't mean good tests. Coverage measures what code runs; mutation score measures whether tests detect bugs in it.
## Code Coverage vs Mutation Score
| Metric | What It Measures | Example |
|---|---|---|
| Code Coverage | Lines executed by tests | `calculate_tax(100)` executes the code = 100% coverage |
| Mutation Score | Bugs detected by tests | Change `*` to `/` → test still passes = poor tests |
**Problem with coverage:**

```python
def calculate_tax(amount):
    return amount * 0.08

def test_calculate_tax():
    calculate_tax(100)  # 100% coverage, but asserts nothing!
```
**Mutation testing catches this:**

1. Mutates `* 0.08` to `/ 0.08`
2. Runs the test
3. Test still passes → survived mutation (bad test!)
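The fix is an assertion that pins the expected value, so the mutant produces a failing test:

```python
def test_calculate_tax():
    assert calculate_tax(100) == 8.0  # the / 0.08 mutant returns 1250.0 and fails here
```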
## How Mutation Testing Works

**Process:**

1. **Create mutant:** Change the code slightly (e.g., `+` → `-`, `<` → `<=`)
2. **Run tests:** Do any tests fail?
3. **Classify:**
   - **Killed:** A test failed → good test!
   - **Survived:** All tests passed → tests don't verify this logic
   - **Timeout:** Tests hung → usually counted as killed
   - **No coverage:** Mutated code never executed → add a test
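Conceptually, a mutation tool runs the loop sketched below. This is a deliberately minimal illustration with made-up textual operator swaps, not how mutmut or any real tool works (real tools mutate the AST or bytecode and sandbox the runs):

```python
import subprocess
from pathlib import Path

# Textual operator swaps for illustration only; real tools mutate the AST.
MUTATIONS = [("+", "-"), ("<", "<="), ("==", "!=")]

def tests_pass() -> bool:
    """Run the suite once; returncode 0 means every test passed."""
    return subprocess.run(["python", "-m", "pytest", "-x", "-q"]).returncode == 0

def mutate_and_classify(path: str) -> dict:
    """Apply each mutation to the file at `path`, run the tests, tally results."""
    source = Path(path).read_text()
    counts = {"killed": 0, "survived": 0}
    try:
        for old, new in MUTATIONS:
            if old not in source:
                continue                                        # nothing to mutate
            Path(path).write_text(source.replace(old, new, 1))  # create one mutant
            if tests_pass():
                counts["survived"] += 1   # tests missed the injected bug
            else:
                counts["killed"] += 1     # a test caught the injected bug
    finally:
        Path(path).write_text(source)     # always restore the original code
    return counts
```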
**Mutation Score:**

```
Mutation Score = (Killed Mutants / Total Mutants) × 100
```
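For example, 17 killed mutants out of 20 total gives (17 / 20) × 100 = 85%.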
**Thresholds:**

- `> 80%`: Excellent test quality
- `60-80%`: Acceptable
- `< 60%`: Tests are weak
## Tool Selection
| Language | Tool | Why |
|---|---|---|
| JavaScript/TypeScript | Stryker | Best JS support, framework-agnostic |
| Java | PITest | Industry standard, Maven/Gradle integration |
| Python | mutmut | Simple, fast, pytest integration |
| C# | Stryker.NET | .NET ecosystem integration |
## Example: Python with mutmut

### Installation

```bash
pip install mutmut
```
### Basic Usage

```bash
# Run mutation testing
mutmut run

# View results
mutmut results

# Show survived mutants (bugs your tests missed)
mutmut show
```
### Configuration

```ini
# setup.cfg
[mutmut]
paths_to_mutate=src/
backup=False
runner=python -m pytest -x
tests_dir=tests/
```
### Example

```python
# src/calculator.py
def calculate_discount(price, percent):
    if percent > 100:
        raise ValueError("Percent cannot exceed 100")
    return price * (1 - percent / 100)
```

```python
# tests/test_calculator.py
def test_calculate_discount():
    result = calculate_discount(100, 20)
    assert result == 80
```
Run mutmut:

```bash
mutmut run
```
**Possible mutations:**

1. `percent > 100` → `percent >= 100` (boundary)
2. `1 - percent` → `1 + percent` (operator)
3. `percent / 100` → `percent * 100` (operator)
4. `price * (...)` → `price / (...)` (operator)
**Results:**

- Mutation 1 survived (the test never exercises the boundary)
- Mutations 2, 3, and 4 killed (the `result == 80` assertion catches them)
**Improvement:** Note that calling `calculate_discount(100, 101)` alone would not kill mutation 1, since both `> 100` and `>= 100` raise for 101. The boundary value itself must be asserted:

```python
import pytest

def test_calculate_discount_boundary():
    # percent == 100 is the boundary: the original returns 0,
    # but the >= mutant raises — this kills mutation 1
    assert calculate_discount(100, 100) == 0
    # Just past the boundary, both versions raise; this documents
    # the error contract rather than killing the mutant
    with pytest.raises(ValueError):
        calculate_discount(100, 101)
```
## Common Mutation Operators

| Operator | Original | Mutated | What It Tests |
|---|---|---|---|
| Arithmetic | `a + b` | `a - b` | Calculation logic |
| Relational | `a < b` | `a <= b` | Boundary conditions |
| Logical | `a and b` | `a or b` | Boolean logic |
| Unary | `+x` | `-x` | Sign handling |
| Constant | `return 0` | `return 1` | Magic numbers |
| Return | `return x` | `return None` | Return value validation |
| Statement deletion | `x = 5` | (deleted) | Side effects |
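Statement deletion is the easiest to overlook: a test kills such a mutant only if it observes the side effect. A minimal sketch (the `register` function and its audit log are hypothetical):

```python
def register(user, audit_log):
    audit_log.append(user)  # a statement-deletion mutation removes this line
    return True

def test_register_records_audit():
    log = []
    assert register("alice", log) is True
    assert log == ["alice"]  # kills the deletion mutant; the return check alone would not
```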
## Interpreting Mutation Score

### High Score (> 80%)
Good tests catch most bugs.

```python
def add(a, b):
    return a + b

def test_add():
    assert add(2, 3) == 5
    assert add(-1, 1) == 0
    assert add(0, 0) == 0

# Mutations killed:
# - a - b (returns -1, test expects 5)
# - a * b (returns 6, test expects 5)
```
### Low Score (< 60%)

Weak tests don't verify logic.

```python
def validate_email(email):
    return "@" in email and "." in email

def test_validate_email():
    validate_email("user@example.com")  # No assertion!

# Mutations survived:
# - "@" in email → "@" not in email
# - and → or
# (All mutations survive because the test asserts nothing)
```
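A repaired version that kills those mutants (the invalid inputs below are illustrative):

```python
def test_validate_email():
    assert validate_email("user@example.com") is True  # kills the not-in mutants
    assert validate_email("userexample.com") is False  # kills the and → or mutant
    assert validate_email("user@examplecom") is False  # kills "." in → "." not in
```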
### Survived Mutants to Investigate

**Priority order:**

1. Business logic mutations (calculations, validations)
2. Boundary conditions (`<` → `<=`, `>` → `>=`)
3. Error handling (exception raising)

**Low priority:**

4. Logging statements
5. Constants that don't affect behavior
## Integration with CI/CD

### GitHub Actions (Python)

```yaml
# .github/workflows/mutation-testing.yml
name: Mutation Testing

on:
  schedule:
    - cron: '0 2 * * 0'  # Weekly on Sunday 2 AM
  workflow_dispatch:      # Manual trigger

jobs:
  mutmut:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          pip install mutmut pytest

      - name: Run mutation testing
        run: mutmut run

      - name: Generate report
        run: |
          mutmut results
          mutmut html  # Generate HTML report

      - name: Upload report
        uses: actions/upload-artifact@v3
        with:
          name: mutation-report
          path: html/
```
**Why weekly, not every PR:**

- Mutation testing is slow (10-100x slower than a normal test run)
- It re-runs the test suite once per generated mutant
- Full runs aren't needed for every change
## Anti-Patterns Catalog

### ❌ Chasing 100% Mutation Score
**Symptom:** Writing tests just to kill surviving mutants

**Why bad:**

- Some mutants are equivalent (they don't change behavior)
- Diminishing returns after ~85%
- Time is better spent on integration tests

**Fix:** Target 80-85% and focus on business logic
### ❌ Ignoring Equivalent Mutants

**Symptom:** "95% mutation score, still have survived mutants"

**Equivalent mutants:** changes that don't affect observable behavior

```python
def is_positive(x):
    return x > 0

# Mutation: x > 0 → x >= 0
# If the input is never exactly 0, this mutant is equivalent:
# no test can kill it
```
**Fix:** Exclude the line from mutation with mutmut's `# pragma: no mutate` marker:

```python
def is_positive(x):
    return x > 0  # pragma: no mutate
```
### ❌ Running Mutation Tests on Every Commit

**Symptom:** CI takes 2 hours

**Why bad:** Mutation testing is 10-100x slower than regular tests

**Fix:**

- Run weekly or nightly
- Run on core modules only, not the entire codebase (see the config sketch after this list)
- Use as a quality metric, not a merge blocker
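A minimal config sketch of the "core modules only" approach, assuming a hypothetical `src/billing/` package holds the business logic:

```ini
# setup.cfg - mutate only the core business-logic package
[mutmut]
paths_to_mutate=src/billing/
runner=python -m pytest -x tests/billing
```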
## Incremental Mutation Testing

**Test only changed code:**

```bash
# Mutate only Python files that changed relative to main
CHANGED=$(git diff --name-only main -- '*.py' | paste -sd, -)
mutmut run --paths-to-mutate "$CHANGED"
```
**Benefits:**

- Faster feedback (minutes instead of hours)
- Can run on PRs
- Focuses on new code
## Bottom Line

Mutation testing measures whether your tests actually detect bugs. High code coverage doesn't mean good tests.
**Usage:**

- Run weekly/nightly, not on every commit (too slow)
- Target an 80-85% mutation score for business logic
- Use mutmut (Python), Stryker (JS/TS), PITest (Java)
- Focus on killed vs. survived mutants
- Ignore equivalent mutants
If your tests have 95% coverage but 40% mutation score, your tests aren't testing anything meaningful. Fix the tests, not the coverage metric.