name	test-diff-analyzer
description	Analyze test differences between runs to identify flaky tests and consistency issues. Use to find tests that fail intermittently.
category	testing
mcp_fallback	none

Analyze Test Differences Between Runs

Compare test results across multiple runs to identify flaky tests.

When to Use

Test passes locally but fails in CI
Test sometimes passes, sometimes fails (flaky test)
Need to understand test consistency issues
Comparing test results before/after code changes
Debugging intermittent test failures

Quick Reference

# Run the test suite several times, capturing stdout and stderr of each run
for i in 1 2 3; do
  pixi run mojo test -I . tests/ > /tmp/test_run_$i.log 2>&1
done

# Compare two test runs
diff -u /tmp/test_run_1.log /tmp/test_run_2.log

# Count how often each failure line appears across all runs (-h drops the filename prefix)
grep -h "FAILED" /tmp/test_run_*.log | sort | uniq -c

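# Show tests that sometimes pass, sometimes fail: names reported both as FAILED
# and as PASSED (assumes the rest of each result line is just the test name)
grep -h "FAILED" /tmp/test_run_*.log | sed 's/FAILED//' | sort -u > /tmp/failed.txt
grep -h "PASSED" /tmp/test_run_*.log | sed 's/PASSED//' | sort -u > /tmp/passed.txt
comm -12 /tmp/failed.txt /tmp/passed.txt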

Analysis Workflow

Collect baseline: Run tests locally N times
Collect CI data: Get CI test results from recent runs
Compare outputs: Diff the logs from different test runs
Identify flaky tests: Find tests with inconsistent results across runs
Find patterns: Note when the test fails and when it passes
Root cause: Determine whether timing, randomness, or resource issues are responsible
Remediation: Fix or isolate the flaky test
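
A minimal sketch of the baseline step, assuming bash: `collect_runs` is a hypothetical helper (not part of pixi or Mojo) that runs any command N times and keeps every run's combined output and exit code in one directory, so the logs can be diffed and compared afterwards.

```bash
# Hypothetical helper: run a command N times, saving each run's output and exit code.
# Usage: collect_runs N OUTDIR COMMAND [ARGS...]
collect_runs() {
  local n=$1 outdir=$2 passes=0 i=1 code
  shift 2
  mkdir -p "$outdir" || return 1
  : > "$outdir/exit_codes.txt"   # start a fresh exit-code list
  while [ "$i" -le "$n" ]; do
    # Capture stdout and stderr together; runners often report failures on stderr.
    if "$@" > "$outdir/run_$i.log" 2>&1; then
      code=0
      passes=$((passes + 1))
    else
      code=$?
    fi
    echo "run_$i $code" >> "$outdir/exit_codes.txt"
    i=$((i + 1))
  done
  echo "passed $passes of $n runs"
}
```

For example, `collect_runs 10 /tmp/baseline pixi run mojo test -I . tests/` would leave `run_1.log` through `run_10.log` plus `exit_codes.txt` in `/tmp/baseline`. Counting passes by exit code assumes the runner exits non-zero whenever any test fails.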

Flaky Test Indicators

Timing Issues:

Test passes when run in isolation
Test fails when run with other tests
Timeout values too aggressive
Race conditions in setup/teardown
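
To confirm the isolation-versus-suite pattern, run a single test file on its own and then the whole `tests/` directory, save both logs, and list the tests that passed alone but failed alongside the others. This sketch assumes a hypothetical log format in which each result line reads `PASSED <test_id>` or `FAILED <test_id>`; the helper name `suite_only_failures` is also illustrative, and the `awk` patterns need adjusting to the runner's real output.

```bash
# Hypothetical helper: tests that PASSED in an isolated run but FAILED in a full-suite run.
# ASSUMED line format (hypothetical): "PASSED <test_id>" or "FAILED <test_id>".
# Usage: suite_only_failures ISOLATED_LOG SUITE_LOG
suite_only_failures() {
  local tmp
  tmp=$(mktemp -d) || return 1
  # Test IDs that passed when run alone.
  awk '$1 == "PASSED" { print $2 }' "$1" | sort -u > "$tmp/passed_alone"
  # Test IDs that failed when run together with the rest of the suite.
  awk '$1 == "FAILED" { print $2 }' "$2" | sort -u > "$tmp/failed_in_suite"
  # comm -12 keeps only lines present in both sorted files.
  comm -12 "$tmp/passed_alone" "$tmp/failed_in_suite"
  rm -rf "$tmp"
}
```

Any test it prints is a candidate for shared state or ordering problems in setup/teardown.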

Randomness Issues:

Random seed not fixed
Hash ordering varies
Dictionary/set iteration order
Floating-point precision differences
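
One quick check for ordering-related flakiness: if two logs differ but become identical once their lines are sorted, the difference is only in output order, which is what hash, dictionary, or set iteration order tends to produce. A minimal bash sketch (the helper name `classify_log_diff` is hypothetical):

```bash
# Hypothetical helper: classify how two logs differ.
# Prints "identical", "order-only" (same lines, different order), or "content".
# Usage: classify_log_diff LOG_A LOG_B
classify_log_diff() {
  local tmp result
  if cmp -s "$1" "$2"; then
    echo identical
    return 0
  fi
  tmp=$(mktemp -d) || return 1
  # Sort both logs so that pure reorderings compare equal.
  sort "$1" > "$tmp/a"
  sort "$2" > "$tmp/b"
  if cmp -s "$tmp/a" "$tmp/b"; then
    result=order-only
  else
    result=content
  fi
  rm -rf "$tmp"
  echo "$result"
}
```

Lines containing timings or memory addresses will always make logs differ in content, so strip or normalize them (for example with `sed`) before comparing.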

Resource Issues:

Test passes locally but fails in CI
Fails under resource constraints
Intermittent out-of-memory errors
Depends on available disk space
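
To reproduce a CI-only failure that may be resource-related, one option is to cap memory locally. This sketch assumes bash on Linux, where `ulimit -v` (virtual memory, in KiB) is normally enforced; the helper name `run_with_mem_limit` is hypothetical. The limit is set inside a subshell so it does not affect the calling shell.

```bash
# Hypothetical helper: run a command with a virtual-memory cap (in KiB).
# Usage: run_with_mem_limit LIMIT_KIB COMMAND [ARGS...]
run_with_mem_limit() {
  local limit_kib=$1
  shift
  (
    # The subshell keeps the limit from leaking into the caller.
    ulimit -v "$limit_kib" || exit 125   # 125: the limit could not be applied
    exec "$@"
  )
}
```

For example, `run_with_mem_limit 4194304 pixi run mojo test -I . tests/` caps the run at 4 GiB of virtual memory. Programs that reserve large address ranges up front can fail under `ulimit -v` even when their real usage is small; in that case a container limit such as `docker run --memory 4g` is a closer match to a constrained CI runner.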

Output Format

Report analysis with:

Flaky Tests - Tests with inconsistent results
Consistency Score - Pass rate across runs (e.g., 80% pass rate)
Failure Patterns - When/how tests fail
Impact - How many test runs are affected
Root Cause Hypothesis - What likely causes instability
Recommendations - How to fix or isolate the flaky test
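
The consistency score can be computed per test from several run logs. A minimal sketch with `awk`, assuming the hypothetical `PASSED <test_id>` / `FAILED <test_id>` line format (adjust the patterns to the real output); the helper name `consistency_report` is illustrative:

```bash
# Hypothetical helper: per-test pass rate across run logs.
# ASSUMED line format (hypothetical): "PASSED <test_id>" or "FAILED <test_id>".
# Output: test_id passes/runs pass-rate% classification
# Usage: consistency_report LOG...
consistency_report() {
  awk '
    $1 == "PASSED" { pass[$2]++; runs[$2]++ }
    $1 == "FAILED" { runs[$2]++ }
    END {
      for (t in runs) {
        p = pass[t] + 0                  # 0 if the test never passed
        rate = 100 * p / runs[t]
        if (p == runs[t])  verdict = "stable-pass"
        else if (p == 0)   verdict = "stable-fail"
        else               verdict = "FLAKY"
        printf "%s %d/%d %.0f%% %s\n", t, p, runs[t], rate, verdict
      }
    }
  ' "$@" | sort -k1,1   # awk "for in" order is unspecified, so sort by test name
}
```

Tests marked `FLAKY` belong under Flaky Tests; a test that fails in every run is consistently broken rather than flaky.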

Error Handling

Problem	Solution
Different environment	Run in a controlled environment (e.g., Docker)
Insufficient data	Run more iterations to establish a pattern
No failure info	Enable debug output, increase verbosity
External dependencies	Mock or isolate external services
Timing-dependent	Add explicit waits or retry logic
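
For the insufficient-data case, a rough estimate of how many iterations are enough: if a test fails independently with probability p per run, the chance of seeing no failure in n runs is (1 - p)^n, so reaching confidence c of catching at least one failure needs n >= ln(1 - c) / ln(1 - p). Independence is an assumption; real flaky failures often cluster (for example, under load), so the result is likely an underestimate. A bash/awk sketch with a hypothetical helper name:

```bash
# Hypothetical helper: runs needed to observe at least one failure of a test
# that fails independently with probability P per run, with confidence C.
#   (1 - P)^n <= 1 - C   =>   n >= ln(1 - C) / ln(1 - P)
# Usage: runs_needed P C   (both strictly between 0 and 1)
runs_needed() {
  awk -v p="$1" -v c="$2" 'BEGIN {
    if (p <= 0 || p >= 1 || c <= 0 || c >= 1) {
      print "runs_needed: P and C must be between 0 and 1" | "cat 1>&2"
      exit 1
    }
    n = log(1 - c) / log(1 - p)          # minimum fractional number of runs
    r = (n == int(n)) ? n : int(n) + 1   # round up to a whole run
    print r
  }'
}
```

For example, `runs_needed 0.1 0.95` gives 29: a test that fails about once in 10 runs needs 29 runs for 95% confidence of seeing at least one failure.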

References

See mojo-test-runner for test execution options
See extract-test-failures for failure analysis
See CLAUDE.md for test standards and TDD workflow
