| name | test-fixer |
| description | MANDATORY when: tests fail, CI fails, CI/CD fails, ci.yml fails, user provides CI log URL. Investigates failures, fixes iteratively until passing. |
| allowed-tools | Bash, Read, Edit, Write, Grep, Glob |
Test Fixer - Systematic Failure Investigation & Fix Iteration
╔══════════════════════════════════════════════════════════════════════════╗
║ 🚨 MANDATORY: When tests fail, you MUST use this skill                     ║
║                                                                            ║
║ DO NOT:                                                                    ║
║ - ❌ Guess at fixes without investigation                                  ║
║ - ❌ Report findings without attempting fixes                              ║
║ - ❌ Stop after one failed fix attempt                                     ║
║ - ❌ Leave code in broken state                                            ║
║                                                                            ║
║ GOAL: Return code to passing state with minimal changes                    ║
╚══════════════════════════════════════════════════════════════════════════╝
🎯 Purpose
This skill provides a systematic workflow for:
- Investigating test failures to identify root cause
- Writing fixes based on investigation findings
- Iterating until all tests pass
- Restoring code to working state with minimal changes
🚨 MANDATORY TRIGGERS - Use this skill when ANY of these occur:
- After running test-runner and ANY tests fail
- After code changes and test-runner reports failures
- After merge/pull and tests fail
- CI reports test failures in any environment
- User says: "tests are broken" OR "tests failing" OR "fix the tests"
- Pre-commit hook fails due to test failures
- You see test error output from any source
- User provides CI log URL (Azure Blob, GitHub Actions, etc.)
⚡ CI Log URLs: Download IMMEDIATELY
CI log URLs have short-lived tokens (~10 min). When user provides one:
# 1. IMMEDIATELY WebFetch the URL (before it expires)
# 2. Grep for failures: grep -iE "failed|error"  # -i already covers FAILED
# 3. Get diff vs main to develop thesis
git diff main...HEAD
# 4. Apply test-fixer workflow below
URL pattern: productionresultssa16.blob.core.windows.net/actions-results/*
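If WebFetch is unavailable, a Bash fallback works too — a minimal sketch, assuming the URL is still inside its token window (`$CI_LOG_URL` is a placeholder):

```bash
# Download the CI log before the short-lived token expires
curl -sSL "$CI_LOG_URL" -o tmp/ci_log.txt

# Surface the failures (case-insensitive)
grep -inE "failed|error" tmp/ci_log.txt | head -50
```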
Integration with other skills:
- test-runner detects failures → immediately invoke test-fixer
- task-complete-enforcer finds test failures → immediately invoke test-fixer
- Pre-commit hooks show test failures → immediately invoke test-fixer
🚨 CRITICAL: Do NOT investigate test failures manually. ALWAYS use this skill.
🛠️ Test Infrastructure Quick Reference
| Command/File | Purpose |
|---|---|
| `just lint-and-fix` | Auto-fix formatting + run linting + mypy type checks |
| `just test-all-mocked` | Fast tests with mocked external services (used in CI) |
| `just test-all` | All tests including live/e2e with real services |
| `just test-smoke` | Smoke tests requiring Docker (production-like) |
| `.pre-commit-config.yaml` | Git hooks: lint on pre-commit, full tests on pre-push |
| `.github/workflows/ci.yml` | GitHub Actions: runs `just lint` + `just test-all-mocked` on every push |
| `.github/workflows/deploy.yml` | Deployment workflow (staging/production) |
CI/CD flow: Push → ci.yml runs lint+mocked tests → deploy.yml deploys if tests pass
🚨 When CI/CD fails: Download the logs immediately (URLs expire in ~10 min) and consult .github/workflows/ci.yml to understand which step failed and reproduce locally.
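To map the failed CI step to a local command, list the workflow's run steps — a small sketch against the `ci.yml` described above:

```bash
# Show every command the CI workflow runs
grep -n "run:" .github/workflows/ci.yml

# Reproduce the failing step locally (ci.yml runs lint + mocked tests)
just lint && just test-all-mocked
```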
🚨 CRITICAL FOR TEST WRITING
- BEFORE writing tests → Use test-writer skill (MANDATORY - analyzes code type, dependencies, contract)
- AFTER writing tests → Invoke pytest-test-reviewer agent (validates patterns)
- YOU CANNOT WRITE TESTS WITHOUT test-writer SKILL - No exceptions, no shortcuts, every test, every time
🔄 How This Skill Interacts With Other Skills
test-fixer orchestrates the fix workflow and uses other skills:
test-runner skill - To verify fixes work
- After each fix attempt, use test-runner to run: `just lint-and-fix && just test-all-mocked`
- Final verification uses test-runner's full parallel test suite
test-writer skill - When modifying test files
- MANDATORY before changing test assertions, expectations, or fixtures
- test-writer ensures you don't encode broken behavior into tests
- test-writer helps determine if tests need updating or code needs fixing
sql-reader skill - To understand production data model
- Use when debugging integration/e2e test failures
- Query production to see actual data model state
- Helps identify mismatches between test fixtures and reality
Git repository management - test-fixer owns git operations
- Creates safety backups (git diff)
- Stashes/unstashes changes during investigation
- Manages checkout operations (HEAD, main, branches)
- NEVER deletes user's code (always reversible operations)
⚠️ FUNDAMENTAL OPERATING PRINCIPLES
YOUR MINDSET AS test-fixer:
You are a persistent, autonomous developer who KNOWS tests can pass because:
- Tests ALWAYS pass at main (CI enforces this) - this is a FACT
- Therefore, a passing state EXISTS and is REACHABLE
- Your job: systematically find that state and restore it
- You WILL get tests passing - it's not a question of IF, only WHEN
CRITICAL ASSUMPTIONS:
- Tests passed at main → they can pass again
- Tests passed at branch HEAD → you can restore that state
- If tests fail now → something specific changed → that change can be identified and fixed
- NEVER delete user's code (always stash or use temp branch)
GOAL: Return code to passing state (you KNOW it exists) with:
- ✅ All tests pass
- ✅ Original functionality preserved
- ✅ Minimal changes from original code
- ✅ No debugging artifacts left behind
🚨 BANNED TERMINOLOGY - NEVER Say "Tests Pass" Without Proof
YOU ARE FORBIDDEN from saying these phrases unless ALL tests actually pass:
- ❌ BANNED: "tests pass"
- ❌ BANNED: "tests are passing"
- ❌ BANNED: "all tests pass"
- ❌ BANNED: "tests now pass"
To say tests pass, you MUST:
- Run `.claude/skills/test-runner/scripts/run_tests_parallel.sh` (full suite)
- Check ALL log files (mocked + e2e-live + smoke)
- Verify ZERO failures in all logs
- See actual passing output
You MAY say (with proof):
- ✅ "Quick tests pass" (after `just test-all-mocked` shows 0 failures)
- ✅ "Mocked tests pass" (after `just test-all-mocked` shows 0 failures)
- ✅ "X out of Y tests now pass" (showing progress during iteration)
Key difference from test-runner:
- test-runner: Runs tests and reports status (may report failures)
- test-fixer: ITERATES until ALL tests pass (never stops at failures)
Your job is NOT complete until:
- ✅ Full parallel test suite runs successfully
- ✅ All logs show 0 failures
- ✅ You can truthfully say "all tests pass" with proof
🔄 ITERATIVE FIX WORKFLOW
You can announce you're starting ("Running systematic test fix workflow..."), but DO NOT report partial results mid-investigation. Complete the full cycle, THEN report complete findings.
Phase 0: Create Safety Backup (CRITICAL - Do This First)
BEFORE any investigation, create recoverable backup:
# Create tmp directory if it doesn't exist
mkdir -p tmp
# Create timestamped diff backup of ALL changes (staged AND unstaged) vs HEAD
# Note: `git diff HEAD` already includes staged changes; appending
# `git diff --staged` would duplicate hunks and break `git apply`.
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
git diff HEAD > tmp/test_fixer_start_${TIMESTAMP}.diff
# Announce backup location
echo "Safety backup created: tmp/test_fixer_start_${TIMESTAMP}.diff"
# CRITICAL: Remember this filename for recovery if needed
BACKUP_FILE="tmp/test_fixer_start_${TIMESTAMP}.diff"
This backup captures EVERYTHING:
- All changes vs HEAD, staged AND unstaged (`git diff HEAD` captures both)
- Nothing is lost, even if git operations fail
If anything goes wrong later, you can recover with:
# Find your backup
ls -lt tmp/test_fixer_start_*.diff | head -1
# Apply the backup
git apply tmp/test_fixer_start_TIMESTAMP.diff
Phase 1: Systematic Investigation
Step 1: Run tests and check for obvious fix
# FIRST: Run the failing tests to see the error
just test-all-mocked # Or whichever suite failed
# SECOND: Review the diff to understand what changed
git diff HEAD # Shows all changes (staged + unstaged)
# THIRD: Analyze
# - What tests are failing?
# - What did I change in the diff?
# - Is it OBVIOUS what in the diff caused the failure?
# If OBVIOUS what broke (e.g., renamed function but forgot to update test):
# → Fix it immediately and verify
# → Skip to Phase 3 (cleanup)
#
# If NOT OBVIOUS:
# → Continue to Step 2 (establish baseline)
Step 2: Establish baseline (where do tests pass?)
Only do this step if the fix wasn't obvious from Step 1.
# Save current state (in addition to diff backup)
git stash save "WIP: test-fixer investigating failures"
# Test at current point (without your changes)
just test-all-mocked # Or whichever suite failed
# Record: PASS or FAIL?
If tests PASS without your changes:
- Root cause: Your changes broke the tests (most common case)
- Restore changes: `git stash pop`
- Proceed to Phase 2: Iterative Fixing
If tests STILL FAIL without your changes:
- Continue investigation to find where tests pass
# Check HEAD of current branch (your changes are stashed, so the
# working tree now matches HEAD)
git status  # Note branch name
# To test the branch tip as pushed, sync with the remote:
#   git reset --hard origin/your-branch-name
# (safe here: your changes are stashed AND saved in the diff backup)
just test-all-mocked
# Record: PASS or FAIL?
If tests PASS at branch HEAD:
- Root cause: Regression between HEAD and your current commit
- Proceed to Phase 2: Iterative Fixing
If tests STILL FAIL at branch HEAD:
- Check main branch
git checkout main
git pull origin main
just test-all-mocked
# Record: PASS or FAIL?
If tests PASS at main:
- Root cause: Regression between main and your branch
- Proceed to Phase 2: Iterative Fixing
If tests FAIL at main:
- Pre-existing issue (RARE - investigate why CI didn't catch it)
- May need to check earlier commits or a different branch (see the bisect sketch below)
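If none of these checkpoints passes, `git bisect` can localize the breaking commit automatically — a sketch, assuming `just test-all-mocked` exits nonzero on failure and your changes are already stashed:

```bash
# Binary-search for the first failing commit between main and the branch tip
git bisect start HEAD main
git bisect run just test-all-mocked   # exit 0 = good commit, nonzero = bad
git bisect reset                      # return to where you started
```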
Phase 2: Iterative Fixing
Once you've identified where tests pass, restore your code and start fixing:
# Return to original branch and restore changes
git checkout - # or git checkout your-branch-name
git stash pop # Restore your changes
Now begin fix iteration loop:
Fix Iteration Template
# ITERATION N
# 1. Analyze failure
just test-all-mocked # Get fresh failure output
# Read error messages, identify specific test failures
# Identify root cause from stack traces
# 2. Write targeted fix
# - Edit specific file(s) causing failure
# - Make MINIMAL changes to fix identified issue
# - Document what you're fixing
# - 🚨 CRITICAL: If updating/fixing tests, ALWAYS use test-writer skill first
# - 💡 OPTIONAL: Use sql-reader skill to understand production data model if debugging integration/e2e tests
# 3. Verify fix
just lint-and-fix # Auto-fix + type checking
just test-all-mocked # Run tests
# 4. Evaluate result
# - If ALL tests PASS → SUCCESS, proceed to Phase 3
# - If SOME tests PASS (progress) → Continue iteration
# - If NO progress or NEW failures → Revert this fix attempt
Revert failed fix attempt:
# If fix didn't help or made things worse:
git diff # Review what you changed
git checkout -- path/to/file.py # Revert specific file
# OR
git stash # Stash failed attempt
git stash drop # Discard it
# Then try different approach in next iteration
Iteration approach - Be DOGGED:
- Try UP TO 10 different fix approaches
- Each approach should be DIFFERENT from previous attempts:
- Fix specific failing test (🚨 use test-writer skill)
- Fix underlying code causing failure
- Update test expectations (🚨 use test-writer skill - if contract legitimately changed)
- Fix imports/dependencies
- Revert specific file causing issues
- Check for missing files or configuration
- Fix type errors
- Update fixtures or test setup (🚨 use test-writer skill)
- Check for circular dependencies
- Simplify complex changes into smaller fixes
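During iteration it is often faster to re-run only the failing test before the full mocked suite — a sketch, assuming the `just` recipes wrap pytest (the test path below is hypothetical):

```bash
# Re-run a single failing test, fail-fast, with verbose output
pytest api/tests/test_get_message_requests.py::test_dual_audience -x -vv

# Once it passes, confirm against the full mocked suite
just test-all-mocked
```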
🚨 CRITICAL RULE: When modifying test files, ALWAYS use test-writer skill
- Before changing any test assertions, consult test-writer
- Before updating test fixtures, consult test-writer
- Before changing test expectations, consult test-writer
- test-writer ensures you don't create brittle or self-evident tests
Be dogged and persistent (you KNOW tests can pass):
- Remember: Tests passed at main → a solution EXISTS
- After 5 attempts: Take stock, try completely different angle
- After 10 attempts: Report to user with:
- What you've tried (all 10 approaches)
- What you've learned (patterns in failures)
- Recommended next steps (specific, actionable)
- NEVER give up thinking "it can't be fixed" - it CAN because it passed before
Phase 3: Final Verification & Cleanup
Once quick tests pass, verify ALL tests:
# 1. Quick verification first
just lint-and-fix # Auto-fix + type checking
just test-all-mocked # Quick tests must pass
# 2. Clean up any debugging artifacts
# - Remove print statements
# - Remove commented code
# - Remove temporary variables
# - Check git diff for unintended changes
# 3. Run FULL test suite (MANDATORY before saying "tests pass")
.claude/skills/test-runner/scripts/run_tests_parallel.sh
# 4. Check ALL logs for failures
grep -E "failed|ERROR|FAILED" api/tmp/test-logs/test-mocked_*.log
grep -E "failed|ERROR|FAILED" api/tmp/test-logs/test-e2e-live_*.log
grep -E "failed|ERROR|FAILED" api/tmp/test-logs/test-smoke_*.log
# 5. ONLY NOW can you say "all tests pass"
# - If ANY failures found → return to Phase 2, keep iterating
# - If ZERO failures → success, report to user
🚨 DO NOT report success until Step 5 shows ZERO failures across ALL suites.
📋 COMPLETE WORKFLOW EXAMPLE
Test Fixer Workflow - Iteration Log
===================================
Phase 1: Investigation
----------------------
✅ Tests PASS at: HEAD without my changes (stashed)
❌ Tests FAIL at: Current commit (my changes)
Identified 3 failing tests:
- test_get_message_requests::test_dual_audience
- test_get_message_requests::test_deprecated_format
- test_conversation_processor::test_voice_prompts
Root cause: Missing MESSAGE_ORDER handling in get_message_requests()
Phase 2: Iterative Fixing
--------------------------
Iteration 1:
- Fix: Add MESSAGE_ORDER dict detection in get_message_requests()
- Result: 2/3 tests pass ✅ (progress!)
- Remaining: test_voice_prompts still failing
Iteration 2:
- Fix: Update voice prompt handling to use new format
- Result: ALL TESTS PASS ✅
Phase 3: Final Verification
----------------------------
✅ just lint-and-fix - passed
✅ just test-all-mocked - all 464 tests passed
Cleanup:
- Removed debug print statements (2 files)
- Removed commented-out code (util.py:45-47)
SUCCESS: All tests passing, code restored to working state with minimal changes.
Changes made:
- api/src/message_processing/util.py: Added MESSAGE_ORDER handling (15 lines)
- api/src/message_processing/conversation_processors.py: Updated voice prompt format (2 lines)
Ready for commit.
🆘 EMERGENCY RECOVERY
If anything goes wrong during test-fixer (checkout fails, stash lost, etc.):
# 1. DON'T PANIC - your code is backed up!
# 2. Find your most recent backup
ls -lt tmp/test_fixer_start_*.diff | head -5
# 3. Check what's in the backup
git apply --check tmp/test_fixer_start_TIMESTAMP.diff
# 4. Apply the backup
git apply tmp/test_fixer_start_TIMESTAMP.diff
# 5. Verify your changes are restored
git status
git diff
# 6. If stash exists, recover it too
git stash list
git stash pop # If you see your stash
Recovery checklist:
- ✅ Find backup: `ls -lt tmp/test_fixer_start_*.diff | head -1`
- ✅ Verify backup: `git apply --check tmp/test_fixer_start_TIMESTAMP.diff`
- ✅ Apply backup: `git apply tmp/test_fixer_start_TIMESTAMP.diff`
- ✅ Check stash: `git stash list && git stash pop`
- ✅ Confirm restoration: `git status && git diff`
The git diff backup ensures you NEVER lose work, even if:
- Stash gets lost
- Checkout conflicts occur
- Branch switching fails
- Anything unexpected happens
🚨 CRITICAL RULES
1. NEVER Delete User's Code
ALWAYS use reversible operations:
- ✅ `git stash` (can be popped)
- ✅ `git checkout -b temp-fix` (can switch back)
- ✅ Keep backup: `cp file.py file.py.backup`
- ❌ NEVER `git reset --hard` without stashing first
- ❌ NEVER `git clean -fd` on uncommitted changes
- ❌ NEVER delete files without backing up
2. Complete Investigation Before Reporting
BANNED mid-investigation reports:
- ❌ "I've stashed changes, running tests..."
- ❌ "Tests failing at HEAD, checking main..."
- ❌ "Found the issue, about to fix..."
CORRECT workflow:
- ✅ Announce: "Running systematic test fix workflow..."
- ✅ Silently complete: investigate → identify → fix → verify → cleanup
- ✅ THEN report: Complete findings with tests passing
3. Iterate Until Fixed (Be DOGGED - Don't Give Up)
Fix iteration principles:
- Try up to 10 different approaches (be creative!)
- Each iteration: analyze → fix → verify → evaluate
- If fix doesn't help: revert and try DIFFERENT approach
- Track what you've tried to avoid repeating failures
- After 5 attempts: step back, try completely different angle
- After 10 attempts: report comprehensive analysis to user
- GOAL: Return working code - don't stop until tests pass or all approaches exhausted
4. Minimal Changes Philosophy
When fixing:
- Change ONLY what's necessary to fix the test
- Don't refactor unrelated code
- Don't "improve" working code
- Focus on restoring passing state with minimal diff
5. Always Restore to Working State
Before reporting to user:
- ✅ All tests must pass (verify with `just test-all-mocked`)
- ✅ No stashed changes left behind
- ✅ Code on correct branch
- ✅ No debugging artifacts (prints, comments, TODOs)
- ✅ Clean git status
🎯 Success Criteria
Your work is complete when:
- ✅ All tests pass (`just test-all-mocked` shows 0 failures)
- ✅ Linting passes (`just lint-and-fix` clean)
- ✅ Code is clean (no debug artifacts, commented code, or prints)
- ✅ Minimal changes (git diff shows only necessary fixes)
- ✅ User's code preserved (original intent and functionality intact)
- ✅ Complete report (what broke, what was fixed, how to prevent)
🔍 Common Failure Patterns & Fixes
Pattern 1: Import Errors After Refactoring
Error: ModuleNotFoundError: No module named 'old_module'
Investigation: Code was refactored, imports not updated
Fix: Update all import statements to new module path
Verify: Grep for old module name across codebase
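A sketch of that verification and fix, with hypothetical module names:

```bash
# Find every remaining reference to the old module
grep -rn "old_module" api/ --include="*.py"

# Rewrite imports in bulk (GNU sed), then review the result
grep -rl "old_module" api/ --include="*.py" | xargs sed -i 's/old_module/new_module/g'
git diff
```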
Pattern 2: Type Errors After API Changes
Error: Argument 1 has incompatible type "str"; expected "dict"
Investigation: Function signature changed
Fix: Update all call sites to use new signature
Verify: Check mypy errors, fix all occurrences
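One way to enumerate the call sites, assuming mypy can be invoked directly (the package path is hypothetical; `just lint-and-fix` runs the same check repo-wide):

```bash
# Type-check just the touched package to list every incompatible call site
mypy api/src/message_processing/

# Re-verify with the project's standard check
just lint-and-fix
```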
Pattern 3: Test Assumptions Broken
Error: AssertionError: expected 3, got 2
Investigation: Business logic changed, test expects old behavior
Fix: Determine if test or code is wrong
- If test is outdated: update test expectations (announce change)
- If code is wrong: fix the code
Verify: Understand the contract being tested
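git's pickaxe search can reveal when and why the expectation changed — a sketch with hypothetical search strings and paths:

```bash
# Find commits that added or removed the asserted value
git log -p -S "expected_count" -- api/tests/

# Compare the test against main to see the previous contract
git diff main...HEAD -- api/tests/test_example.py
```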
Pattern 4: Missing Dependencies
Error: AttributeError: 'NoneType' object has no attribute 'x'
Investigation: Function expects parameter that's now optional/missing
Fix: Add null checking or provide default value
Verify: Check all code paths that call this function
📊 Reporting Template
When you complete the fix workflow, report:
Test Fixer - Complete Report
=============================
Investigation Results:
✅ Tests PASS at: [main/branch HEAD/specific commit]
❌ Tests FAIL at: [your changes/branch HEAD/main]
Root Cause Identified:
[Clear explanation of what broke and why]
Fix Iterations: [N attempts]
Iteration 1: [what was tried] → [result]
Iteration 2: [what was tried] → [result]
...
Final Status:
✅ All tests passing (464/464)
✅ Linting clean
✅ Code restored to working state
Changes Made:
- file1.py: [description] (X lines changed)
- file2.py: [description] (Y lines changed)
Prevention:
[How to avoid this issue in the future]
Ready for: [next steps - commit/review/continue work]
⚡ When to Use This Skill
MANDATORY when:
- `just test-all-mocked` shows failures
- CI reports test failures
- User says "tests are broken" or "tests failing"
- After merging/pulling and tests fail
- After refactoring when tests break
DO NOT use when:
- Tests are passing (use test-runner instead)
- Writing new tests (use test-writer instead)
- Just want to run tests (use test-runner instead)
🚀 Quick Reference
# Investigation Phase
git stash save "WIP: test-fixer"
just test-all-mocked # Where do tests pass?
# If tests pass here → your changes broke them
git stash pop
# Start fixing
# Fix Iteration Loop (pseudocode - repeat until tests pass)
# 1. Analyze failure
just test-all-mocked
# 2. Write targeted fix (edit specific files)
# 3. Verify
just lint-and-fix && just test-all-mocked
# 4. Evaluate:
#    - all tests pass → stop, run final verification
#    - some progress  → next iteration
#    - no progress    → revert this attempt, try a DIFFERENT approach
# Final Verification
just lint-and-fix
just test-all-mocked
# Report success
🎓 Remember
- Investigation first - Don't guess, systematically find where tests pass
- Iterate to fix - Don't stop after first attempt
- Minimal changes - Fix only what's broken
- Always verify - Run full test suite after each fix
- Clean up - Remove debug artifacts before reporting
- Report complete - User gets findings + working code, not work-in-progress
Goal: User receives working, tested code with clear explanation of what was fixed.