| name | fix-tests |
| description | Use when tests are failing, test quality issues were identified, or user wants to fix/improve specific tests. Accepts green-mirage-audit reports, general instructions, or can run tests and fix failures automatically. Lighter-weight than implement-feature, focused on test remediation. |
You work fast but carefully. You understand that tests exist to catch bugs, not to achieve green checkmarks. Every fix you make must result in tests that would actually catch failures.
You are pragmatic: you fix what needs fixing without over-engineering.
The workflow is: Understand the problem -> Fix it -> Verify the fix -> Move on.
Take the most direct path to working, meaningful tests.
Fix Tests
Lightweight test remediation workflow. Accepts multiple input modes and produces fixed, verified tests.
Input Modes
This skill accepts three input modes. Detect which mode based on what the user provides:
Mode 1: Green Mirage Audit Report
Detection: User provides output from green-mirage-audit skill, or references a green-mirage-audit report file.
Indicators:
- Structured findings with patterns (Pattern 1-8)
- "GREEN MIRAGE" verdicts
- File paths with line numbers in audit format
- "Blind Spot" and "Consumption Fix" sections
Action: Parse the report and process findings by priority.
Mode 2: General Instructions
Detection: User gives specific instructions about what to fix.
Indicators:
- "Fix the tests in X"
- "The test for Y is broken"
- "Add tests for Z"
- "test_foo is flaky"
- References to specific test files or functions
Action: Investigate the specified tests, understand the issue, fix it.
Mode 3: Run and Fix
Detection: User wants you to run tests and fix whatever fails.
Indicators:
- "Run the tests and fix what fails"
- "Make the tests pass"
- "Fix the failing tests"
- "Get the test suite green"
Action: Run the test suite, collect failures, fix each one.
Phase 0: Input Processing
0.1 Detect Input Mode
Parse the user's request to determine which mode applies.
IF input contains structured green-mirage findings:
mode = "audit_report"
Parse findings into work_items[]
ELSE IF input references specific tests/files to fix:
mode = "general_instructions"
Extract target tests/files into work_items[]
ELSE IF input asks to run tests and fix failures:
mode = "run_and_fix"
work_items = [] # Will be populated after test run
ELSE:
Ask user to clarify what they want fixed
0.2 Build Work Items
For audit_report mode:
interface WorkItem {
id: string; // "finding-1", "finding-2", etc.
priority: "critical" | "important" | "minor";
test_file: string; // path/to/test.py
test_function: string; // test_function_name
line_number: number;
pattern: number; // 1-8 from green mirage patterns
pattern_name: string;
current_code: string; // The problematic test code
blind_spot: string; // What broken code would pass
suggested_fix: string; // From audit report
production_file?: string; // Related production code
}
Parse each finding from the audit report into a WorkItem.
For general_instructions mode:
interface WorkItem {
id: string;
priority: "unknown"; // Will be assessed during investigation
test_file: string;
test_function?: string; // May be entire file
description: string; // What user said is wrong
}
For run_and_fix mode:
Work items populated in Phase 1 after running tests.
0.3 Quick Preferences (Optional)
Only ask if relevant to the work:
## Quick Setup
### Commit Strategy
How should I commit fixes?
A) One commit per fix (Recommended for review)
Description: Each test fix is a separate commit for easy review/revert
B) Batch by file
Description: Group fixes by test file
C) Single commit
Description: All fixes in one commit
Default to (A) if user doesn't specify.
Phase 1: Test Discovery (run_and_fix mode only)
Skip this phase for audit_report and general_instructions modes.
1.1 Run Test Suite
# Run the project's test framework (don't chain with ||: failing tests exit non-zero and would fall through to the next command)
pytest --tb=short   # or: npm test / cargo test, depending on the project
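A minimal Python sketch of framework detection by marker files; the file names checked and the detect_test_command helper are illustrative assumptions, not a fixed rule:
from pathlib import Path
import subprocess

def detect_test_command(root: str = ".") -> list[str]:
    """Pick a test command from common marker files (heuristic, not exhaustive)."""
    base = Path(root)
    if (base / "pytest.ini").exists() or (base / "pyproject.toml").exists():
        return ["pytest", "--tb=short"]
    if (base / "package.json").exists():
        return ["npm", "test"]
    if (base / "Cargo.toml").exists():
        return ["cargo", "test"]
    raise RuntimeError("Could not detect a test framework")

# Capture combined output so failures can be parsed in 1.2
result = subprocess.run(detect_test_command(), capture_output=True, text=True)
test_output = result.stdout + result.stderr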
1.2 Parse Failures
Extract from test output (a parsing sketch follows this list):
- Test file path
- Test function name
- Error message
- Stack trace
- Expected vs actual (if assertion error)
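A hedged sketch of that extraction for pytest; it assumes the "FAILED file::test - message" lines of pytest's short test summary, and other frameworks would need their own parser:
import re

FAILURE_LINE = re.compile(r"^FAILED (?P<file>[^:\s]+)::(?P<test>\S+)(?: - (?P<message>.*))?$")

def parse_pytest_failures(output: str) -> list[dict]:
    """Collect failing tests from pytest's short test summary lines."""
    failures = []
    for line in output.splitlines():
        match = FAILURE_LINE.match(line.strip())
        if match:
            failures.append({
                "test_file": match["file"],
                "test_function": match["test"],
                "error_message": match["message"] or "",
            })
    return failures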
1.3 Build Work Items from Failures
interface WorkItem {
id: string;
priority: "critical"; // All failures are critical
test_file: string;
test_function: string;
error_type: "assertion" | "exception" | "timeout" | "skip";
error_message: string;
stack_trace: string;
expected?: string;
actual?: string;
}
Phase 2: Fix Execution
Process work items in priority order: critical -> important -> minor.
2.1 Investigation Template
For EACH work item:
## Fixing: [test_function] in [test_file]
### Understanding the Problem
**What the test claims to do:**
[From test name, docstring, or user description]
**What's actually wrong:**
[From audit finding, error message, or investigation]
**Production code involved:**
[List files/functions the test exercises]
2.2 Read Required Context
- Read the test file (focus on the specific test function + setup/teardown)
- Read the production code being tested
- If audit_report mode: the suggested fix is a starting point, but verify it makes sense
2.3 Determine Fix Type
| Situation | Fix Type |
|---|---|
| Test has weak assertions (green mirage) | Strengthen assertions |
| Test is missing edge cases | Add test cases |
| Test has wrong expectations | Correct expectations |
| Test setup is broken | Fix setup |
| Production code is actually buggy | Flag for user - this is a BUG, not a test issue |
| Test is flaky (timing, ordering) | Fix isolation/determinism |
If the production code is the problem, stop and flag it with this template:
PRODUCTION BUG DETECTED
Test: [test_function]
Expected behavior: [what test expects]
Actual behavior: [what code does]
This is not a test issue - the production code has a bug.
Options:
A) Fix the production bug (then test will pass)
B) Update test to match current (buggy) behavior (not recommended)
C) Skip this test for now, create issue for the bug
Your choice: ___
2.4 Apply Fix
For green mirage fixes (strengthening assertions):
# BEFORE: Green mirage - checks existence only
def test_generate_report():
report = generate_report(data)
assert report is not None
assert len(report) > 0
# AFTER: Solid - validates actual content
def test_generate_report():
report = generate_report(data)
assert report == {
"title": "Expected Title",
"sections": [...expected sections...],
"generated_at": mock_timestamp
}
# OR if structure varies, at minimum:
assert report["title"] == "Expected Title"
assert len(report["sections"]) == 3
assert all(s["valid"] for s in report["sections"])
For missing edge case tests:
# Add new test function(s) for uncovered cases
def test_generate_report_empty_data():
"""Edge case: empty input should raise or return empty report."""
with pytest.raises(ValueError, match="Data cannot be empty"):
generate_report([])
def test_generate_report_malformed_data():
"""Edge case: malformed input should be handled gracefully."""
result = generate_report({"invalid": "structure"})
assert result["error"] == "Invalid data format"
For broken test setup:
Fix the setup, don't weaken the test to work around broken setup.
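A hedged before/after illustration, assuming a hypothetical export_users function and pytest's tmp_path fixture:
# BAD: setup broke (hard-coded path), so the test was gutted instead of fixed
def test_export_writes_file():
    export_users("/opt/app/exports")       # fails on CI where the path doesn't exist
    assert True                            # "fixed" by removing the real check

# GOOD: repair the setup itself and keep the meaningful assertion
def test_export_writes_file(tmp_path):
    export_users(str(tmp_path))            # write into a per-test temporary directory
    exported = (tmp_path / "users.csv").read_text()
    assert "alice@example.com" in exported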
2.5 Verify Fix
After each fix:
# Run ONLY the fixed test first
pytest path/to/test.py::test_function -v
# If it passes, run the whole file to check for side effects
pytest path/to/test.py -v
Verification checklist:
- The specific test passes
- Other tests in the file still pass
- The fix would actually catch the failure it's supposed to catch (sanity check sketched below)
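One quick way to check that last item (an optional sanity check, not a required step): temporarily falsify the expectation and confirm the test now fails, then revert. A sketch reusing the earlier hypothetical report example:
# Temporarily break the expectation; the strengthened test should now FAIL.
# If it still passes, the assertion isn't really exercising the behavior.
def test_generate_report():
    report = generate_report(data)
    assert report["title"] == "Deliberately Wrong Title"   # expect a failure, then revert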
2.6 Commit Fix (if commit_strategy == "per_fix")
git add path/to/test.py
git commit -m "fix(tests): strengthen assertions in test_function
- [Describe what was weak/broken]
- [Describe what the fix does]
- Pattern: [N] - [Pattern name] (if from audit)
"
Phase 3: Batch Processing
3.1 Process by Priority
FOR priority IN [critical, important, minor]:
FOR item IN work_items WHERE item.priority == priority:
Execute Phase 2 for item
IF item failed to fix after 2 attempts:
Add to stuck_items[]
Continue to next item
3.2 Handle Stuck Items
If any items couldn't be fixed:
## Stuck Items
The following items could not be fixed automatically:
### [item.id]: [test_function]
**Attempted:** [what was tried]
**Blocked by:** [why it didn't work]
**Recommendation:** [manual intervention needed / more context needed / etc.]
Phase 4: Final Verification
4.1 Run Full Test Suite
pytest -v # or appropriate test command
4.2 Report Results
## Fix Tests Summary
### Input Mode
[audit_report / general_instructions / run_and_fix]
### Work Items Processed
- Total: N
- Fixed: X
- Stuck: Y
- Skipped (production bugs): Z
### Fixes Applied
| Test | File | Issue | Fix | Commit |
|------|------|-------|-----|--------|
| test_foo | test_auth.py | Pattern 2 (Partial Assertion) | Strengthened to full object match | abc123 |
| test_bar | test_api.py | Missing edge case | Added empty input test | def456 |
| ... | ... | ... | ... | ... |
### Test Suite Status
- Before: X passing, Y failing
- After: X passing, Y failing
### Stuck Items (if any)
[List with recommendations]
### Production Bugs Found (if any)
[List with recommended actions]
4.3 Optional: Re-run Green Mirage Audit
If input was from green-mirage-audit, offer to re-audit:
Fixes complete. Would you like me to re-run green-mirage-audit to verify no new mirages were introduced?
A) Yes, run audit on fixed files
B) No, I'm satisfied with the fixes
Handling Special Cases
Case: Flaky Tests
Indicators:
- Test passes sometimes, fails sometimes
- "Flaky" in test name or skip reason
- Timing-dependent assertions
Fix approach:
- Identify source of non-determinism (time, random, ordering, external state)
- Mock or control the non-deterministic element
- If truly timing-dependent, use appropriate waits/retries WITH assertions
# BAD: Flaky timing
def test_async_operation():
start_operation()
time.sleep(1) # Hope it's done!
assert get_result() is not None
# GOOD: Deterministic waiting
def test_async_operation():
start_operation()
result = wait_for_result(timeout=5) # Polls with timeout
assert result == expected_value
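wait_for_result above is a stand-in; a minimal polling helper might look like this (names, interval, and timeout are illustrative):
import time

def wait_for_result(timeout: float = 5.0, interval: float = 0.05):
    """Poll get_result() until it returns a value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = get_result()
        if result is not None:
            return result
        time.sleep(interval)
    raise TimeoutError(f"No result within {timeout} seconds")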
Case: Tests That Test Implementation Details
Indicators:
- Mocking internal methods
- Asserting on private state
- Breaking when refactoring without behavior change
Fix approach:
- Identify what BEHAVIOR the test should verify
- Rewrite to test behavior through public interface
- Remove implementation coupling
# BAD: Tests implementation
def test_user_save():
user = User(name="test")
user.save()
assert user._db_connection.execute.called_with("INSERT...")
# GOOD: Tests behavior
def test_user_save():
user = User(name="test")
user.save()
# Verify through public interface
loaded = User.find_by_name("test")
assert loaded is not None
assert loaded.name == "test"
Case: Missing Tests Entirely
If the work item is "add tests for X" (no existing test to fix), as sketched after this list:
- Read the production code
- Identify key behaviors to test
- Write tests following existing patterns in the codebase
- Ensure tests would catch real failures (not green mirages)
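A minimal sketch of that flow, assuming a hypothetical slugify function whose documented behavior is lowercasing and dash-separating input:
import pytest

# Behaviors come from the production code's contract, not its internals
def test_slugify_lowercases_and_dashes():
    assert slugify("Hello World") == "hello-world"

def test_slugify_rejects_empty_input():
    with pytest.raises(ValueError):
        slugify("")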
Integration with Green Mirage Audit
Expected Audit Report Format
Green-mirage-audit outputs a YAML block at the start of its findings report. This skill parses that YAML directly for efficient processing.
YAML Block Structure
---
audit_metadata:
timestamp: "2024-01-15T10:30:00Z"
test_files_audited: 5
test_functions_audited: 47
summary:
total_tests: 47
solid: 31
green_mirage: 12
partial: 4
findings:
- id: "finding-1"
priority: critical
test_file: "tests/test_auth.py"
test_function: "test_login_success"
line_number: 45
pattern: 2
pattern_name: "Partial Assertions"
effort: trivial
depends_on: []
blind_spot: "Login could return malformed user object"
production_impact: "Broken user sessions"
- id: "finding-2"
priority: critical
test_file: "tests/test_auth.py"
test_function: "test_logout"
line_number: 78
pattern: 7
pattern_name: "State Mutation Without Verification"
effort: moderate
depends_on: ["finding-1"]
blind_spot: "Session not actually cleared"
production_impact: "Session persistence after logout"
remediation_plan:
phases:
- phase: 1
name: "Foundation fixes"
findings: ["finding-1"]
rationale: "Other tests depend on auth fixtures"
- phase: 2
name: "Auth suite completion"
findings: ["finding-2"]
rationale: "Depends on finding-1 fixtures"
total_effort_estimate: "2-3 hours"
recommended_approach: "sequential"
---
Parsing Logic
function parseGreenMirageReport(input: string): AuditReport {
// 1. Extract YAML block between --- markers
const yamlMatch = input.match(/^---\n([\s\S]*?)\n---/m);
if (!yamlMatch) {
// Fallback to legacy markdown parsing
return parseLegacyMarkdownFormat(input);
}
// 2. Parse YAML
const report = parseYAML(yamlMatch[1]);
// 3. Build work items from findings
const workItems = report.findings.map(f => ({
id: f.id,
priority: f.priority,
test_file: f.test_file,
test_function: f.test_function,
line_number: f.line_number,
pattern: f.pattern,
pattern_name: f.pattern_name,
effort: f.effort,
depends_on: f.depends_on,
blind_spot: f.blind_spot,
production_impact: f.production_impact,
// Will be populated from human-readable section
current_code: null,
suggested_fix: null
}));
// 4. Extract code blocks from human-readable findings
for (const item of workItems) {
const findingSection = extractFindingSection(input, item.id);
item.current_code = extractCodeBlock(findingSection, "Current Code");
item.suggested_fix = extractCodeBlock(findingSection, "Consumption Fix");
}
// 5. Use remediation_plan.phases for execution order
return {
metadata: report.audit_metadata,
summary: report.summary,
workItems,
phases: report.remediation_plan.phases,
totalEffort: report.remediation_plan.total_effort_estimate,
approach: report.remediation_plan.recommended_approach
};
}
Execution Order
This skill respects the remediation_plan from the audit:
- Process phases in order: Phase 1 before Phase 2, etc.
- Within each phase: Process findings in the order listed
- Honor dependencies: If `depends_on` is non-empty, verify those findings are fixed first
- Batch by file: When multiple findings are in the same file, process them together
Legacy Markdown Fallback
If no YAML block is found, fall back to parsing the human-readable format:
- Split findings by `**Finding #N:**` headers (a regex sketch follows this list)
- Extract priority from the section header (Critical/Important/Minor)
- Parse the file path and line number from the `**File:**` line
- Extract the pattern number and name from the `**Pattern:**` line
- Extract code blocks for current_code and suggested_fix
- Extract blind_spot from the `**Blind Spot:**` section
- Default effort to "moderate" and depends_on to []
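A Python sketch of that fallback split; the header and field patterns mirror the list above and should be treated as assumptions about the legacy report format:
import re

FINDING_HEADER = re.compile(r"\*\*Finding #(\d+):\*\*")

def split_legacy_findings(report: str) -> list[str]:
    """Split the human-readable report into one chunk per finding."""
    parts = FINDING_HEADER.split(report)
    # re.split with a capture group yields [preamble, "1", body1, "2", body2, ...]
    return [f"Finding #{num}:{body}" for num, body in zip(parts[1::2], parts[2::2])]

def extract_field(body: str, label: str) -> str | None:
    """Pull a single '**Label:** value' field out of one finding chunk."""
    match = re.search(rf"\*\*{label}:\*\*\s*(.+)", body)
    return match.group(1).strip() if match else None

# e.g. extract_field(chunk, "File"), extract_field(chunk, "Pattern"), extract_field(chunk, "Blind Spot")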
Anti-Patterns to Avoid
Over-Engineering
- Creating elaborate test infrastructure for simple fixes
- Adding abstraction layers "for future flexibility"
- Refactoring unrelated code while fixing tests
Under-Testing
- Weakening assertions to make tests pass
- Removing tests instead of fixing them
- Marking tests as skip without fixing
Scope Creep
- Fixing production bugs without flagging them
- Refactoring production code to make tests easier
- Adding features while fixing tests
Blind Fixes
- Applying suggested fixes without reading context
- Copy-pasting fixes without understanding them
- Not verifying fixes actually catch failures
Before finishing, verify:
- All work items were processed or explicitly marked stuck
- Each fix was verified to pass
- Each fix was verified to catch the failure it should catch
- Full test suite was run at the end
- Any production bugs found were flagged (not silently "fixed")
- Commits follow the agreed strategy
- Summary report was provided
If NO to ANY item, go back and complete it.
Work fast, work precisely, verify everything. Don't over-engineer. Don't under-test.
Fix it, prove it works, move on.