| name | debugging |
| description | Four-phase debugging framework that ensures root cause identification before fixes. Use when encountering bugs, test failures, unexpected behavior, or when previous fix attempts failed. Enforces investigate-first discipline ('debug this', 'fix this error', 'test is failing', 'not working'). |
| allowed-tools | * |
Systematic Debugger
Find root cause before fixing. Symptom fixes are failure.
Iron Law: NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
When to Use
Answer IN ORDER. Stop at first match:
- Bug, error, or test failure? → Use this skill
- Unexpected behavior? → Use this skill
- Previous fix didn't work? → Use this skill (especially important)
- Performance problem? → Use this skill
- None of above? → Skip this skill
Use especially when:
- Under time pressure (emergencies make guessing tempting)
- "Quick fix" seems obvious (red flag)
- Already tried 1+ fixes that didn't work
The Four Phases
Complete each phase before proceeding.
Phase 1: Root Cause Investigation
BEFORE attempting ANY fix:
1. Read Error Messages Completely
Don't skip past errors. They often contain the exact solution.
- Full stack trace (note line numbers, file paths)
- Error codes and messages
- Warnings that preceded the error
2. Reproduce Consistently
| Can reproduce? | Action |
|---|---|
| Yes, every time | Proceed to step 3 |
| Sometimes | Gather more data - when does it happen vs not? |
| Never | Cannot debug what you cannot reproduce - gather logs |
3. Check Recent Changes
git diff HEAD~5 # Recent code changes
git log --oneline -10 # Recent commits
What changed that could cause this? Dependencies? Config? Environment?
4. Trace Data Flow (Root Cause Tracing)
When error is deep in call stack:
Symptom: Error at line 50 in utils.js
↑ Called by handler.js:120
↑ Called by router.js:45
↑ Called by app.js:10 ← ROOT CAUSE: bad input here
Technique:
- Find where error occurs (symptom)
- Ask: "What called this with bad data?"
- Trace up until you find the SOURCE
- Fix at source, not at symptom
5. Multi-Component Systems
When system has multiple layers (API → service → database):
# Log at EACH boundary before proposing fixes
echo "=== Layer 1 (API): request=$REQUEST ==="
echo "=== Layer 2 (Service): input=$INPUT ==="
echo "=== Layer 3 (DB): query=$QUERY ==="
Run once to find WHERE it breaks. Then investigate that layer.
Phase 2: Pattern Analysis
1. Find Working Examples
Locate similar working code in same codebase. What works that's similar?
2. Identify Differences
| Working code | Broken code | Could this matter? |
|---|---|---|
| Uses async/await | Uses callbacks | Yes - timing |
| Validates input | No validation | Yes - bad data |
List ALL differences. Don't assume "that can't matter."
Phase 3: Hypothesis Testing
1. Form Single Hypothesis
Write it down: "I think X is the root cause because Y"
Be specific:
- ❌ "Something's wrong with the database"
- ✅ "Connection pool exhausted because connections aren't released in error path"
2. Test Minimally
| Rule | Why |
|---|---|
| ONE change at a time | Isolate what works |
| Smallest possible change | Avoid side effects |
| Don't bundle fixes | Can't tell what helped |
3. Evaluate Result
| Result | Action |
|---|---|
| Fixed | Phase 4 (verify) |
| Not fixed | NEW hypothesis (return to 3.1) |
| Partially fixed | Found one issue, continue investigating |
Phase 4: Implementation
1. Create Failing Test
Before fixing, write test that fails due to the bug:
it('handles empty input without crashing', () => {
// This test should FAIL before fix, PASS after
expect(() => processData('')).not.toThrow();
});
2. Implement Fix
- Address ROOT CAUSE identified in Phase 1
- ONE change
- No "while I'm here" improvements
3. Verify
- New test passes
- Existing tests still pass
- Issue actually resolved (not just test passing)
4. If Fix Doesn't Work
| Fix attempts | Action |
|---|---|
| 1-2 | Return to Phase 1 with new information |
| 3+ | STOP - Question architecture (see below) |
5. After 3+ Failed Fixes: Question Architecture
Pattern indicating architectural problem:
- Each fix reveals new coupling/shared state
- Fixes require "massive refactoring"
- Each fix creates new symptoms elsewhere
STOP and ask:
- Is this pattern fundamentally sound?
- Should we refactor vs. continue patching?
- Discuss with user before more fix attempts
Red Flags - STOP Immediately
If you catch yourself thinking:
| Thought | Reality |
|---|---|
| "Quick fix for now, investigate later" | Investigate NOW or you never will |
| "Just try changing X" | That's guessing, not debugging |
| "I'll add multiple fixes and test" | Can't isolate what worked |
| "I don't fully understand but this might work" | You need to understand first |
| "One more fix attempt" (after 2+ failures) | 3+ failures = wrong approach |
ALL mean: STOP. Return to Phase 1.
Quick Reference
| Phase | Key Question | Success Criteria |
|---|---|---|
| 1. Root Cause | "WHY is this happening?" | Understand cause, not just symptom |
| 2. Pattern | "What's different from working code?" | Identified key differences |
| 3. Hypothesis | "Is my theory correct?" | Confirmed or formed new theory |
| 4. Implementation | "Does the fix work?" | Test passes, issue resolved |