---
name: debugging
description: Systematic analysis for debugging. Use when encountering errors, bugs, or unexpected behaviors.
---
# Systematic Debugging
Random fixes waste time and create new bugs. Quick patches mask underlying issues.

**Core principle:** ALWAYS find the root cause before attempting fixes. Fixing symptoms is failure.
## The Iron Law

**NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST**
If you haven't completed Phase 1, you cannot propose fixes.
## The Four Phases
Complete each phase before proceeding to the next.
### Phase 1: Root Cause Investigation
BEFORE attempting ANY fix:
**Read Error Messages Carefully**
- Read stack traces completely
- Note line numbers, file paths, error codes
- They often contain the exact solution
**Reproduce Consistently**
- Can you trigger it reliably?
- If not reproducible → gather more data, don't guess (see the repro sketch below)
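For an HTTP-facing bug, a throwaway loop like the following sketch can establish reliability. The endpoint and payload here are placeholders, not from any real system:

```ts
// Hypothetical repro loop: fire the suspect request several times and record
// the outcomes. Anything short of 5/5 failures means: gather more data.
const url = 'http://localhost:3000/api/users'; // placeholder endpoint

for (let attempt = 1; attempt <= 5; attempt++) {
  const res = await fetch(url, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({ name: 'José García' }), // the suspect input
  });
  console.log(`attempt ${attempt}: HTTP ${res.status}`);
}
```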
**Check Recent Changes**
- Git diff, recent commits
- New dependencies, config changes
**Gather Evidence in Multi-Component Systems**
- Log what data enters and exits each component boundary (see the sketch below)
- Run once to gather evidence showing WHERE it breaks
- THEN investigate that specific component
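One lightweight way to gather this evidence is a logging wrapper around each suspect boundary. A minimal sketch, with invented component names:

```ts
// Hypothetical sketch: wrap a component boundary so each call logs what
// enters and what exits. Run the failing case once, then read the logs to
// find the first component whose OUTPUT is already wrong.
function withBoundaryLog<In, Out>(name: string, fn: (input: In) => Out) {
  return (input: In): Out => {
    console.log(`[${name}] in :`, JSON.stringify(input));
    const output = fn(input);
    console.log(`[${name}] out:`, JSON.stringify(output));
    return output;
  };
}

// Usage with an invented parser at one boundary:
const parsePrice = withBoundaryLog('parsePrice', (raw: string) => Number(raw));
parsePrice('19.99'); // [parsePrice] in : "19.99" / [parsePrice] out: 19.99
```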
**Trace Data Flow**
- Where does the bad value originate?
- Keep tracing upstream until you find the source
- Fix at the source, not at the symptom (see the toy trace below)
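A toy illustration, with invented functions, of why the source matters: the symptom surfaces two layers downstream of where the data first goes bad.

```ts
// Hypothetical trace: NaN appears in the total, but originates in the parser.
function parseLineItem(row: { price?: string }) {
  return { price: Number(row.price) }; // row.price missing -> NaN (the SOURCE)
}

function computeTotal(items: { price: number }[]) {
  return items.reduce((sum, item) => sum + item.price, 0); // NaN propagates
}

const total = computeTotal([parseLineItem({})]);
console.log(total); // NaN (the SYMPTOM) -- fix parseLineItem, not this line
```

Patching the display layer would hide the bug; fixing the parser removes it.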
### Phase 2: Pattern Analysis

- **Find Working Examples** - Locate similar working code in the same codebase
- **Compare Against References** - Read the reference implementation COMPLETELY; don't skim
- **Identify Differences** - List every difference, however small (see the comparison sketch after this list)
- **Understand Dependencies** - What settings, config, and environment does it need?
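One way to make the comparison concrete is to strip the working and broken call sites down to their essence and diff them. A sketch with an invented soft-delete scenario:

```ts
// Hypothetical Phase 2 reduction: both paths side by side, so every
// difference is visible. All names are invented.
type Row = { id: number; deletedAt?: string };

// Working: the repo that behaves filters soft-deleted rows before mapping.
const workingList = (rows: Row[]) =>
  rows.filter((r) => r.deletedAt === undefined).map((r) => r.id);

// Broken: the misbehaving repo maps first -- deleted rows leak through.
const brokenList = (rows: Row[]) => rows.map((r) => r.id);

const rows: Row[] = [{ id: 1 }, { id: 2, deletedAt: '2024-01-01' }];
console.log(workingList(rows)); // [1]
console.log(brokenList(rows));  // [1, 2]  <- the difference that matters
```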
### Phase 3: Hypothesis and Testing

**Form a Single Hypothesis**
- State it clearly: "I think X is the root cause because Y"
- Be specific, not vague

**Test Minimally**
- Make the SMALLEST possible change to test the hypothesis (see the probe below)
- One variable at a time
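For example, if the hypothesis is "the invoice check fails because the total is accumulated in floating point and compared with `===`", this probe tests exactly that claim in isolation (an invented scenario):

```ts
// Hypothetical minimal probe. One variable, one claim, zero application code.
import assert from 'node:assert';

const total = 0.1 + 0.2;    // simplified stand-in for the app's accumulation
console.log(total === 0.3); // false -> hypothesis confirmed
assert.ok(Math.abs(total - 0.3) < 1e-9); // the fix direction: epsilon compare
```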
**Verify Before Continuing**
- Did it work? Yes → Phase 4
- Didn't work? Form a NEW hypothesis
- DON'T add more fixes on top
### Phase 4: Implementation

**Create a Failing Test Case**
- Simplest possible reproduction
- MUST exist before fixing

**Implement a Single Fix**
- Address the root cause you identified
- ONE change at a time
- No "while I'm here" improvements

**Verify the Fix**
- Does the test pass now?
- Do all other tests still pass?
**If the Fix Doesn't Work**
- If < 3 attempts: Return to Phase 1, re-analyze
- If ≥ 3 attempts: STOP and question the architecture
## If 3+ Fixes Failed: Question the Architecture

- Is this pattern fundamentally sound?
- Should we refactor the architecture rather than keep patching symptoms?
- Discuss with your human partner before attempting more fixes
## Red Flags - STOP and Return to Phase 1
If you catch yourself:
- "Quick fix for now, investigate later"
- "Just try changing X and see if it works"
- "It's probably X, let me fix that"
- "I don't fully understand but this might work"
- Proposing solutions before tracing data flow
- "One more fix attempt" (when already tried 2+)
ALL of these mean: STOP. Return to Phase 1.
## Common Rationalizations
| Excuse | Reality |
|---|---|
| "Issue is simple" | Simple issues have root causes too |
| "Emergency, no time" | Systematic is FASTER than thrashing |
| "I'll write test after" | Untested fixes don't stick |
| "Multiple fixes saves time" | Can't isolate what worked |
## Output Format
When using this skill, structure your response as:
## Debugging: [Brief issue description]
### Phase 1: Root Cause Investigation
- Error message: [key details]
→ What does the error actually say? Read the full stack trace.
- Reproduction: [steps or "not yet reproducible"]
→ Can you trigger it reliably? What are the exact steps?
- Recent changes: [relevant changes]
→ What changed in git? New dependencies? Config changes?
- Evidence gathered: [what you found]
→ Where exactly does the data flow break?
### Phase 2: Pattern Analysis
- Working example: [similar code that works]
→ Is there similar code in the codebase that works correctly?
- Key differences: [what's different]
→ What differs between working and broken code?
### Phase 3: Hypothesis
"I believe [X] is the root cause because [Y]"
→ Be specific. Vague hypotheses lead to vague fixes.
Minimal test: [smallest change to verify]
→ What's the ONE thing you can change to test this hypothesis?
### Phase 4: Implementation
- Test case: [the failing test]
→ Does a test exist that reproduces this bug?
- Fix: [the actual fix]
→ ONE change addressing the root cause.
- Verification: [test results]
→ Does the test pass? Are other tests still green?
## Example: Real Debugging Session

**Bug:** "API returns 500 error when creating a user with special characters in name"
**Phase 1: Root Cause Investigation**
- Error message: `500 Internal Server Error` with a stack trace pointing to `UserService.create()` line 47: `TypeError: Cannot read property 'unicodeForm' of undefined`
- Reproduction: POST `/api/users` with `{ "name": "José García" }` → 500 error. Works with `{ "name": "John Smith" }`.
- Recent changes: Commit `a1b2c3d` added Unicode normalization for names 2 days ago.
- Evidence: Line 47 calls `name.normalize(config.unicodeForm)`, but `config` is undefined when the feature flag `UNICODE_SUPPORT` is disabled.
**Phase 2: Pattern Analysis**
- Working example: `ProductService.create()` also uses Unicode normalization but checks that config exists first.
- Key differences: ProductService has an `if (config?.unicodeForm)` guard; UserService assumes config is always present.
**Phase 3: Hypothesis**

"I believe the root cause is that `UserService.create()` doesn't guard against an undefined config when the `UNICODE_SUPPORT` feature flag is disabled, because the config object is only initialized when the flag is enabled."

Minimal test: Add optional chaining `config?.unicodeForm` at line 47.
**Phase 4: Implementation**
- Test case:

  ```js
  test('creates user with special chars when unicode is disabled', () => {
    disableFeatureFlag('UNICODE_SUPPORT');
    const result = userService.create({ name: 'José García' });
    assert(result.name === 'José García');
  });
  ```

- Fix: Changed `config.unicodeForm` to `config?.unicodeForm ?? 'NFC'`
- Verification: New test passes. All 47 existing tests still green.
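For concreteness, here is a sketch of what the repaired call site could look like. The surrounding class is reconstructed from the session above, not quoted from real code:

```ts
// Hypothetical reconstruction of the fixed UserService.create().
interface UnicodeConfig {
  unicodeForm?: string;
}

class UserService {
  // config is only initialized when UNICODE_SUPPORT is enabled,
  // so it may legitimately be undefined here.
  constructor(private config?: UnicodeConfig) {}

  create(input: { name: string }) {
    // Before: input.name.normalize(this.config.unicodeForm) -> TypeError
    // when config is undefined. After: optional chaining plus a default form.
    const name = input.name.normalize(this.config?.unicodeForm ?? 'NFC');
    return { name };
  }
}

// Reproduces the failing case from Phase 1, now passing:
console.log(new UserService(undefined).create({ name: 'José García' }).name);
```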
## Quick Reference
| Phase | Key Activities | Success Criteria |
|---|---|---|
| 1. Root Cause | Read errors, reproduce, gather evidence | Understand WHAT and WHY |
| 2. Pattern | Find working examples, compare | Identify differences |
| 3. Hypothesis | Form theory, test minimally | Confirmed or new hypothesis |
| 4. Implementation | Create test, fix, verify | Bug resolved, tests pass |
## Real-World Impact
- Systematic approach: 15-30 minutes to fix
- Random fixes approach: 2-3 hours of thrashing
- First-time fix rate: 95% vs 40%
- New bugs introduced: Near zero vs common