name	debugging
description	Four-phase debugging framework that ensures root cause identification before fixes. Use when encountering bugs, test failures, unexpected behavior, or when previous fix attempts failed. Enforces investigate-first discipline ('debug this', 'fix this error', 'test is failing', 'not working').
allowed-tools	*

Systematic Debugger

Find root cause before fixing. Symptom fixes are failure.

Iron Law: NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST

When to Use

Answer IN ORDER. Stop at first match:

Bug, error, or test failure? → Use this skill
Unexpected behavior? → Use this skill
Previous fix didn't work? → Use this skill (especially important)
Performance problem? → Use this skill
None of above? → Skip this skill

Use especially when:

Under time pressure (emergencies make guessing tempting)
"Quick fix" seems obvious (red flag)
Already tried 1+ fixes that didn't work

The Four Phases

Complete each phase before proceeding.

Phase 1: Root Cause Investigation

BEFORE attempting ANY fix:

1. Read Error Messages Completely

Don't skip past errors. They often contain the exact solution.
- Full stack trace (note line numbers, file paths)
- Error codes and messages
- Warnings that preceded the error

2. Reproduce Consistently

Can reproduce?	Action
Yes, every time	Proceed to step 3
Sometimes	Gather more data - when does it happen vs not?
Never	Cannot debug what you cannot reproduce - gather logs

3. Check Recent Changes

git diff HEAD~5          # Recent code changes
git log --oneline -10    # Recent commits

What changed that could cause this? Dependencies? Config? Environment?

4. Trace Data Flow (Root Cause Tracing)

When error is deep in call stack:

Symptom: Error at line 50 in utils.js
    ↑ Called by handler.js:120
    ↑ Called by router.js:45
    ↑ Called by app.js:10 ← ROOT CAUSE: bad input here

Technique:

Find where error occurs (symptom)
Ask: "What called this with bad data?"
Trace up until you find the SOURCE
Fix at source, not at symptom

5. Multi-Component Systems

When system has multiple layers (API → service → database):

# Log at EACH boundary before proposing fixes
echo "=== Layer 1 (API): request=$REQUEST ==="
echo "=== Layer 2 (Service): input=$INPUT ==="
echo "=== Layer 3 (DB): query=$QUERY ==="

Run once to find WHERE it breaks. Then investigate that layer.

Phase 2: Pattern Analysis

1. Find Working Examples

Locate similar working code in same codebase. What works that's similar?

2. Identify Differences

Working code	Broken code	Could this matter?
Uses async/await	Uses callbacks	Yes - timing
Validates input	No validation	Yes - bad data

List ALL differences. Don't assume "that can't matter."

Phase 3: Hypothesis Testing

1. Form Single Hypothesis

Write it down: "I think X is the root cause because Y"

Be specific:

❌ "Something's wrong with the database"
✅ "Connection pool exhausted because connections aren't released in error path"

2. Test Minimally

Rule	Why
ONE change at a time	Isolate what works
Smallest possible change	Avoid side effects
Don't bundle fixes	Can't tell what helped

3. Evaluate Result

Result	Action
Fixed	Phase 4 (verify)
Not fixed	NEW hypothesis (return to 3.1)
Partially fixed	Found one issue, continue investigating

Phase 4: Implementation

1. Create Failing Test

Before fixing, write test that fails due to the bug:

it('handles empty input without crashing', () => {
  // This test should FAIL before fix, PASS after
  expect(() => processData('')).not.toThrow();
});

2. Implement Fix

Address ROOT CAUSE identified in Phase 1
ONE change
No "while I'm here" improvements

3. Verify

New test passes
Existing tests still pass
Issue actually resolved (not just test passing)

4. If Fix Doesn't Work

Fix attempts	Action
1-2	Return to Phase 1 with new information
3+	STOP - Question architecture (see below)

5. After 3+ Failed Fixes: Question Architecture

Pattern indicating architectural problem:

Each fix reveals new coupling/shared state
Fixes require "massive refactoring"
Each fix creates new symptoms elsewhere

STOP and ask:

Is this pattern fundamentally sound?
Should we refactor vs. continue patching?
Discuss with user before more fix attempts

Red Flags - STOP Immediately

If you catch yourself thinking:

Thought	Reality
"Quick fix for now, investigate later"	Investigate NOW or you never will
"Just try changing X"	That's guessing, not debugging
"I'll add multiple fixes and test"	Can't isolate what worked
"I don't fully understand but this might work"	You need to understand first
"One more fix attempt" (after 2+ failures)	3+ failures = wrong approach

ALL mean: STOP. Return to Phase 1.

Quick Reference

Phase	Key Question	Success Criteria
1. Root Cause	"WHY is this happening?"	Understand cause, not just symptom
2. Pattern	"What's different from working code?"	Identified key differences
3. Hypothesis	"Is my theory correct?"	Confirmed or formed new theory
4. Implementation	"Does the fix work?"	Test passes, issue resolved

debugging

Install Skill

SKILL.md