---
name: dev-debug
version: 1
description: This skill should be used when the user asks to 'debug', 'fix bug', 'investigate error', 'why is it broken', 'trace root cause', 'find the bug', or needs systematic bug investigation and fixing with verification-driven methodology using ralph loops.
---
Announce: "I'm using dev-debug for systematic bug investigation."
## Where This Fits

```
Main Chat (you)                            Task Agent
─────────────────────────────────────────────────────
dev-debug (this skill)
  → ralph loop (one per bug)
    → dev-delegate (spawn agents)
      → Task agent ──────────────→ investigates
                                   writes regression test
                                   implements fix
```

Main chat orchestrates. Task agents investigate and fix.
## Contents
- The Iron Law of Debugging
- The Iron Law of Delegation
- The Process
- The Four Phases
- If Max Iterations Reached
## The Iron Law of Debugging

NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST. This is not negotiable.
Before writing ANY fix, you MUST:
- Reproduce the bug (with a test)
- Trace the data flow
- Form a specific hypothesis
- Test that hypothesis
- Only THEN write a fix (with a regression test first!)
If you catch yourself about to write a fix without investigation, STOP.
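For illustration only, here is a minimal sketch of the regression-test-first step, assuming a TypeScript project that tests with Vitest; `parseAmount`, the import path, and the bug ID are invented for the example:

```typescript
// Regression test written BEFORE the fix - it must fail against the current code.
import { describe, it, expect } from "vitest";
import { parseAmount } from "../src/parseAmount"; // hypothetical module under test

describe("parseAmount regression: negative amounts (BUG-123, hypothetical)", () => {
  it("keeps the sign of negative amounts", () => {
    // Reproduces the reported symptom: "-12.50" came back as 12.5.
    // Run this now and watch it FAIL - that failure is the bug reproduction.
    expect(parseAmount("-12.50")).toBe(-12.5);
  });
});
```

The test must fail against the unfixed code for the reason described in the bug report; the same test later provides the PASS evidence the gate function requires.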
## The Iron Law of Delegation

MAIN CHAT MUST NOT WRITE CODE. This is not negotiable.
Main chat orchestrates the ralph loop. Task agents do the work:
- Investigation: Task agents read code, run tests, gather evidence
- Fixes: Task agents write regression tests and fixes
| Main Chat Does | Task Agents Do |
|---|---|
| Start ralph loop | Investigate root cause |
| Spawn Task agents | Run tests, read code |
| Review findings | Write regression tests |
| Verify fix | Implement fixes |
If you're about to edit code directly, STOP and delegate instead.
## The Process
Unlike implementation (per-task loops), debugging uses ONE loop per bug:
```
1. Start ralph loop for the bug
   Skill(skill="ralph-loop:ralph-loop", args="Debug: [SYMPTOM] --max-iterations 15 --completion-promise FIXED")

2. Inside loop: spawn Task agent for investigation/fix
   → Skill(skill="workflows:dev-delegate")

3. Task agent follows 4-phase debug protocol

4. When regression test passes → output promise
   <promise>FIXED</promise>

5. Bug fixed, loop ends
```
### Step 1: Start Ralph Loop

IMPORTANT: Avoid parentheses () in the prompt.

```
Skill(skill="ralph-loop:ralph-loop", args="Debug: [SYMPTOM] --max-iterations 15 --completion-promise FIXED")
```
### Step 2: Spawn Task Agent

Use dev-delegate, but with debug-specific instructions:

```
Task(subagent_type="general-purpose", prompt="""
Debug [SYMPTOM] following systematic protocol.
## Context
- Read .claude/LEARNINGS.md for prior hypotheses
- Read .claude/SPEC.md for expected behavior
## Debug Protocol (4 Phases)
### Phase 1: Investigate
- Add debug logging to suspected code path
- Reproduce the bug with a test
- Document: "Reproduced with [test], output: [error]"
### Phase 2: Analyze
- Trace data flow through the code
- Compare to working code paths
- Document findings in LEARNINGS.md
### Phase 3: Hypothesize
- Form ONE specific hypothesis
- Test it with minimal change
- If wrong: document what was ruled out
- If right: proceed to fix
### Phase 4: Fix
- Write regression test FIRST (must fail before fix)
- Implement minimal fix
- Run test, see it PASS
- Run full test suite
## Output
Report:
- Hypothesis tested
- Root cause (if found)
- Regression test written
- Fix applied (or blockers)
""")
### Step 3: Verify and Complete
After Task agent returns, verify:
- Regression test FAILS before fix
- Regression test PASSES after fix
- Root cause documented in LEARNINGS.md
- All existing tests still pass
If ALL pass → output the promise:
```
<promise>FIXED</promise>
```
If ANY fail → iterate (don't output promise yet).
## The Four Phases
| Phase | Purpose | Output |
|---|---|---|
| Investigate | Reproduce, trace data flow | Bug reproduction |
| Analyze | Compare working vs broken | Findings documented |
| Hypothesize | ONE specific hypothesis | Hypothesis tested |
| Fix | Regression test → fix | Tests pass |
## The Gate Function
Before claiming ANY bug is fixed:
1. REPRODUCE → Run test, see bug manifest
2. INVESTIGATE → Trace data flow, form hypothesis
3. TEST → Verify hypothesis with minimal change
4. FIX → Write regression test FIRST (see it FAIL)
5. VERIFY → Run fix, see regression test PASS
6. CONFIRM → Run full test suite, no regressions
7. CLAIM → Only after steps 1-6
Skipping any step is guessing, not debugging.
## Rationalization Prevention
These thoughts mean STOP—you're about to skip the protocol:
| Thought | Reality |
|---|---|
| "I know exactly what this is" | Knowing ≠ verified. Investigate anyway. |
| "Let me just try this fix" | Guessing. Form hypothesis first. |
| "The fix is obvious" | Obvious fixes often mask deeper issues. |
| "I've seen this before" | This instance may be different. Verify. |
| "No need for regression test" | Every fix needs a regression test. Period. |
| "It works now" | "Works now" ≠ "fixed correctly". Run full suite. |
| "I'll add the test later" | You won't. Write it BEFORE the fix. |
| "Log checking proves fix works" | Logs prove code ran, not that output is correct. Verify actual results. |
| "It stopped failing" | Stopped failing ≠ fixed. Could be hiding the symptom. Need E2E. |
| "The error is gone" | No error ≠ correct behavior. Verify expected output. |
| "Regression test is too complex" | If too complex to test, too complex to know it's fixed. |
## Fake Fix Verification - STOP
These do NOT prove a bug is fixed:
| ❌ Fake Verification | ✅ Real Verification |
|---|---|
| "Error message is gone" | "Regression test passes + output matches spec" |
| "Logs show correct path taken" | "E2E test verifies user-visible behavior" |
| "No exception thrown" | "Test asserts expected data returned" |
| "Process exits 0" | "Functional test confirms correct side effects" |
| "Changed one line, seems fine" | "Regression test failed before, passes after" |
| "Can't reproduce anymore" | "Regression test reproduces it, fix makes it pass" |
Red Flag: If you're claiming "fixed" based on absence of errors rather than presence of correct behavior - STOP. That's symptom suppression, not bug fixing.
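For contrast, here is a hedged TypeScript/Vitest sketch of a fake versus a real verification; `exportReport`, the import path, and the expected row data are invented for the example:

```typescript
import { it, expect } from "vitest";
import { exportReport } from "../src/exportReport"; // hypothetical function under test

// ❌ Fake verification: only proves the call resolves without throwing;
// says nothing about whether the report content is correct.
it("does not throw", async () => {
  await expect(exportReport("2024-Q1")).resolves.toBeDefined();
});

// ✅ Real verification: asserts the user-visible behavior the spec requires
// (row count and field values are example expectations, not real data).
it("exports one row per order with correct totals", async () => {
  const report = await exportReport("2024-Q1");
  expect(report.rows).toHaveLength(3);
  expect(report.rows[0]).toMatchObject({ orderId: "A-1", total: 42.5 });
});
```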
## Red Flags - STOP If You Think:
| Thought | Why It's Wrong | Do Instead |
|---|---|---|
| "Let's just try this fix" | You're guessing | Investigate first |
| "I'm pretty sure it's this" | "Pretty sure" ≠ root cause | Gather evidence |
| "This should work" | Hope is not debugging | Test your hypothesis |
| "Let me change a few things" | Multiple changes = can't learn | ONE hypothesis at a time |
## If Max Iterations Reached

The ralph loop exits after max iterations. Even then, do NOT ask the user to verify manually.
Main chat should:
- Summarize hypotheses tested (from LEARNINGS.md)
- Report what was ruled out and what remains unclear
- Ask user for direction:
- A) Start new loop with different investigation angle
- B) Add more logging to specific code path
- C) User provides additional context
- D) User explicitly requests manual verification
Never default to "please verify manually". Always exhaust automation first.
## When Fix Requires Substantial Changes

If the root cause reveals a need for significant refactoring:
- Document root cause in LEARNINGS.md
- Complete the debug loop with `<promise>FIXED</promise>` for the investigation
- Use `Skill(skill="workflows:dev")` for the implementation work

Debug finds the problem. The dev workflow implements the solution.
## Failure Recovery Protocol
Pattern from oh-my-opencode: After 3 consecutive failures, escalate.
### 3-Failure Trigger
If you attempt 3 hypotheses and ALL fail:
```
Failure 1: Hypothesis A tested → still broken
Failure 2: Hypothesis B tested → still broken
Failure 3: Hypothesis C tested → still broken
→ TRIGGER RECOVERY PROTOCOL
```
### Recovery Steps

1. STOP all further debugging attempts
   - No more "let me try one more thing"
   - No guessing or throwing fixes at the wall
2. REVERT to last known working state
   - `git checkout <last-working-commit>`
   - Or revert specific files: `git checkout HEAD~N -- file.ts`
   - Document what was attempted in `.claude/RECOVERY.md`
3. DOCUMENT what was attempted
   - All 3 hypotheses tested
   - Evidence gathered
   - Why each failed
   - What this rules out
4. CONSULT with user
   - "I've tested 3 hypotheses. All failed. Here's what I've ruled out..."
   - Present evidence from investigation
   - Request: additional context, different investigation angle, or pair debugging
5. ASK USER before proceeding
   - Option A: Start new ralph loop with different approach
   - Option B: User provides domain knowledge/context
   - Option C: Escalate to more experienced reviewer
   - Option D: Accept this as a blocker and document
NO EVIDENCE = NOT FIXED (hard rule)
### Recovery Checklist
Before claiming a bug is fixed after multiple failures:
- At least 1 hypothesis succeeded (not just "stopped failing")
- Regression test exists and PASSES
- Full test suite passes (no new failures)
- Changes are minimal and targeted
- Root cause is understood (not just symptom suppressed)
### Anti-Patterns After Failures
DON'T:
- Keep trying random fixes ("maybe if I change this...")
- Expand scope to "related" issues
- Make multiple changes at once
- Skip the regression test "this time"
- Claim fix without evidence
DO:
- Stop and document what failed
- Revert to clean state
- Consult before continuing
- Follow recovery protocol exactly
- Require evidence for completion
### Example Recovery Flow

```
Attempt 1: "Bug is in parser"      → Added logging    → Still broken
Attempt 2: "Bug is in validator"   → Fixed validation → Still broken
Attempt 3: "Bug is in transformer" → Rewrote transform → Still broken

→ RECOVERY PROTOCOL:

1. STOP (no attempt 4)
2. REVERT all changes: git checkout HEAD -- src/
3. DOCUMENT in .claude/RECOVERY.md:
   - Ruled out: parser, validator, transformer
   - Evidence: logs show data correct at each stage
   - Hypothesis: Bug might be in consumer, not producer
4. ASK USER:
   "I've ruled out the parser/validator/transformer chain.
    Logs show data is correct when it leaves our system.
    Next investigation angle: check the consumer.
    Should I:
    A) Start new loop investigating consumer
    B) Pause for your input on where else to look"
```