name	debug-systematic
description	Systematic 4-phase debugging methodology for complex, intermittent, or mysterious issues. Use when investigating bugs, race conditions, or unexplained failures.

Systematic Debugging Protocol

A disciplined, evidence-based approach to debugging that replaces guessing with verification and works toward the root cause rather than the symptom.

The 4-Phase Protocol

Phase 1: REPRODUCE (Establish Ground Truth)

Goal: Create reliable reproduction steps before ANY investigation.

Actions:

Document exact steps to trigger the bug
Record environment specifics (OS, versions, config, memory, network)
Determine frequency: Always? Sometimes? Specific conditions?
Capture exact error messages, stack traces, screenshots
Test on different environments to isolate variables
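
To put a number on "Always? Sometimes?", the reproduction can be run repeatedly and the failures counted. The following is a minimal sketch, not a prescribed tool: `measureFailureRate` and `flakyOperation` are hypothetical names, and the stand-in repro simply throws on every fourth run so the example is deterministic.

```javascript
const os = require("node:os");

// Hypothetical helper: run a reproduction function `runs` times and
// report how often it throws, plus basic environment details.
function measureFailureRate(reproduce, runs = 100) {
  const failures = [];
  for (let i = 0; i < runs; i++) {
    try {
      reproduce(i);
    } catch (err) {
      // Keep the exact message so it can be pasted into the bug report.
      failures.push({ run: i, message: err.message });
    }
  }
  return {
    runs,
    failureCount: failures.length,
    failureRate: failures.length / runs,
    firstFailure: failures[0] ?? null,
    environment: {
      platform: os.platform(),
      arch: os.arch(),
      node: process.version,
      totalMemoryBytes: os.totalmem(),
    },
  };
}

// Stand-in for a real reproduction: fails on every 4th run.
function flakyOperation(i) {
  if (i % 4 === 3) throw new Error(`simulated failure on run ${i}`);
}

const report = measureFailureRate(flakyOperation, 20);
console.log(report.failureRate, report.firstFailure, report.environment);
```

A real reproduction would replace `flakyOperation` with the actual trigger steps; the returned object covers the frequency, error message, and environment items from the list above.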

Key Questions:

When did it last work correctly?
What changed since then? (code, deps, config, infrastructure)
Is it environment-specific?
Is it data-specific?
Is it timing-specific?

Output: Clear reproduction steps that reliably trigger the issue.

Phase 2: ISOLATE (Narrow the Scope)

Goal: Reduce the search space from "entire codebase" to "specific component."

Techniques:

Binary Search:

Identify two points: working state and broken state
Test the midpoint
Recurse into the broken half
Continue until the change is identified
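
The four steps above can be sketched as a short function. This is an illustration under assumptions: `findFirstBroken` is a hypothetical name, the list of changes is ordered from oldest to newest, and once the bug appears it stays present in every later entry.

```javascript
// Returns the index of the first broken entry in an ordered list.
// Assumes the first entry works, the last entry is broken, and the
// list switches from working to broken exactly once.
function findFirstBroken(changes, isBroken) {
  if (changes.length < 2) {
    throw new Error("need at least one working and one broken state");
  }
  if (isBroken(changes[0])) {
    throw new Error("first entry must be a working state");
  }
  if (!isBroken(changes[changes.length - 1])) {
    throw new Error("last entry must be a broken state");
  }

  let good = 0;                  // newest index known to work
  let bad = changes.length - 1;  // oldest index known to be broken
  while (bad - good > 1) {
    const mid = Math.floor((good + bad) / 2);
    if (isBroken(changes[mid])) {
      bad = mid;   // the culprit is at or before the midpoint
    } else {
      good = mid;  // the culprit is after the midpoint
    }
  }
  return bad;
}

// Example with made-up release names; "v1.3" introduced the bug.
const releases = ["v1.0", "v1.1", "v1.2", "v1.3", "v1.4", "v1.5"];
const brokenReleases = new Set(["v1.3", "v1.4", "v1.5"]);
let checks = 0;
const firstBroken = findFirstBroken(releases, (release) => {
  checks++;
  return brokenReleases.has(release);
});
console.log(releases[firstBroken]); // prints "v1.3"
```

Each check halves the remaining range, so the number of tests grows with the logarithm of the number of changes rather than linearly.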

Git Bisect (for regressions):

git bisect start
git bisect bad HEAD
git bisect good <known-good-commit>
# Git checks out a commit for you to test
# After each test, mark the result:
git bisect good  # or: git bisect bad
# Repeat until Git reports the first bad commit, then restore the original HEAD:
git bisect reset
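
When the check can be scripted, `git bisect run` can drive the good/bad loop automatically. It treats exit code 0 as good, 125 as "skip this commit", and other codes from 1 to 127 as bad. The script below is a sketch under assumptions: the file name `bisect-check.js` and the `npm run build` build step are hypothetical and should be swapped for whatever the project actually uses.

```javascript
// bisect-check.js (hypothetical file name; keep it untracked so it
// survives the checkouts that bisect performs)
const { spawnSync } = require("node:child_process");

// Map build/test outcomes to the exit codes `git bisect run` understands:
// 0 = good, 125 = skip (commit cannot be tested), 1 = bad.
function toBisectExitCode(buildOk, testOk) {
  if (!buildOk) return 125;
  return testOk ? 0 : 1;
}

// Run a command and report whether it exited with status 0.
function succeeded(command, args) {
  const result = spawnSync(command, args, { stdio: "inherit" });
  return result.status === 0;
}

// Only act when a test command is passed on the command line, e.g.
//   git bisect run node bisect-check.js npm test
const [testCommand, ...testArgs] = process.argv.slice(2);
if (testCommand) {
  // Assumption: the project builds with `npm run build`.
  const buildOk = succeeded("npm", ["run", "build"]);
  const testOk = buildOk && succeeded(testCommand, testArgs);
  process.exit(toBisectExitCode(buildOk, testOk));
}
```

Returning 125 for commits that do not build keeps an unrelated build break from being blamed as the culprit.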

Code Elimination:

Comment out sections to isolate the problem
Create minimal reproduction case
Strip away everything non-essential
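
Stripping a reproduction down can also be done mechanically: remove one piece, re-run, and keep the removal only if the failure persists. The sketch below is a simple greedy reducer, not a full delta-debugging implementation; `minimize` and the step names are hypothetical.

```javascript
// Greedily drop items one at a time while the failure still reproduces.
// `stillFails` receives a candidate list and returns true if the bug
// is still triggered by it.
function minimize(items, stillFails) {
  if (!stillFails(items)) {
    throw new Error("the full input must reproduce the failure");
  }
  let current = [...items];
  let removedSomething = true;
  while (removedSomething) {
    removedSomething = false;
    for (let i = 0; i < current.length; i++) {
      const candidate = current.filter((_, index) => index !== i);
      if (stillFails(candidate)) {
        current = candidate;      // this item was non-essential
        removedSomething = true;
        break;                    // rescan the smaller list from the start
      }
    }
  }
  return current;
}

// Made-up scenario: the bug needs both "enableBeta" and "uploadAvatar".
const steps = ["login", "openSettings", "enableBeta", "uploadAvatar", "logout"];
const triggersBug = (candidate) =>
  candidate.includes("enableBeta") && candidate.includes("uploadAvatar");

const minimalSteps = minimize(steps, triggersBug);
console.log(minimalSteps); // prints [ 'enableBeta', 'uploadAvatar' ]
```

The result is minimal only in the sense that removing any single remaining item makes the failure disappear, which is usually enough for a bug report.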

Environment Isolation:

Test in isolation (unit test the failing path)
Compare working vs broken environments
Use fresh installs to eliminate pollution

Output: "The bug is in [specific component/function/line range]"

Phase 3: DIAGNOSE (Understand Root Cause)

Goal: Know exactly WHY the bug occurs, not just WHERE.

Scientific Method:

Observe: What exactly is happening?
Hypothesize: Why might this be happening?
Predict: If hypothesis is correct, what else would be true?
Test: Verify predictions with evidence
Iterate: Refine hypothesis based on results

Logging Strategy:

// Add strategic logging at boundaries
console.log('[DEBUG] Function entry:', { input, state });
console.log('[DEBUG] After processing:', { result, sideEffects });
console.log('[DEBUG] Function exit:', { returnValue });
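
Rather than editing the suspect function, the same entry/exit logging can be attached from the outside and removed in one place. This is a minimal sketch for synchronous functions only; `withBoundaryLogging` is a hypothetical helper, and an async function wrapped this way would log a pending promise instead of its result.

```javascript
// Wrap a synchronous function so every call logs its inputs, its
// return value, and any error it throws, then behaves as before.
function withBoundaryLogging(name, fn, log = console.log) {
  return function logged(...args) {
    log(`[DEBUG] ${name} entry:`, { args });
    try {
      const returnValue = fn.apply(this, args);
      log(`[DEBUG] ${name} exit:`, { returnValue });
      return returnValue;
    } catch (error) {
      log(`[DEBUG] ${name} threw:`, { message: error.message });
      throw error; // never swallow the original error
    }
  };
}

// Usage with a made-up function under investigation.
const parsePrice = withBoundaryLogging("parsePrice", (text) =>
  Number.parseFloat(text)
);
parsePrice("19.99");
```

Because the wrapper is a single expression at the call site, it is easy to find and delete once the diagnosis is done.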

Common Root Causes:

Symptom	Likely Causes
Works locally, fails in CI	Environment differences, timing, resources
Intermittent failure	Race condition, flaky network, resource contention
Works then stops working	State mutation, memory leak, cache poisoning
Wrong data	Type coercion, encoding, timezone, precision
Silent failure	Swallowed exception, async error, missing await
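
As one illustration of the "Wrong data" row, logging a value's type alongside the value often exposes coercion and precision problems that the value alone hides. The snippet is a sketch with made-up variable names; `describeValue` is a hypothetical helper.

```javascript
// Report the runtime type together with an unambiguous rendering of
// the value, so "105" (string) and 105 (number) cannot be confused.
function describeValue(value) {
  return { type: typeof value, json: JSON.stringify(value) };
}

// Type coercion: a form field arrives as a string, so + concatenates.
const quantityFromForm = "10";
const total = quantityFromForm + 5;
console.log(describeValue(total)); // type is "string", not "number"

// Precision: binary floating point cannot represent 0.1 or 0.2 exactly.
const sum = 0.1 + 0.2;
console.log(sum === 0.3);                           // prints false
console.log(Math.abs(sum - 0.3) < Number.EPSILON);  // prints true
```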

Output: Clear explanation of the root cause with evidence.

Phase 4: FIX & VERIFY (Resolve and Prevent)

Goal: Fix the issue and prevent regression.

Fix Process:

Write a failing test that captures the bug
Implement minimal fix - change as little as possible
Verify test passes - confirms fix works
Check for similar patterns - same bug elsewhere?
Review fix for side effects - does it break anything?
Document the fix - why it happened, how to prevent
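
The first three steps might look like the following for a made-up off-by-one bug. Everything here is hypothetical: `pageCount` stands in for the real function, and the commented-out line represents the code before the fix. Node's built-in `assert` module is used so the sketch runs without a test framework.

```javascript
const { strictEqual } = require("node:assert");

// Hypothetical function under repair: how many pages are needed to
// show `totalItems` items at `pageSize` items per page?
function pageCount(totalItems, pageSize) {
  // Before the fix (dropped the last, partially filled page):
  //   return Math.floor(totalItems / pageSize);
  // Minimal fix: round up instead of down.
  return Math.ceil(totalItems / pageSize);
}

// Step 1: a test that captures the bug. It fails against the old code
// (which returned 2) and passes after the fix.
function testPartialLastPageIsCounted() {
  strictEqual(pageCount(21, 10), 3);
}

// Existing behavior that must keep working after the fix.
function testExactMultipleIsUnchanged() {
  strictEqual(pageCount(20, 10), 2);
}

testPartialLastPageIsCounted();
testExactMultipleIsUnchanged();
console.log("regression tests passed");
```

Writing the test first confirms that it really does fail for the reported reason before any production code changes.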

Verification Checklist:

A test that specifically catches this bug now passes
Existing tests still pass
Manual verification confirms fix
Fix works in all affected environments
No new warnings or errors introduced

Prevention:

Add guards/validation at boundaries
Improve error messages for easier future debugging
Document gotchas for other developers
Consider whether an architectural change would prevent similar bugs
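
A boundary guard with a descriptive error message could look like this. It is a sketch only: `parsePort` is a hypothetical function, chosen because configuration values often arrive as strings and fail far from where they were read.

```javascript
// Validate a port number where it enters the program, and say exactly
// what was received and what was expected.
function parsePort(raw) {
  const port = Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(
      `Invalid port ${JSON.stringify(raw)}: expected an integer between 1 and 65535`
    );
  }
  return port;
}

console.log(parsePort("8080")); // prints 8080

try {
  parsePort("eighty");
} catch (error) {
  console.log(error.message);
  // prints: Invalid port "eighty": expected an integer between 1 and 65535
}
```

Including the offending value in the message means the next investigation can often skip straight to Phase 2.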

Debugging Anti-Patterns

DO NOT:

Guess and hope (change things randomly)
Assume you know the problem without evidence
Trust comments/docs over actual code behavior
Debug production with print statements you'll forget to remove
Fix the symptom instead of the root cause
Make multiple changes at once

DO:

Verify assumptions with evidence
Change one thing at a time
Log actual values, not what you expect
Trust the code over documentation
Take breaks when stuck (fresh eyes help)

Quick Reference

1. REPRODUCE    → Can I reliably trigger this?
2. ISOLATE      → Where exactly is it failing?
3. DIAGNOSE     → Why is it failing?
4. FIX & VERIFY → How do I fix it permanently and confirm it stays fixed?

Output Template

## Bug Investigation: [Title]

### Reproduction
- Steps to reproduce
- Environment details
- Frequency

### Isolation
- Search method used
- Scope narrowed to

### Root Cause
- What's actually wrong
- Why it happens
- Evidence

### Fix
- Code changes made
- Test added

### Prevention
- How to prevent similar bugs
- Documentation updates
