name	systematic-debugging
description	Four-phase debugging: root cause → patterns → hypothesis → implement. For complex bugs, test failures, multi-component issues. NOT for obvious syntax errors.
next_skills	pop-test-driven-development, pop-root-cause-tracing

Systematic Debugging

Random fixes waste time. Quick patches mask issues.

Core principle: ALWAYS find root cause before fixes. Symptom fixes are failure.

The Iron Law

NO FIXES WITHOUT ROOT CAUSE INVESTIGATION

When to Use

ANY technical issue: test failures, bugs, unexpected behavior, performance, builds, integration.

ESPECIALLY when:

Under time pressure
"Just one quick fix" seems obvious
Already tried multiple fixes
Previous fix didn't work
Don't fully understand issue

Don't skip when:

Seems simple (simple bugs have root causes)
You're hurrying (systematic is faster than thrashing)
Manager wants NOW (systematic prevents rework)

Four Phases

Phase 1: Root Cause Investigation

For test failures, check flakiness FIRST:

Test fails → Run 5x
├─ Passes 5/5: Not flaky here; investigate why the original run failed, as a bug
├─ Fails 5/5: Consistent, investigate as bug
└─ Mixed (e.g. 3/5): FLAKY TEST - fix test first
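
The decision tree above can be sketched as a small script. This is a minimal sketch, not part of the skill itself: the names `run_repeatedly` and `classify_runs` and the pytest command in the comment are assumptions for illustration, and a pass is assumed to mean exit code 0.

```python
import subprocess


def run_repeatedly(cmd, runs=5):
    """Run a test command `runs` times; True means exit code 0 (pass)."""
    return [
        subprocess.run(cmd, capture_output=True).returncode == 0
        for _ in range(runs)
    ]


def classify_runs(results):
    """Map a list of pass/fail results onto the three branches of the tree."""
    passed = sum(results)
    if passed == len(results):
        return "not flaky (all passed): investigate as bug"
    if passed == 0:
        return "consistent failure: investigate as bug"
    return f"flaky ({passed}/{len(results)} passed): fix test first"


# Hypothetical usage (the test path is made up):
# classify_runs(run_repeatedly(["pytest", "tests/test_example.py::test_case"]))
```

Keeping the classification separate from the subprocess call means the rule itself can be checked without running any tests.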

Flaky test checklist:

Check	How	Fix
Behaves differently alone vs in suite?	Run single vs suite	Remove shared state (state pollution)
Timing-dependent?	Look for timeouts/sleeps	Condition-based waiting
Environment-specific?	Compare CI vs local	Mock env vars
Order-dependent?	Run in different order	Setup/teardown
Race condition?	Look for async without waits	Proper async/await
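
For the timing-dependent row, condition-based waiting means polling for the state the test needs instead of sleeping for a fixed time. A minimal sketch, assuming a hypothetical helper named `wait_for`; the default timeout and interval are illustrative, not values from this skill.

```python
import time


def wait_for(condition, timeout=5.0, interval=0.05):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(interval)


# Hypothetical usage, replacing a fixed `time.sleep(2)` before an assertion:
# wait_for(lambda: job.is_done())
```

The test then finishes as soon as the condition holds, and fails with a clear timeout instead of an unrelated assertion error when it never does.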

Then continue:

Read Errors Carefully - Stack traces, line numbers, error codes
Reproduce Consistently - Exact steps, happens every time?
Check Recent Changes - Git diff, dependencies, config
Multi-Component Systems - Add diagnostic instrumentation at boundaries BEFORE proposing fixes
Trace Data Flow - Where does bad value originate? (See pop-root-cause-tracing)
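
Diagnostic instrumentation at a boundary can be as small as a logging decorator on the function where data crosses from one component to another. A minimal sketch under stated assumptions: `trace_boundary`, the logger name, and the `parse_quantity` example are hypothetical, not part of this skill.

```python
import functools
import logging

logger = logging.getLogger("boundary")


def trace_boundary(name):
    """Log arguments, return value, and exceptions at a component boundary."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            logger.debug("%s <- args=%r kwargs=%r", name, args, kwargs)
            try:
                result = func(*args, **kwargs)
            except Exception:
                # Record the traceback, then re-raise so behavior is unchanged.
                logger.exception("%s raised", name)
                raise
            logger.debug("%s -> %r", name, result)
            return result
        return wrapper
    return decorator


# Hypothetical boundary between a parser and a pricing component.
@trace_boundary("parser->pricing")
def parse_quantity(raw):
    return int(raw.strip())
```

With debug logging enabled (for example `logging.basicConfig(level=logging.DEBUG)`), each call shows what entered and left the boundary, which narrows down where the bad value first appears.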

Phase 2: Pattern Analysis

Find Working Examples - Similar code that works
Compare References - Read reference implementations COMPLETELY
Identify Differences - List ALL differences
Understand Dependencies - Config, environment, assumptions
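
Listing ALL differences is easier to do mechanically than by eye. A minimal sketch, assuming the working and broken setups can each be dumped into a flat dict with string keys; `diff_settings` and the `"<absent>"` marker are hypothetical names for illustration.

```python
def diff_settings(working, broken):
    """Return every key whose value differs, as {key: (working_value, broken_value)}."""
    missing = object()  # sentinel so an absent key differs from a key set to None
    differences = {}
    for key in sorted(set(working) | set(broken)):
        w = working.get(key, missing)
        b = broken.get(key, missing)
        if w != b:
            differences[key] = (
                "<absent>" if w is missing else w,
                "<absent>" if b is missing else b,
            )
    return differences
```

Keys present on only one side are reported too, since a missing setting is itself a difference.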

Phase 3: Hypothesis & Testing

Form Single Hypothesis - "I think X causes Y because Z"
Test Minimally - Smallest change, one variable
Verify - Worked? → Phase 4. Didn't? → New hypothesis
When Unknown - Say "I don't understand X", ask for help

Phase 4: Implementation

Create Failing Test - Use pop-test-driven-development skill
Implement Single Fix - Address root cause, ONE change
Verify Fix - Test passes, no other tests broken
If Fix Doesn't Work
- STOP. Count fixes tried.
- If < 3: Return to Phase 1 with new info
- If >= 3: STOP. Question architecture (see below)
If 3+ Fixes Failed: Question Architecture
- Warning sign: each fix reveals new problems elsewhere
- Warning sign: fixes require "massive refactoring"
- Ask: is the pattern fundamentally unsound?
- Discuss with user before more fixes
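
The escalation rule for failed fixes can be written down as a single function. A minimal sketch; `next_action` and its return strings are hypothetical, and only the threshold of 3 comes from the text above.

```python
def next_action(failed_fixes, limit=3):
    """Decide what to do after a fix fails, given the total failed fixes so far."""
    if failed_fixes < 1:
        raise ValueError("call this only after at least one fix has failed")
    if failed_fixes < limit:
        return "return to Phase 1 with new info"
    return "stop: question architecture and discuss with user"
```

The count includes every failed fix, not only the latest one.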

Red Flags

STOP if thinking:

"Quick fix for now"
"Just try X and see"
"Add multiple changes"
"Skip test, manually verify"
"It's probably X"
"Don't fully understand but..."
"One more fix" (after 2+)

ALL → Return to Phase 1

3+ failures → Question architecture

Quick Reference

Phase	Key Activities	Success
1. Root Cause	Read errors, reproduce, gather evidence	Understand WHAT & WHY
2. Pattern	Find working examples, compare	Identify differences
3. Hypothesis	Form theory, test minimally	Confirmed or new
4. Implement	Test, fix, verify	Resolved, tests pass

Real-World Impact

Systematic: 15-30 min to fix, 95% first-time success, near-zero new bugs
Random: 2-3 h thrashing, 40% success, new bugs common

Cross-References

Flaky tests: pop-test-driven-development (Condition-Based Waiting)
Root cause tracing: pop-root-cause-tracing (backward tracing)
Defense: pop-defense-in-depth (multi-layer validation)

Examples

See examples/ for:

flaky-test-patterns.md - Common flaky test causes & fixes
debugging-flowchart.pdf - Visual decision tree
multi-component-diagnostic.md - Instrumentation strategy
