| name | systematic-debugging |
| description | Use when encountering any bug, test failure, or unexpected behavior - four-phase framework (root cause investigation, pattern analysis, hypothesis testing, implementation) that ensures understanding before attempting solutions |
Systematic Debugging
Overview
Random fixes waste time and create new bugs. Quick patches mask underlying issues.
Core principle: ALWAYS find root cause before attempting fixes. Symptom fixes are failure.
The Iron Law
NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
If you haven't completed Phase 1, you cannot propose fixes.
When to Use
Use for ANY technical issue:
- Test failures
- Bugs in production
- Unexpected behavior
- Performance problems
- Build failures
- Integration issues
Use this ESPECIALLY when:
- Under time pressure (emergencies make guessing tempting)
- "Just one quick fix" seems obvious
- You've already tried multiple fixes
- Previous fix didn't work
The Four Phases
Complete each phase before proceeding to the next.
Phase 1: Root Cause Investigation
BEFORE attempting ANY fix:
Read Error Messages Carefully
- Don't skip past errors or warnings
- Read stack traces completely
- Note line numbers, file paths, error codes
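As an illustration of the notes worth taking from an error, the sketch below pulls the exception type, message, file, line number, and full call chain out of a caught exception using Python's standard `traceback` module. The functions `load_port` and `start_server` are hypothetical stand-ins for real code, not part of this skill.

```python
import traceback


def summarize_exception(exc: BaseException) -> dict:
    """Collect the details worth writing down from a caught exception."""
    frames = traceback.extract_tb(exc.__traceback__)
    innermost = frames[-1] if frames else None
    return {
        "type": type(exc).__name__,
        "message": str(exc),
        # Innermost frame: where the error was actually raised.
        "file": innermost.filename if innermost else None,
        "line": innermost.lineno if innermost else None,
        "function": innermost.name if innermost else None,
        # Full call chain, outermost first, so no frame is skipped.
        "call_chain": [frame.name for frame in frames],
    }


def load_port(config: dict) -> int:
    # Hypothetical function with a bug: the value may not be numeric.
    return int(config["port"])


def start_server(config: dict) -> int:
    return load_port(config)


try:
    start_server({"port": "eighty"})
except ValueError as exc:
    summary = summarize_exception(exc)
    print(summary)
```

The call chain matters as much as the innermost frame: the error is raised in `load_port`, but the bad value was handed in by a caller further up.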
Reproduce Consistently
- Can you trigger it reliably?
- What are the exact steps?
- If not reproducible → gather more data, don't guess
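One possible way to answer "can you trigger it reliably?" is to measure it. The sketch below runs an action many times and reports how often it fails. `make_flaky_action` is a hypothetical stand-in for the code under investigation, and the 30% failure probability is an assumption chosen for the example.

```python
import random
from typing import Callable


def failure_rate(action: Callable[[], None], runs: int = 50) -> float:
    """Run `action` repeatedly and return the fraction of runs that raised."""
    failures = 0
    for _ in range(runs):
        try:
            action()
        except Exception:
            failures += 1
    return failures / runs


def make_flaky_action(seed: int) -> Callable[[], None]:
    # Hypothetical stand-in for the code under investigation.
    rng = random.Random(seed)  # seeded so the sequence of runs is repeatable

    def action() -> None:
        if rng.random() < 0.3:
            raise RuntimeError("intermittent failure")

    return action


rate = failure_rate(make_flaky_action(seed=1234), runs=200)
print(f"failed in {rate:.0%} of runs")
```

A rate that is neither 0% nor 100% means the trigger is not yet understood, which is a signal to gather more data rather than to start guessing.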
Check Recent Changes
- Git diff, recent commits
- New dependencies, config changes
- Environmental differences
Gather Evidence in Multi-Component Systems
For EACH component boundary:
- Log what data enters component
- Log what data exits component
- Verify environment/config propagation
Run once to gather evidence showing WHERE it breaks, THEN investigate that specific component.
Trace Data Flow
- Where does the bad value originate?
- What called this with the bad value?
- Keep tracing up until you find the source
- Fix at source, not at symptom
SUB-SKILL: Use root-cause-tracing for the backward tracing technique
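The boundary logging and backward tracing described in Phase 1 can be sketched with a small decorator that records what enters and exits each component. This is a minimal illustration, not the root-cause-tracing sub-skill itself. The three-stage pipeline (`parse`, `enrich`, `send`) and its bug are hypothetical.

```python
import logging
from functools import wraps
from typing import Any, Callable

logging.basicConfig(level=logging.DEBUG, format="%(message)s")
log = logging.getLogger("boundary")


def log_boundary(name: str) -> Callable:
    """Decorator that logs what enters and exits one component."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            log.debug("%s IN  args=%r kwargs=%r", name, args, kwargs)
            result = func(*args, **kwargs)
            log.debug("%s OUT result=%r", name, result)
            return result

        return wrapper

    return decorator


# Hypothetical three-stage pipeline; the bug is in the middle stage.
@log_boundary("parse")
def parse(raw: str) -> dict:
    return {"user": raw.strip(), "retries": 3}


@log_boundary("enrich")
def enrich(record: dict) -> dict:
    # Bug: silently drops the "retries" field.
    return {"user": record["user"]}


@log_boundary("send")
def send(record: dict) -> int:
    return record.get("retries", 0)


result = send(enrich(parse(" alice ")))
```

A single run shows `retries` present at `enrich IN` and missing at `enrich OUT`. The symptom appears in `send`, but the log points at `enrich` as the place to investigate and fix.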
Phase 2: Pattern Analysis
Find Working Examples
- Locate similar working code in same codebase
- What works that's similar to what's broken?
Compare Against References
- Read reference implementation COMPLETELY
- Don't skim - read every line
Identify Differences
- What's different between working and broken?
- List every difference, however small
Understand Dependencies
- What other components does this need?
- What settings, config, environment?
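Listing every difference is easier to do mechanically than by eye. The sketch below compares the settings of a working setup against a broken one and reports each key that differs, including keys present on only one side. The settings dictionaries are hypothetical examples.

```python
from typing import Any


def diff_settings(working: dict[str, Any], broken: dict[str, Any]) -> dict[str, tuple]:
    """Return every key whose value differs, including keys present on one side only."""
    missing = object()  # sentinel so a real None value is not confused with "absent"
    differences = {}
    for key in sorted(working.keys() | broken.keys()):
        left = working.get(key, missing)
        right = broken.get(key, missing)
        if left != right:
            differences[key] = (
                "<absent>" if left is missing else left,
                "<absent>" if right is missing else right,
            )
    return differences


# Hypothetical settings for a working service and a broken one.
working = {"timeout": 30, "tls": True, "region": "eu-west-1"}
broken = {"timeout": 30, "tls": False, "region": "eu-west-1", "proxy": "http://localhost:8080"}

for key, (good, bad) in diff_settings(working, broken).items():
    print(f"{key}: working={good!r} broken={bad!r}")
```

The output is a list of candidates, not a diagnosis. Each difference still has to be tested as a hypothesis.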
Phase 3: Hypothesis and Testing
Form Single Hypothesis
- State clearly: "I think X is the root cause because Y"
- Be specific, not vague
Test Minimally
- Make the SMALLEST possible change to test hypothesis
- One variable at a time
- Don't fix multiple things at once
Verify Before Continuing
- Did it work? Yes → Phase 4
- Didn't work? Form NEW hypothesis
- DON'T add more fixes on top
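"One variable at a time" can be made concrete as follows. The sketch applies each differing setting to a known-good baseline on its own and records which single change breaks it. The settings and the `service_starts` check are hypothetical; in practice the check would be whatever reproduction was established in Phase 1.

```python
from typing import Any, Callable


def find_breaking_changes(
    baseline: dict[str, Any],
    candidate: dict[str, Any],
    works: Callable[[dict[str, Any]], bool],
) -> list[str]:
    """Apply each differing setting to the baseline alone and report which ones break it."""
    breaking = []
    for key, value in candidate.items():
        if key in baseline and baseline[key] == value:
            continue  # not a difference, nothing to test
        trial = {**baseline, key: value}  # exactly one variable changed
        if not works(trial):
            breaking.append(key)
    return breaking


def service_starts(settings: dict[str, Any]) -> bool:
    # Hypothetical check standing in for "run the reproduction".
    return settings.get("pool_size", 10) > 0


baseline = {"timeout": 30, "tls": True}
candidate = {"timeout": 5, "tls": True, "pool_size": 0}

print(find_breaking_changes(baseline, candidate, service_starts))
```

Here the hypothesis would read "I think `pool_size` is the root cause because setting it alone breaks a working baseline". If no single change breaks the baseline, the cause may be an interaction between changes, which calls for a new hypothesis rather than stacked fixes.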
Phase 4: Implementation
Create Failing Test Case
- Simplest possible reproduction
- Automated test if possible
- SUB-SKILL: Use test-driven-development
Implement Single Fix
- Address the root cause identified
- ONE change at a time
- No "while I'm here" improvements
Verify Fix
- Test passes now?
- No other tests broken?
If 3+ Fixes Failed: Question Architecture
- Each fix reveals new problem in different place?
- Fixes require "massive refactoring"?
- STOP and discuss with user before attempting more fixes
Red Flags - STOP and Follow Process
If you catch yourself thinking or doing any of these:
- "Quick fix for now, investigate later"
- "Just try changing X and see if it works"
- "I don't fully understand but this might work"
- Proposing solutions before tracing data flow
ALL of these mean: STOP. Return to Phase 1.
Quick Reference
| Phase | Key Activities | Success Criteria |
|---|---|---|
| 1. Root Cause | Read errors, reproduce, gather evidence | Understand WHAT and WHY |
| 2. Pattern | Find working examples, compare | Identify differences |
| 3. Hypothesis | Form theory, test minimally | Confirmed or rejected |
| 4. Implementation | Create test, fix, verify | Bug resolved, tests pass |
Integration
Required sub-skills:
- root-cause-tracing - When error is deep in call stack
- test-driven-development - For creating failing test case
Complementary skills:
- defense-in-depth - Add validation at multiple layers
- condition-based-waiting - Replace arbitrary timeouts
- verification-before-completion - Verify fix before claiming success