| name | systematic-debugging |
| description | Root cause analysis for debugging. Use when bugs, test failures, or unexpected behavior have non-obvious causes, or after multiple fix attempts have failed. |
Systematic Debugging
Core principle: Find root cause before attempting fixes. Symptom fixes are failure.
NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
Phase 1: Root Cause Investigation
BEFORE attempting ANY fix:
Read Error Messages Carefully
- Read stack traces completely
- Note line numbers, file paths, error codes
- Don't skip warnings
Reproduce Consistently
- What are the exact steps?
- If not reproducible → gather more data, don't guess
Check Recent Changes
- Git diff, recent commits
- New dependencies, config changes
- Environmental differences
Gather Evidence in Multi-Component Systems
WHEN system has multiple components (CI → build → signing, API → service → database):
Add diagnostic instrumentation before proposing fixes:
For EACH component boundary: - Log what data enters/exits component - Verify environment/config propagation - Check state at each layer Run once to gather evidence → analyze → identify failing componentExample:
# Layer 1: Workflow echo "=== Secrets available: ===" echo "IDENTITY: ${IDENTITY:+SET}${IDENTITY:-UNSET}" # Layer 2: Build script env | grep IDENTITY || echo "IDENTITY not in environment" # Layer 3: Signing security find-identity -vTrace Data Flow
See
references/root-cause-tracing.mdfor backward tracing technique.Quick version: Where does bad value originate? Trace up call chain until you find the source. Fix at source.
Phase 2: Pattern Analysis
- Find Working Examples - Similar working code in codebase
- Compare Against References - Read reference implementations COMPLETELY, don't skim
- Identify Differences - List every difference, don't assume "that can't matter"
- Understand Dependencies - Components, config, environment, assumptions
Phase 3: Hypothesis and Testing
- Form Single Hypothesis - "I think X is root cause because Y" - be specific
- Test Minimally - SMALLEST possible change, one variable at a time
- Verify - Worked → Phase 4. Didn't work → form NEW hypothesis, don't stack fixes
- When You Don't Know - Say so. Don't pretend.
Phase 4: Implementation
Create Failing Test Case
- Use the
test-driven-developmentskill - MUST have before fixing
- Use the
Implement Single Fix
- ONE change at a time
- No "while I'm here" improvements
Verify Fix
- Test passes? Other tests still pass? Issue resolved?
If Fix Doesn't Work
- Count attempts
- If < 3: Return to Phase 1 with new information
- If ≥ 3: Escalate (below)
Escalation: 3+ Failed Fixes
Pattern indicating architectural problem:
- Each fix reveals new problems elsewhere
- Fixes require massive refactoring
- Shared state/coupling keeps surfacing
Action: STOP. Question fundamentals:
- Is this pattern fundamentally sound?
- Are we continuing through inertia?
- Refactor architecture vs. continue fixing symptoms?
Discuss with human partner before more fix attempts. This is wrong architecture, not failed hypothesis.
Red Flags → STOP and Return to Phase 1
If you catch yourself thinking:
- "Quick fix for now, investigate later"
- "Just try changing X"
- "I'll skip the test"
- "It's probably X"
- "Pattern says X but I'll adapt it differently"
- Proposing solutions before tracing data flow
- "One more fix" after 2+ failures
Human Signals You're Off Track
- "Is that not happening?" → You assumed without verifying
- "Will it show us...?" → You should have added evidence gathering
- "Stop guessing" → You're proposing fixes without understanding
- "Ultrathink this" → Question fundamentals
- Frustrated "We're stuck?" → Your approach isn't working
Response: Return to Phase 1.
Supporting Techniques
Reference files in references/:
root-cause-tracing.md- Trace bugs backward through call stackdefense-in-depth.md- Add validation at multiple layers after finding root causecondition-based-waiting.md- Replace arbitrary timeouts with condition polling
Related skills:
test-driven-development- Creating failing test case (Phase 4)verification-before-completion- Verify fix before claiming success