| name | Five Whys Root Cause Analysis |
| description | This skill should be used when the user reports "error", "bug", "problem", needs to "fix" something, mentions "debug", "failing", "broken", "issue", "not working", "crash", or any problem-solving scenario. Enforces systematic root cause identification using five whys methodology before attempting solutions. |
| version | 0.1.0 |
Five Whys Root Cause Analysis
Purpose
This skill enforces systematic root cause identification before solution attempts. Apply the five whys methodology to prevent speculation-driven debugging that wastes effort, creates harmful changes, and results in never-ending fix loops.
Core Problem
Claude Code's default behavior when encountering problems:
- Observe symptom
- Speculate about cause
- Implement speculative fix
- Fix doesn't work or creates new problems
- Repeat indefinitely
Result: Wasted effort, potentially harmful changes, user frustration.
Five Whys Methodology
The Process
Start with the observed symptom and iteratively ask "why?" until reaching an actionable root cause.
Example progression:
- Symptom: Login returns 500 error
- Why #1: Why does login return 500? → Database connection timeout
- Why #2: Why does database timeout? → Connection pool exhausted
- Why #3: Why is pool exhausted? → Connections not being released
- Why #4: Why aren't connections released? → Missing connection.release() call in session handler
Root cause identified: Missing connection.release() call - this is actionable.
Iteration Guidelines
Continue asking "why" until:
- The cause is actionable (can be fixed with code/config changes)
- The cause directly explains the observed symptom
- Next "why" would be outside our control (external system, architectural decision)
Don't stop at:
- Symptom restatements ("it's broken because it doesn't work")
- Vague causes ("there's a configuration issue")
- First plausible explanation without verification
- Standard iteration counts (five is just a name - continue until root cause found)
Evidence-Based Investigation
For each "why", gather evidence:
Read relevant code:
- Files mentioned in error messages
- Call stack locations
- Related modules and dependencies
Examine outputs:
- Error messages and stack traces
- Log files
- Console output
- Test results
Check configurations:
- Environment variables
- Config files
- Build settings
- Dependency versions
Verify assumptions:
- Don't assume - check actual behavior
- Read the code that's actually running
- Check what values are actually present
Base each "why" answer on concrete evidence, not speculation.
Root Cause Determination
Criteria for Root Cause
A root cause is identified when ALL these conditions are met:
Actionable:
- Can be fixed with code, configuration, or environment changes
- Within our control to modify
- Clear fix approach exists
Directly Explanatory:
- Cause fully explains the observed symptom
- No gaps in the causal chain
- Removing this cause would eliminate the symptom
Terminal:
- Next "why" would go outside our control (system architecture, external dependencies, business requirements)
- OR: Next "why" would not lead to a more actionable fix
- OR: This is the deepest layer we can address
Example: Root Cause vs Symptom
Symptom: "Tests are failing"
- ❌ NOT root cause: "The test expectations are wrong" (what made them wrong?)
- ❌ NOT root cause: "The code doesn't match tests" (why doesn't it match?)
- ✅ ROOT CAUSE: "Function returns null when user.profile is undefined, but tests expect empty object"
Symptom: "API returns 404"
- ❌ NOT root cause: "Route doesn't exist" (why doesn't it exist?)
- ❌ NOT root cause: "Router not configured" (what's the actual misconfiguration?)
- ✅ ROOT CAUSE: "Route path is '/api/v1/users' but code defines '/api/users' - missing '/v1' prefix"
When to Stop
Stop investigating when:
- Identified an actionable fix with clear implementation
- Cause is a concrete code/config/data issue
- Evidence confirms this cause produces the symptom
Continue investigating when:
- Cause is vague or speculative
- No clear fix path exists
- Haven't verified the cause with evidence
- Fixing this wouldn't fully resolve symptom
Anti-Patterns to Avoid
Speculation Without Evidence
❌ Bad:
Why is login failing?
→ "Probably a timeout issue" [no evidence]
→ "Let me increase the timeout" [speculative fix]
✅ Good:
Why is login failing?
→ [Read error logs] → Database connection timeout after 5 seconds
→ [Read database config] → Connection pool size is 5
→ [Read application metrics] → 50 concurrent requests
→ Root cause: Connection pool too small for request volume
Stopping Too Early
❌ Bad:
Why are tests failing?
→ "Tests are outdated"
→ [Starts rewriting tests without understanding why they're outdated]
✅ Good:
Why are tests failing?
→ Tests expect user.email, code returns user.emailAddress
Why the field name difference?
→ API v2 migration changed field names
Why weren't tests updated?
→ Tests live in separate repo, not updated in migration PR
Root cause: Migration process doesn't include cross-repo test updates
Treating Symptoms
❌ Bad:
Error: "Cannot read property 'name' of undefined"
→ "Let me add null check" [treats symptom]
✅ Good:
Why is object undefined?
→ API call returns undefined when user not found
Why is API call made for non-existent user?
→ User ID comes from URL parameter without validation
Why no validation?
→ Route handler assumes ID is always valid
Root cause: Missing user ID validation in route handler
Progressive Output Format
Show each investigation level as it completes:
## Root Cause Analysis
### Investigation Level 1
**Observation:** [What was observed]
**Evidence:** [What was checked - files, logs, outputs]
**Finding:** [What this reveals]
**Next Question:** Why [finding]?
### Investigation Level 2
**Observation:** [Deeper observation]
**Evidence:** [More specific evidence]
**Finding:** [What this reveals]
**Next Question:** Why [finding]?
[Continue until root cause identified]
### Root Cause Identified
**Cause:** [Specific, actionable cause]
**Evidence:** [Concrete evidence supporting this]
**Fix Approach:** [How this can be addressed]
**Why This is Root:** [Meets actionability, explanatory, and terminal criteria]
Integration with Problem Solving
When Skill Activates
This skill triggers automatically when:
- User message contains problem keywords (error, bug, fix, debug, etc.)
- Tool output contains errors or exceptions
- User is clearly trying to solve a problem
Agent Invocation
When this skill triggers, it should invoke the root-cause-analyzer agent:
- Agent runs autonomously through five whys
- Shows progressive output for each iteration
- Determines when root cause is reached
- Completes automatically
User Override
User can bypass analysis by:
- Interrupting the agent manually
- Explicitly stating "skip root cause analysis"
- Requesting immediate fix attempts
But default behavior is always root cause first.
Workflow
- Detect problem: Skill triggers on keywords or errors
- Launch analysis: Invoke root-cause-analyzer agent
- Autonomous iteration: Agent works through whys without user prompts
- Progressive display: Show each iteration as investigation proceeds
- Root cause identified: Agent determines when criteria met
- Proceed to solution: Now fix the identified root cause
Success Criteria
Root cause analysis is complete when:
- ✅ Specific, actionable cause identified
- ✅ Evidence gathered confirms the cause
- ✅ Cause directly explains observed symptom
- ✅ Clear fix approach exists
- ✅ Fixing this cause will resolve the problem
Output statement: "Root cause identified: [specific cause]. This is actionable and directly explains [symptom]. Proceeding to solution..."