| name | debugging |
| description | Use when investigating bugs, diagnosing issues, or understanding unexpected behavior. Provides systematic approaches to finding root causes. |
| allowed-tools | Read, Write, Edit, Bash, Grep, Glob |
Debugging Skill
Systematic approaches to investigating and diagnosing bugs.
Core Principle
Understand before fixing. A proper diagnosis leads to a proper fix.
The Scientific Method for Debugging
1. Observe
Gather all the facts:
- What's the symptom? (What's happening that shouldn't?)
- When does it happen? (Always, sometimes, specific conditions?)
- Who's affected? (All users, some users, specific scenarios?)
- Error messages? (Exact text, stack traces, error codes?)
- Recent changes? (What changed before this started?)
Evidence to collect:
- Error messages and stack traces
- Application logs
- User reports
- Reproduction steps
- Environment details (browser, OS, versions)
- Network requests/responses
- Database query logs
2. Form Hypothesis
Based on symptoms, what could cause this?
Common categories:
- Logic error: Code does wrong thing
- State management: State gets out of sync
- Async/timing: Race condition, callback hell
- Data issue: Unexpected input format
- Integration: API change, service down
- Environment: Config, permissions, network
- Resource: Memory leak, connection pool exhausted
Prioritize hypotheses:
- Most likely causes first
- Easiest to test first (when equal likelihood)
- Most impactful if true
3. Test Hypothesis
Design experiment to prove/disprove:
- Add logging to see values
- Add breakpoints to pause execution
- Modify input to isolate variable
- Disable feature to rule out
- Compare with working version
Keep notes:
**Hypothesis:** Database query timeout
**Test:** Add query timing logs
**Result:** Query completes in 50ms
**Conclusion:** Not the database ❌
**Hypothesis:** Network latency
**Test:** Check network tab, add timing
**Result:** API call takes 5 seconds
**Conclusion:** Found the issue ✅
4. Analyze Results
What did you learn?
- Hypothesis confirmed or rejected?
- New questions raised?
- Unexpected findings?
- Root cause identified?
5. Repeat or Conclude
If root cause found:
- Document findings
- Estimate impact
- Plan fix
If not found:
- Form new hypothesis
- Repeat cycle
Debugging Strategies
Strategy 1: Add Logging
Most universally useful technique:
// Strategic console.log placement
function processOrder(order) {
console.log('processOrder START:', { orderId: order.id })
const items = order.items
console.log('items:', items.length)
const validated = validate(items)
console.log('validation result:', validated)
if (!validated.success) {
console.log('validation failed:', validated.errors)
throw new Error('Invalid order')
}
const total = calculateTotal(items)
console.log('total calculated:', total)
console.log('processOrder END')
return total
}
Logging guidelines:
- Log function entry/exit
- Log branching decisions
- Log external calls (API, database)
- Log unexpected values
- Include context (IDs, user info)
Strategy 2: Use Debugger
Interactive debugging:
// Browser
function buggyFunction(input) {
debugger; // Execution pauses here
const result = transform(input)
debugger; // And here
return result
}
// Node.js
node --inspect app.js
# Then open chrome://inspect in Chrome
Debugger features:
- Step over (next line)
- Step into (into function call)
- Step out (back to caller)
- Watch expressions
- Call stack inspection
- Variable inspection
Strategy 3: Binary Search
Isolate the problem area:
// 100 lines of code, bug somewhere
// Comment out lines 50-100
// Bug still happens? It's in lines 1-50
// Comment out lines 25-50
// Bug disappears? It's in lines 25-50
// Comment out lines 37-50
// Bug still happens? It's in lines 25-37
// Continue until isolated to specific lines
Strategy 4: Rubber Duck Debugging
Explain the problem out loud:
- "This function is supposed to calculate shipping cost"
- "It takes the weight and destination"
- "First it... wait, it's using price instead of weight!"
- (Bug found)
Why this works: Forces you to examine assumptions.
Strategy 5: Compare Working vs Broken
What's different?
Version comparison:
# Find which commit broke it
git bisect start
git bisect bad HEAD
git bisect good v1.0.0
# Git checks out middle commit
npm test
git bisect good/bad
# Repeat until found
Environment comparison:
- Works locally but not production?
- Works for some users but not others?
- Worked yesterday but not today?
What changed?
Strategy 6: Simplify
Reduce to minimal reproduction:
// Complex case with bug
processUserOrderWithDiscountsAndShipping(user, cart, promo, address)
// Simplify inputs one at a time
processUserOrderWithDiscountsAndShipping(user, [], null, null)
// Still breaks? Not discount or address
processUserOrderWithDiscountsAndShipping(null, [], null, null)
// Works now? It's the user object
// What about the user object causes it?
Strategy 7: Check Assumptions
Question everything:
// Assumption: API returns array
const users = await api.getUsers()
users.forEach(...) // Crashes
// Check assumption
console.log(typeof users) // "undefined"
console.log(users) // undefined
// Assumption was wrong!
Common wrong assumptions:
- Function returns expected type
- Variable is defined
- Array is not empty
- API will always respond
- Async operation has completed
- State is up to date
Debugging by Symptom
"Intermittent failure"
Likely causes:
- Race condition (timing-dependent)
- Data-dependent (certain inputs trigger it)
- Resource leak (happens after N operations)
- External service flakiness
Investigation:
- Add extensive logging
- Look for async operations
- Check timing between operations
- Look for shared state
- Run many times to see pattern
"Works locally, fails in production"
Check differences:
- Environment variables
- Data (production has different/more data)
- Network (CORS, SSL, proxies)
- Dependencies (versions, OS)
- Resources (memory, connections)
"Slow performance"
Don't guess - profile:
Frontend:
- Chrome DevTools > Performance tab
- Look for long tasks (> 50ms)
- Check for layout thrashing
- Look for memory leaks
Backend:
- Add timing logs around operations
- Check database query time (EXPLAIN ANALYZE)
- Check external API call time
- Profile with APM tool
"Memory leak"
Investigation:
// Take heap snapshot
// Do operation that leaks
// Take another heap snapshot
// Compare - what increased?
Common causes:
- Event listeners not removed
- Closures holding references
- Global variables accumulating
- Intervals not cleared
- Cache growing unbounded
"Crash/Exception"
Read the stack trace:
Error: Cannot read property 'map' of undefined
at processUsers (app.js:42:15)
at handleRequest (app.js:23:3)
at Server.<anonymous> (server.js:12:5)
Stack trace tells you:
- Line 42: Where it crashed
- Line 23: Where it was called from
- Line 12: Origin of the request
Then:
- Go to line 42
- Check what's undefined
- Trace back why it's undefined
Common Bug Patterns
Null/Undefined
// Bug
function process(user) {
return user.name.toUpperCase() // Crashes if user is null
}
// Investigation
console.log('user:', user) // undefined - why?
// Trace back to where user comes from
Off-by-One
// Bug
for (let i = 0; i <= array.length; i++) { // <= instead of <
process(array[i]) // Crashes on last iteration
}
// Investigation
console.log('i:', i, 'length:', array.length)
// Notice i === array.length causes array[i] === undefined
Async Timing
// Bug
let data
fetchData().then(result => {
data = result
})
console.log(data) // undefined - async not complete
// Investigation
console.log('1. Before fetch')
fetchData().then(result => {
console.log('3. Got result')
data = result
})
console.log('2. After fetch call')
// Output: 1, 2, 3 - async completes later
State Mutation
// Bug
function addItem(cart, item) {
cart.items.push(item) // Mutates input!
return cart
}
const originalCart = { items: [] }
const newCart = addItem(originalCart, item)
// originalCart was modified - unexpected!
// Investigation
console.log('before:', originalCart)
const newCart = addItem(originalCart, item)
console.log('after:', originalCart) // Changed!
Scope Issues
// Bug
for (var i = 0; i < 3; i++) {
setTimeout(() => console.log(i), 100)
}
// Prints: 3, 3, 3 (expected 0, 1, 2)
// Investigation
// var is function-scoped, i is shared
// By time timeout fires, loop is done, i === 3
// Fix: Use let (block-scoped) or capture i
Debugging Tools
Browser Developer Tools
Console:
console.log()- Print valuesconsole.table()- Display arrays/objects as tableconsole.trace()- Print stack traceconsole.time()/console.timeEnd()- Measure duration
Debugger:
- Set breakpoints
- Step through code
- Inspect variables
- Watch expressions
- Call stack
Network:
- View all requests
- See request/response headers and bodies
- Measure timing
- Replay requests
Performance:
- Record profile
- See function call tree
- Identify bottlenecks
- Check memory usage
Command Line Tools
# Search for text in files
grep -r "error" logs/
# Follow log file
tail -f logs/app.log
# Search with context
grep -B 5 -A 5 "ERROR" logs/app.log
# Find large files
du -sh *
# Check disk space
df -h
# Check memory
free -m
# Check running processes
ps aux | grep node
Database Debugging
-- PostgreSQL
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com';
-- Show slow queries
SELECT * FROM pg_stat_statements ORDER BY total_time DESC LIMIT 10;
-- Check table size
SELECT pg_size_pretty(pg_total_relation_size('users'));
-- Check indexes
\d users
Debugging Checklist
Before Starting
- Can reproduce the issue reliably?
- Have error message or symptom description?
- Know when it started happening?
- Checked if recent changes related?
- Checked logs for clues?
During Investigation
- Formed clear hypothesis?
- Testing hypothesis systematically?
- Taking notes on findings?
- Not making random changes hoping to fix?
- Questioning assumptions?
After Finding Root Cause
- Understand WHY it happens?
- Can explain it to someone else?
- Documented findings?
- Estimated impact?
- Identified proper fix?
Anti-Patterns
❌ Random Code Changes
BAD: "Maybe if I change this... nope, try this... nope, try this..."
GOOD: "Hypothesis: X causes Y. Test: Change X. Result: Y still happens.
Conclusion: X is not the cause."
❌ Assuming Without Verifying
BAD: "The API must be returning valid data"
GOOD: "Let me log the API response to see what it actually returns"
❌ Stopping at Symptoms
BAD: "The page is blank. Fixed by adding a null check."
GOOD: "The page is blank because user is null. User is null because
authentication token expired. Root cause: token not being refreshed."
❌ Debugging in Production
BAD: "Let me add console.log to production to see..."
GOOD: "Let me reproduce locally and debug there, or use proper logging"
❌ No Reproduction Steps
BAD: "It crashed once, let me guess why"
GOOD: "Let me find reliable way to reproduce it first"
Integration with Other Skills
- Use proof-of-work skill to document evidence
- Use test-driven-development skill to add regression test after fix
- Use explainer skill when explaining bug to others
- Use boy-scout-rule skill while fixing (improve surrounding code)
Remember
- Reproduce first - If you can't reproduce, you can't debug
- Gather evidence - Don't guess, look at data
- Form hypothesis - What do you think is wrong?
- Test systematically - Prove or disprove hypothesis
- Find root cause - Not just symptoms
- Document - Help future you and others
Debugging is detective work. Be methodical, not random.