| name | debugging |
| description | Comprehensive debugging specialist for errors, test failures, log analysis, and system problems. Use when encountering issues, analyzing error logs, investigating system anomalies, debugging production issues, analyzing stack traces, or identifying root causes. Combines general debugging workflows with error pattern detection and log analysis. |
| author | Joseph OBrien |
| status | unpublished |
| updated | 2025-12-23 |
| version | 1.0.1 |
| tag | skill |
| type | skill |
# Debugging
This skill provides comprehensive debugging capabilities for identifying and fixing errors, test failures, unexpected behavior, and production issues. It combines general debugging workflows with specialized error analysis, log parsing, and pattern recognition.
## When to Use This Skill
- When encountering errors or exceptions in code
- When tests are failing and you need to understand why
- When investigating unexpected behavior or bugs
- When analyzing stack traces and error messages
- When debugging production issues
- When fixing issues reported by users or QA
- When analyzing error logs
- When investigating performance issues or anomalies
- When correlating errors across multiple services
- When identifying recurring error patterns
- When setting up error monitoring and alerting
- When conducting post-mortem analysis of incidents
## What This Skill Does
- Error Analysis: Captures and analyzes error messages and stack traces
- Log Parsing: Extracts errors from logs using regex patterns and structured parsing
- Stack Trace Analysis: Analyzes stack traces across multiple programming languages
- Error Correlation: Identifies relationships between errors across distributed systems
- Pattern Recognition: Detects common error patterns and anti-patterns
- Reproduction: Identifies steps to reproduce the issue
- Isolation: Locates the exact failure point in code
- Root Cause Analysis: Works backward from symptoms to identify underlying causes
- Minimal Fix: Implements the smallest change that resolves the issue
- Verification: Confirms the solution works and doesn't introduce new issues
- Monitoring Setup: Creates queries and alerts for error detection
## Helper Scripts
This skill includes Python helper scripts in scripts/:
- `parse_logs.py`: Parses log files and extracts errors, exceptions, and stack traces. Outputs JSON with error analysis and pattern detection. Run it as `python scripts/parse_logs.py /var/log/app.log`.
## How to Use
### Debug an Error
Debug this error: TypeError: Cannot read property 'x' of undefined
Investigate why the test is failing in test_user_service.js
### Analyze Error Logs
Analyze the error logs in /var/log/app.log and identify the root cause
Investigate why the API is returning 500 errors
### Pattern Detection
Find patterns in these error logs from the past 24 hours
Correlate errors between the API service and database
## Debugging Process
### 1. Capture Error Information
Error Message:
- Read the full error message
- Note the error type (TypeError, ReferenceError, etc.)
- Identify the error location (file and line number)
Stack Trace:
- Analyze the call stack
- Identify the sequence of function calls
- Find where the error originated
Context:
- Check recent code changes
- Review related code files
- Understand the execution flow
### 2. Error Extraction (Log Analysis)
Using Helper Script:
The skill includes a Python helper script for parsing logs:
# Parse log file and extract errors
python scripts/parse_logs.py /var/log/app.log
Manual Log Parsing Patterns:
# Extract errors from logs
grep -i "error\|exception\|fatal\|critical" /var/log/app.log
# Extract stack traces
grep -A 20 "Exception\|Error\|Traceback" /var/log/app.log
# Extract specific error types
grep "TypeError\|ReferenceError\|SyntaxError" /var/log/app.log
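When the log text is already in memory, the same keyword filter can be sketched in JavaScript. This is a minimal illustration, not the logic of `parse_logs.py`; the helper name `extractErrorLines` and the sample log lines are assumptions made for the example.

```javascript
// Hypothetical helper: mirrors the case-insensitive grep for error keywords.
const ERROR_KEYWORDS = /error|exception|fatal|critical/i;

function extractErrorLines(logText) {
  return logText
    .split(/\r?\n/)
    .map((text, index) => ({ lineNumber: index + 1, text }))
    .filter(entry => ERROR_KEYWORDS.test(entry.text));
}

// Made-up sample input for illustration.
const sampleLog = [
  '2025-01-01T10:00:00Z INFO Server started',
  '2025-01-01T10:00:05Z ERROR Database connection refused',
  '2025-01-01T10:00:06Z WARN Retrying in 5s',
  '2025-01-01T10:00:11Z FATAL Giving up after 3 attempts',
].join('\n');

const matches = extractErrorLines(sampleLog);
console.log(matches.map(match => match.lineNumber)); // [ 2, 4 ]
```

Keeping the line number next to each match makes it easier to jump back to the surrounding context in the original file.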
Structured Log Parsing:
// Parse JSON logs
const errors = logs
  .filter(log => log.level === 'error' || log.level === 'critical')
  .map(log => ({
    timestamp: log.timestamp,
    message: log.message,
    stack: log.stack,
    context: log.context
  }));
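The filter above assumes a `logs` array already exists. One way to build it from a newline-delimited JSON file is sketched below, assuming one JSON object per line; `parseJsonLogs` is a hypothetical helper and the sample lines are made up.

```javascript
// Hypothetical helper: parses newline-delimited JSON, skipping malformed lines.
function parseJsonLogs(text) {
  const logs = [];
  let skipped = 0;
  for (const line of text.split(/\r?\n/)) {
    if (line.trim() === '') continue;
    try {
      logs.push(JSON.parse(line));
    } catch {
      skipped += 1; // Not valid JSON, e.g. a plain-text line mixed into the file
    }
  }
  return { logs, skipped };
}

// Made-up sample input for illustration.
const sample = [
  '{"level":"info","message":"started","timestamp":"2025-01-01T10:00:00Z"}',
  '{"level":"error","message":"query failed","timestamp":"2025-01-01T10:00:05Z"}',
  'plain text line that is not JSON',
].join('\n');

const { logs, skipped } = parseJsonLogs(sample);
const errors = logs.filter(log => log.level === 'error' || log.level === 'critical');
console.log(errors.length, skipped); // 1 1
```

Counting skipped lines rather than throwing keeps one malformed entry from hiding the rest of the file, while still showing how much input was ignored.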
### 3. Stack Trace Analysis
Common Patterns:
JavaScript/Node.js:
Error: Cannot read property 'x' of undefined
    at FunctionName (file.js:123:45)
    at AnotherFunction (file.js:456:78)
Python:
Traceback (most recent call last):
  File "app.py", line 123, in function_name
    result = process(data)
  File "utils.py", line 45, in process
    return data['key']
KeyError: 'key'
Java:
java.lang.NullPointerException
    at com.example.Class.method(Class.java:123)
    at com.example.AnotherClass.call(AnotherClass.java:456)
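For the JavaScript/Node.js format, the frames can be pulled apart with a regular expression. The sketch below handles only V8-style `at ...` lines; Python and Java traces would need their own patterns. `parseStackFrames` is a hypothetical name, not part of the helper scripts.

```javascript
// Matches V8-style frames such as "at fn (file.js:123:45)" or "at file.js:123:45".
const FRAME_PATTERN = /^\s*at (?:(.+?) \()?(.+?):(\d+):(\d+)\)?$/;

function parseStackFrames(stack) {
  const frames = [];
  for (const line of stack.split('\n')) {
    const match = FRAME_PATTERN.exec(line);
    if (!match) continue; // Skips the message line and anything unrecognized
    frames.push({
      functionName: match[1] || '<anonymous>',
      file: match[2],
      line: Number(match[3]),
      column: Number(match[4]),
    });
  }
  return frames;
}

const sampleStack = [
  "Error: Cannot read property 'x' of undefined",
  '    at FunctionName (file.js:123:45)',
  '    at AnotherFunction (file.js:456:78)',
].join('\n');

// The first frame is where the error was thrown.
const [origin] = parseStackFrames(sampleStack);
console.log(`${origin.functionName} ${origin.file}:${origin.line}`); // FunctionName file.js:123
```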
### 4. Error Correlation
Timeline Analysis:
- Group errors by timestamp
- Identify error spikes and patterns
- Correlate with deployments or changes
- Check for cascading failures
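Grouping by timestamp and spotting spikes can be sketched as follows. The example assumes each error carries a valid ISO 8601 `timestamp`; the helper names and the "three times the median minute" threshold are illustrative choices, not fixed rules.

```javascript
// Hypothetical helper: counts errors per minute from ISO 8601 timestamps.
function countErrorsPerMinute(errors) {
  const counts = new Map();
  for (const { timestamp } of errors) {
    const minute = new Date(timestamp).toISOString().slice(0, 16); // "YYYY-MM-DDTHH:MM"
    counts.set(minute, (counts.get(minute) || 0) + 1);
  }
  return counts;
}

// Flags minutes whose count exceeds `factor` times the baseline, where the
// baseline is the middle element of the sorted counts (upper median for even lengths).
function findSpikes(counts, factor = 3) {
  const sorted = [...counts.values()].sort((a, b) => a - b);
  if (sorted.length === 0) return [];
  const baseline = sorted[Math.floor(sorted.length / 2)];
  return [...counts]
    .filter(([, count]) => count > baseline * factor)
    .map(([minute]) => minute);
}

// Made-up sample: one error at 10:00, one at 10:01, five at 10:02.
const sampleErrors = [
  '2025-01-01T10:00:10Z',
  '2025-01-01T10:01:20Z',
  '2025-01-01T10:02:01Z',
  '2025-01-01T10:02:15Z',
  '2025-01-01T10:02:30Z',
  '2025-01-01T10:02:44Z',
  '2025-01-01T10:02:59Z',
].map(timestamp => ({ timestamp }));

console.log(findSpikes(countErrorsPerMinute(sampleErrors))); // [ '2025-01-01T10:02' ]
```

Minutes with no errors never enter the map, so the baseline reflects only minutes that had at least one error. Comparing the flagged minutes against deployment times is then a manual step.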
Service Correlation:
- Map errors across service boundaries
- Identify upstream/downstream relationships
- Track error propagation paths
- Find common failure points
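Tracking propagation across services is easier when every log entry carries a shared request ID. The sketch below assumes entries with `requestId`, `service`, and `timestamp` fields and reasonably synchronized clocks; the field names and sample data are hypothetical.

```javascript
// Hypothetical helper: groups error entries by request ID and orders each
// group by time, so the first entry is the earliest failure seen for a request.
function traceByRequestId(entries) {
  const traces = new Map();
  for (const entry of entries) {
    if (!traces.has(entry.requestId)) traces.set(entry.requestId, []);
    traces.get(entry.requestId).push(entry);
  }
  for (const trace of traces.values()) {
    trace.sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));
  }
  return traces;
}

// Made-up sample entries from two services.
const sampleEntries = [
  { requestId: 'req-1', service: 'api', timestamp: '2025-01-01T10:00:02Z', message: '500 from upstream' },
  { requestId: 'req-1', service: 'database', timestamp: '2025-01-01T10:00:01Z', message: 'ECONNREFUSED' },
  { requestId: 'req-2', service: 'api', timestamp: '2025-01-01T10:00:03Z', message: 'validation failed' },
];

const propagation = traceByRequestId(sampleEntries).get('req-1').map(entry => entry.service);
console.log(propagation.join(' -> ')); // database -> api
```

Here the database error precedes the API error for the same request, which points at the database as the upstream failure.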
### 5. Pattern Recognition
Common Error Patterns:
N+1 Query Problem:
One query loads a list, then one more query runs per row inside a loop
Pattern: SELECT * FROM users; followed by SELECT * FROM posts WHERE user_id = ? once per user
Memory Leaks:
Gradually increasing memory usage
Pattern: Memory growth over time without release
Race Conditions:
Intermittent failures under load
Pattern: Errors only occur with concurrent requests
Timeout Issues:
Requests timing out
Pattern: Errors after specific duration (e.g., 30s)
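The N+1 pattern becomes visible once queries are counted. The sketch below uses a hypothetical in-memory stand-in for a database (`createFakeDb`) to contrast a per-row loop with a single batched lookup; a real data layer will look different.

```javascript
// Hypothetical in-memory stand-in for a database that counts queries.
function createFakeDb(posts) {
  let queryCount = 0;
  return {
    postsByUser(userId) {
      queryCount += 1;
      return posts.filter(post => post.userId === userId);
    },
    postsByUsers(userIds) {
      queryCount += 1;
      const wanted = new Set(userIds);
      return posts.filter(post => wanted.has(post.userId));
    },
    get queryCount() {
      return queryCount;
    },
  };
}

// Made-up sample data; the query that loaded `users` is not counted here.
const users = [{ id: 1 }, { id: 2 }, { id: 3 }];
const posts = [{ userId: 1, title: 'a' }, { userId: 3, title: 'b' }];

// N+1 shape: one query per user inside the loop.
const loopDb = createFakeDb(posts);
for (const user of users) loopDb.postsByUser(user.id);

// Batched shape: one query for all users.
const batchDb = createFakeDb(posts);
batchDb.postsByUsers(users.map(user => user.id));

console.log(loopDb.queryCount, batchDb.queryCount); // 3 1
```

In logs, the loop shape shows up as the same query template repeated once per row, which is the signature to look for.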
### 6. Reproduce the Issue
Reproduction Steps:
- Identify the exact conditions that trigger the error
- Create a minimal test case that reproduces the issue
- Verify the issue is consistent and reproducible
- Document the steps clearly
Example:
## Reproduction Steps
1. Navigate to `/users/123`
2. Click "Edit Profile"
3. Submit form without filling required fields
4. Error occurs: "Cannot read property 'validate' of undefined"
### 7. Isolate the Failure Location
Code Analysis:
- Read the code around the error location
- Trace the execution path
- Identify where the assumption breaks
- Check variable states and values
Debugging Techniques:
- Add strategic logging to track execution
- Use debugger breakpoints
- Inspect variable states
- Check function return values
- Verify data structures
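Strategic logging and return-value checks can be added without editing the function under investigation by wrapping it. `withTrace` below is a hypothetical helper for synchronous functions; an async function would log its promise rather than the resolved value.

```javascript
// Hypothetical wrapper: logs arguments, return value, and thrown errors of `fn`.
function withTrace(name, fn, log = console.log) {
  return function traced(...args) {
    log(`${name} called with`, args);
    try {
      const result = fn.apply(this, args);
      log(`${name} returned`, result);
      return result;
    } catch (error) {
      log(`${name} threw`, error.message);
      throw error; // Rethrow so callers see the original failure
    }
  };
}

// Made-up function under investigation.
const getDisplayName = withTrace('getDisplayName', user => user.name.toUpperCase());

getDisplayName({ name: 'ada' }); // logs the call and the return value
try {
  getDisplayName(undefined); // logs the call, then the TypeError message
} catch (error) {
  // The original error still propagates to the caller.
}
```

Passing a custom `log` function lets the trace go to a file or an in-memory buffer instead of the console.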
### 8. Form and Test Hypotheses
Hypothesis Formation:
- What could cause this error?
- What assumptions might be wrong?
- What edge cases weren't considered?
- What dependencies might be missing?
Testing Hypotheses:
- Add logging to verify assumptions
- Test edge cases
- Check input validation
- Verify dependencies are available
- Test with different data
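Testing edge cases and different data can be done in one pass by probing the suspect function with a table of inputs. This is a minimal sketch; `probeInputs` and `itemCount` are hypothetical names chosen for the example.

```javascript
// Hypothetical helper: runs `fn` against each input and records the outcome.
function probeInputs(fn, inputs) {
  return inputs.map(input => {
    try {
      return { input, ok: true, value: fn(input) };
    } catch (error) {
      return { input, ok: false, error: `${error.name}: ${error.message}` };
    }
  });
}

// Made-up function under suspicion: assumes `order.items` is always an array.
const itemCount = order => order.items.length;

const report = probeInputs(itemCount, [
  { items: ['a', 'b'] }, // typical input
  { items: [] },         // empty collection
  {},                    // missing field
  null,                  // missing object
]);

console.log(report.map(result => result.ok)); // [ true, true, false, false ]
```

The failing rows confirm or reject a hypothesis directly: here, the function breaks when `items` or the order itself is missing, not when the list is empty.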
### 9. Root Cause Analysis
Investigation Steps:
- Start with Symptoms: What error is occurring?
- Work Backward: What changed before the error?
- Check Patterns: Is this recurring or isolated?
- Correlate Events: What else happened at the same time?
- Identify Cause: What is the underlying issue?
Analysis Framework:
## Error Analysis
**Error**: [Description]
**Frequency**: [How often]
**Timeline**: [When it started]
**Affected Services**: [Which services]
**User Impact**: [How many users affected]
**Root Cause Hypothesis**:
- [Primary hypothesis with evidence]
- [Alternative hypotheses]
**Evidence**:
- [Log entries supporting hypothesis]
- [Error patterns observed]
- [Correlation with other events]
**Recommended Actions**:
- [Immediate fix]
- [Long-term prevention]
### 10. Implement Minimal Fix
Fix Principles:
- Fix the root cause, not just symptoms
- Make the smallest change possible
- Preserve existing functionality
- Don't introduce new complexity
- Add appropriate error handling
Fix Verification:
- Test the fix with the reproduction case
- Verify no regressions
- Check edge cases
- Ensure error handling is appropriate
- Confirm the fix is complete
### 11. Document the Solution
Documentation Should Include:
- Root cause explanation
- Evidence supporting the diagnosis
- Specific code fix
- Testing approach
- Prevention recommendations
## Examples
### Example 1: Null Reference Error
**Input**: Error: `TypeError: Cannot read property 'name' of undefined`
**Output**:
````markdown
## Debugging Analysis
### Error Information
- **Type**: TypeError
- **Location**: `src/services/user.js:45`
- **Message**: Cannot read property 'name' of undefined
### Root Cause
The `getUser()` function returns `undefined` when a user is not found, but the code assumes it always returns a user object.
### Code Location
```javascript
// src/services/user.js:45
const user = getUser(userId);
console.log(user.name); // ❌ Error: user is undefined
```
### Fix
```javascript
// Add null check
const user = getUser(userId);
if (!user) {
  throw new Error(`User ${userId} not found`);
}
console.log(user.name); // ✅ Safe
```
### Prevention
- Always validate return values from functions that might return null/undefined
- Use optional chaining: `user?.name`
- Add TypeScript types to catch these issues at compile time
````
### Example 2: Database Connection Errors (Log Analysis)
**Input**: Error logs showing database connection failures
**Output**:
````markdown
## Error Analysis: Database Connection Failures
### Pattern Detected
**Error**: `ECONNREFUSED` errors from PostgreSQL
**Frequency**: 15 errors/minute, started 2 hours ago
**Timeline**: Errors began after deployment at 14:30
### Root Cause Analysis
**Hypothesis**: Connection pool exhaustion
**Evidence**:
- Errors correlate with high traffic period (14:30-16:30)
- Connection pool size: 10, concurrent requests: 50+
- No connection cleanup in error handlers
- Errors spike during peak usage
**Code Location**: `src/db/connection.js:45`
**Fix**:
```javascript
// Add connection cleanup
try {
  const result = await query(sql);
  return result;
} catch (error) {
  // Ensure connection is released
  await releaseConnection();
  throw error;
}
```
**Monitoring Query**:
```sql
SELECT count(*) FROM pg_stat_activity WHERE state = 'active';
```
````
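The fix in this example releases the connection in the error handler. A common alternative, sketched here with a hypothetical in-memory pool, is `try`/`finally`, which releases the connection on both the success and the failure path. The pool API below is an assumption for illustration; a real driver's pool will differ, and asynchronous work would need `await` inside the `try`.

```javascript
// Hypothetical fixed-size pool; a real driver's pool API will differ.
function createPool(size) {
  let inUse = 0;
  return {
    acquire() {
      if (inUse >= size) throw new Error('Pool exhausted');
      inUse += 1;
      return { id: inUse };
    },
    release() {
      inUse -= 1;
    },
    get inUse() {
      return inUse;
    },
  };
}

// Releases the connection whether `work` returns or throws.
function withConnection(pool, work) {
  const connection = pool.acquire();
  try {
    return work(connection);
  } finally {
    pool.release();
  }
}

const pool = createPool(2);
for (let attempt = 0; attempt < 5; attempt += 1) {
  try {
    withConnection(pool, () => { throw new Error('query failed'); });
  } catch (error) {
    // The query error still reaches the caller; the connection does not leak.
  }
}
console.log(pool.inUse); // 0
```

Five failing queries against a pool of two would exhaust it if the connection were never released; with `finally`, the in-use count returns to zero after each attempt.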
## Reference Files
For detailed debugging workflows, error patterns, and techniques, load reference files as needed:
- **`references/debugging_workflows.md`** - Common debugging workflows by issue type, language-specific debugging, debugging techniques, debugging checklists, and common error patterns (database errors, memory leaks, race conditions, timeouts, authentication errors, network errors, application errors, performance errors)
- **`references/INCIDENT_POSTMORTEM.template.md`** - Incident postmortem template with timeline, root cause analysis, and action items
When debugging specific types of issues or analyzing error patterns, load `references/debugging_workflows.md` and refer to the relevant section.
## Best Practices
### Debugging Approach
1. **Start with Symptoms**: Understand what's wrong before jumping to solutions
2. **Work Backward**: Trace from error to cause
3. **Test Hypotheses**: Don't assume, verify
4. **Minimal Changes**: Fix only what's necessary
5. **Verify Fixes**: Always test that the fix works
### Log Analysis Techniques
1. **Use Structured Logging**: JSON logs are easier to parse and analyze
2. **Include Context**: Add request IDs, user IDs, timestamps to all logs
3. **Log Levels**: Use appropriate levels (error, warn, info, debug)
4. **Correlation IDs**: Use request IDs to trace errors across services
5. **Error Grouping**: Group similar errors to identify patterns
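Error grouping can be sketched by reducing each message to a fingerprint that ignores variable parts. The replacement rules below (UUIDs, numbers, single-quoted strings) are illustrative assumptions; real logs usually need rules tuned to their own formats.

```javascript
// Hypothetical fingerprint: replaces variable parts of a message with placeholders
// so that messages differing only in IDs or values fall into the same group.
function fingerprint(message) {
  return message
    .replace(/\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b/gi, '<uuid>')
    .replace(/\d+/g, '<n>')
    .replace(/'[^']*'/g, "'<str>'");
}

// Counts how many messages share each fingerprint.
function groupErrors(messages) {
  const groups = new Map();
  for (const message of messages) {
    const key = fingerprint(message);
    groups.set(key, (groups.get(key) || 0) + 1);
  }
  return groups;
}

// Made-up sample messages.
const grouped = groupErrors([
  'User 123 not found',
  'User 456 not found',
  "Cannot read property 'name' of undefined",
  "Cannot read property 'email' of undefined",
  'Timeout after 30000ms',
]);

console.log(grouped.size); // 3
console.log(grouped.get('User <n> not found')); // 2
```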
### Error Pattern Recognition
**Time-Based Patterns:**
- Errors at specific times (deployment windows, peak hours)
- Errors after specific duration (timeouts, memory leaks)
- Errors during specific events (database migrations, cache clears)
**Frequency Patterns:**
- Sudden spikes (deployment issues, traffic spikes)
- Gradual increases (memory leaks, resource exhaustion)
- Intermittent (race conditions, timing issues)
**Correlation Patterns:**
- Errors in multiple services simultaneously (infrastructure issues)
- Errors after specific user actions (application bugs)
- Errors correlated with external services (dependency issues)
### Common Debugging Patterns
**Null/Undefined Checks:**
```javascript
// Check for null/undefined without also rejecting 0, '' or false
if (value == null) {
  // Handle missing value
}
```
**Error Handling:**
```javascript
try {
  // Risky operation
} catch (error) {
  // Log error with context
  console.error('Operation failed:', error);
  // Handle gracefully
}
```
**Logging:**
```javascript
// Strategic logging
console.log('Before operation:', { userId, data });
const result = await operation();
console.log('After operation:', { result });
```
**Type Checking:**
```javascript
// Verify types
if (typeof value !== 'string') {
  throw new TypeError('Expected string');
}
```
### Monitoring Setup
**Error Rate Monitoring:**
```javascript
// Track error rate over time
const errorRate = errors.length / totalRequests;
if (errorRate > 0.01) { // 1% error rate threshold
  alert('High error rate detected');
}
```
**Error Alerting:**
- Alert on error rate spikes (> 5% increase)
- Alert on new error types
- Alert on critical error patterns
- Alert on error correlation across services
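Two of these rules, rate spikes and new error types, can be sketched as a comparison between consecutive time windows. The example reads "> 5% increase" as a rise of more than 5 percentage points, which is an assumption; `evaluateAlerts` and the window shape are hypothetical.

```javascript
// Hypothetical check comparing the current window against the previous one.
function evaluateAlerts(previous, current, knownTypes) {
  const alerts = [];
  const previousRate = previous.errors / previous.requests;
  const currentRate = current.errors / current.requests;
  // Assumption: a "spike" is a rise of more than 5 percentage points.
  if (currentRate - previousRate > 0.05) {
    alerts.push(
      `Error rate rose from ${(previousRate * 100).toFixed(1)}% to ${(currentRate * 100).toFixed(1)}%`
    );
  }
  for (const type of current.types) {
    if (!knownTypes.has(type)) alerts.push(`New error type: ${type}`);
  }
  return alerts;
}

// Made-up window data for illustration.
const alerts = evaluateAlerts(
  { errors: 10, requests: 1000 },
  { errors: 80, requests: 1000, types: ['TypeError', 'ECONNREFUSED'] },
  new Set(['TypeError'])
);
console.log(alerts);
// [ 'Error rate rose from 1.0% to 8.0%', 'New error type: ECONNREFUSED' ]
```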
### Prevention Strategies
- Input Validation: Validate all inputs at boundaries
- Type Safety: Use TypeScript or type checking
- Error Boundaries: Catch errors at appropriate levels
- Testing: Write tests for edge cases
- Code Review: Review code for common pitfalls
## Related Use Cases
- Fixing production bugs
- Debugging test failures
- Investigating user-reported issues
- Analyzing error logs
- Root cause analysis
- Performance debugging
- Production incident investigation
- System reliability analysis
- Error monitoring setup
- Post-mortem analysis
- Debugging distributed systems