| name | error-recovery |
| description | Handle Claude Code errors gracefully - error types, recovery, prevention |
| version | 1.0.0 |
| author | Claude Code SDK |
| tags | errors, recovery, handling, prevention |
Error Recovery
Handle Claude Code errors gracefully with systematic recovery strategies and prevention techniques.
Quick Reference
| Error Category | Common Causes | Quick Fix |
|---|---|---|
| API Errors | Rate limits, overload, auth | Wait, retry, check credentials |
| Tool Errors | Permissions, missing files | Check permissions, validate paths |
| Context Errors | Token overflow, corruption | /compact or /clear |
| MCP Errors | Server disconnect, timeout | Restart server, check logs |
| Hook Errors | JSON syntax, script failure | Validate JSON, test script |
Error Message Anatomy
Claude Code error messages follow a consistent pattern:
[Error Category] [Specific Error]: [Description]
at [Location/Context]
Cause: [Root cause if known]
Suggestion: [Recommended action]
Reading Error Messages
| Part | What It Tells You | Action |
|---|---|---|
| Category | Type of error (API, Tool, etc.) | Determines recovery approach |
| Specific Error | Exact error code/name | Look up in error reference |
| Description | Human-readable explanation | Understand what went wrong |
| Location | Where error occurred | Identify failing component |
| Cause | Why it happened | Fix root cause |
| Suggestion | Recommended fix | Try suggested action first |
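As a rough illustration of the table above, the following Python sketch splits a message in this layout into its parts. The `ParsedError` type, the `parse_error` helper, and the sample message in the usage note are hypothetical names chosen for this example, not part of Claude Code; the sketch assumes the error name is a single token followed by a colon.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedError:
    category: str
    name: str
    description: str
    location: Optional[str] = None
    cause: Optional[str] = None
    suggestion: Optional[str] = None


# First line: "<Category> <specific_error>: <description>"
# Assumption: the specific error is one whitespace-free token before the colon.
HEADER = re.compile(r"^(?P<category>.+?)\s+(?P<name>\S+):\s*(?P<description>.*)$")


def parse_error(text: str) -> ParsedError:
    """Split an error message in the layout shown above into its parts."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty error message")
    match = HEADER.match(lines[0])
    if match is None:
        raise ValueError(f"unrecognized header: {lines[0]!r}")
    err = ParsedError(match["category"], match["name"], match["description"])
    # The remaining lines are optional and identified by their prefix.
    for ln in lines[1:]:
        if ln.startswith("at "):
            err.location = ln[3:].strip()
        elif ln.startswith("Cause:"):
            err.cause = ln[len("Cause:"):].strip()
        elif ln.startswith("Suggestion:"):
            err.suggestion = ln[len("Suggestion:"):].strip()
    return err
```

For example, parsing a made-up message whose first line is `API Error rate_limit_error: Too many requests` yields category `API Error` and name `rate_limit_error`, which can then be looked up in the tables below.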
Common Error Patterns
API Errors
| Error | Meaning | Recovery |
|---|---|---|
| rate_limit_error | Too many requests | Wait 60s, reduce frequency |
| overloaded_error | API at capacity | Wait 30-60s, retry |
| context_length_exceeded | Too many tokens | /compact or split request |
| authentication_error | Invalid/expired token | claude auth login |
| invalid_request_error | Malformed request | Check input format |
| api_error | Server-side issue | Retry with backoff |
Tool Errors
| Error | Meaning | Recovery |
|---|---|---|
| permission_denied | Tool not allowed | /permissions, allow tool |
| file_not_found | Path doesn't exist | Verify path, check working dir |
| directory_not_found | Dir doesn't exist | Create directory first |
| read_error | Can't read file | Check permissions, encoding |
| write_error | Can't write file | Check permissions, disk space |
| command_failed | Bash command error | Check exit code, stderr |
| timeout | Operation too slow | Increase timeout, simplify |
Context Errors
| Error | Meaning | Recovery |
|---|---|---|
| context_overflow | Token limit reached | /compact or /clear |
| memory_limit | Too much in memory | Clear memory banks |
| session_expired | Session timed out | Start new session |
| state_corruption | Session state invalid | /clear, restart |
Recovery Workflow
Step 1: Identify Error Type
Error occurred
|
+-- API Error?
|   +-- Yes --> See API recovery
|   +-- No --> Continue
|
+-- Tool Error?
|   +-- Yes --> See Tool recovery
|   +-- No --> Continue
|
+-- Context Error?
|   +-- Yes --> See Context recovery
|   +-- No --> Continue
|
+-- Unknown?
    +-- Check debug output
    +-- Use /bug to report
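The decision tree can be sketched as a small lookup in Python. The error names are taken from the tables in Common Error Patterns; the `classify` function and the `NEXT_STEP` mapping are hypothetical helpers for illustration, not a Claude Code API.

```python
# Error names as listed in the API, Tool, and Context tables of this document.
API_ERRORS = {
    "rate_limit_error", "overloaded_error", "context_length_exceeded",
    "authentication_error", "invalid_request_error", "api_error",
}
TOOL_ERRORS = {
    "permission_denied", "file_not_found", "directory_not_found",
    "read_error", "write_error", "command_failed", "timeout",
}
CONTEXT_ERRORS = {
    "context_overflow", "memory_limit", "session_expired", "state_corruption",
}

# Where the decision tree sends each category.
NEXT_STEP = {
    "api": "See API recovery",
    "tool": "See Tool recovery",
    "context": "See Context recovery",
    "unknown": "Check debug output, then use /bug to report",
}


def classify(error_name: str) -> str:
    """Walk the tree in order: API, then Tool, then Context, else unknown."""
    name = error_name.strip().lower()
    if name in API_ERRORS:
        return "api"
    if name in TOOL_ERRORS:
        return "tool"
    if name in CONTEXT_ERRORS:
        return "context"
    return "unknown"
```

Calling `NEXT_STEP[classify("file_not_found")]` returns `"See Tool recovery"`; any name not in the tables falls through to the unknown branch.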
Step 2: Apply Recovery Strategy
For API Errors:
- Wait for rate limit window (60s typical)
- Retry with exponential backoff
- If persistent, check credentials
For Tool Errors:
- Check /permissions
- Validate inputs (paths, arguments)
- Check file/directory exists
For Context Errors:
- Run /compact to reduce context
- If severe, use /clear
- Start fresh if corrupted
Step 3: Verify Recovery
# Check system health (from your shell)
claude doctor
# Verify specific components (inside a Claude Code session)
/permissions # Tool permissions
/mcp # MCP servers
/hooks # Hook status
Quick Recovery Commands
| Situation | Command |
|---|---|
| Context too large | /compact |
| Session corrupted | /clear |
| Need to restart | Ctrl+C, restart claude |
| Check health | claude doctor |
| Debug mode | claude --debug |
| Verbose logging | ANTHROPIC_LOG=debug claude |
Retry Patterns
Simple Retry
For transient errors (rate limits, overload):
1. Wait for the initial delay (1s)
2. Retry the operation
3. If it fails, double the delay (2s, 4s, 8s...)
4. Stop after 5 retries or 60s of total waiting, whichever comes first
5. If it is still failing, escalate
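The steps above can be sketched in Python as follows. `TransientError` and `retry` are hypothetical names for this example; the sketch assumes the caller raises `TransientError` for retryable failures such as rate limits or overload, and it lets the final failure propagate so the caller can escalate.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (rate limit, overload)."""


def retry(
    operation: Callable[[], T],
    initial_delay: float = 1.0,
    max_retries: int = 5,
    max_total: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry with a doubling delay, bounded by retry count and total wait."""
    delay = initial_delay
    waited = 0.0
    for attempt in range(max_retries + 1):  # first try plus max_retries retries
        try:
            return operation()
        except TransientError:
            # Give up when out of retries or the next wait would exceed the budget.
            if attempt == max_retries or waited + delay > max_total:
                raise
            sleep(delay)
            waited += delay
            delay *= 2
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the schedule can be tested without real waiting. With the defaults, the waits are 1s, 2s, 4s, 8s, and 16s (31s in total), so the retry count is reached before the 60s budget.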
Backoff with Jitter
For high-contention scenarios:
delay = min(cap, base * 2^attempt) + random(0, 1000ms)
- Base: 1000ms
- Cap: 60000ms
- Jitter: 0-1000ms random
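A minimal Python sketch of the formula above, using the listed base, cap, and jitter values as defaults. The function name `backoff_delay_ms` and the injectable `rng` parameter are assumptions made for this example.

```python
import random
from typing import Callable


def backoff_delay_ms(
    attempt: int,
    base_ms: float = 1000,
    cap_ms: float = 60000,
    jitter_ms: float = 1000,
    rng: Callable[[], float] = random.random,
) -> float:
    """Return min(cap, base * 2^attempt) plus a random jitter in [0, jitter_ms)."""
    exponential = min(cap_ms, base_ms * 2 ** attempt)
    # Jitter is added after the cap, as in the formula, so the result
    # can exceed cap_ms by up to jitter_ms.
    return exponential + rng() * jitter_ms
```

Because the jitter is added after capping, the largest possible delay with these defaults is just under 61000ms. The jitter spreads out clients that would otherwise retry at the same instant.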
Circuit Breaker
For persistent failures:
If 3 failures in 60s:
  Open circuit (stop trying)
  Wait 5 minutes
  Try once (half-open)
    If success: close circuit
    If failure: keep open
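The pseudocode above could be implemented along these lines in Python. `CircuitBreaker` and `CircuitOpenError` are hypothetical names for this sketch. One assumption goes beyond the pseudocode: when the half-open trial fails, the 5-minute wait restarts.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        threshold: int = 3,
        window: float = 60.0,
        cooldown: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.threshold = threshold
        self.window = window
        self.cooldown = cooldown
        self.clock = clock
        self.failures: Deque[float] = deque()
        self.opened_at: Optional[float] = None

    def call(self, operation: Callable[[], T]) -> T:
        now = self.clock()
        if self.opened_at is not None and now - self.opened_at < self.cooldown:
            raise CircuitOpenError("circuit open; skipping call")
        # Either closed, or the cooldown has elapsed (half-open trial).
        try:
            result = operation()
        except Exception:
            self._record_failure(now)
            raise
        if self.opened_at is not None:
            # Half-open trial succeeded: close the circuit.
            self.opened_at = None
            self.failures.clear()
        return result

    def _record_failure(self, now: float) -> None:
        if self.opened_at is not None:
            # Half-open trial failed: keep open (assumption: restart cooldown).
            self.opened_at = now
            return
        self.failures.append(now)
        # Drop failures that fell out of the sliding window.
        while self.failures and now - self.failures[0] > self.window:
            self.failures.popleft()
        if len(self.failures) >= self.threshold:
            self.opened_at = now
```

Wrap the failing operation as `breaker.call(lambda: do_request())`, where `do_request` stands in for your own function, and treat `CircuitOpenError` as a signal to stop retrying for now.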
Error Prevention Checklist
Before operations:
- Validate file paths exist
- Check permissions are granted
- Verify network connectivity
- Ensure context has headroom
- Test hooks work correctly
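As one way to automate the first two checks, here is a small Python sketch that validates a path before an operation. The `preflight` helper is hypothetical, and the problem labels simply reuse the error names from the Tool Errors table; they are not output produced by Claude Code.

```python
import os
from pathlib import Path
from typing import List


def preflight(path: str, need_write: bool = False) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems: List[str] = []
    target = Path(path)
    if not target.parent.is_dir():
        problems.append(f"directory_not_found: {target.parent}")
    elif not target.exists():
        problems.append(f"file_not_found: {target}")
    elif not os.access(target, os.R_OK):
        problems.append(f"read_error: {target} is not readable")
    # Writing needs a writable parent directory, whether or not the file exists.
    if need_write and target.parent.is_dir() and not os.access(target.parent, os.W_OK):
        problems.append(f"write_error: {target.parent} is not writable")
    return problems
```

Running such a check before an expensive operation lets it fail early with a specific message instead of partway through.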
During operations:
- Watch for warning signs
- Monitor context size
- Handle errors gracefully
- Log important state
After errors:
- Document what happened
- Fix root cause
- Add prevention measures
- Test fix works
Reference Files
| File | Contents |
|---|---|
| ERROR-TYPES.md | Comprehensive error reference |
| RECOVERY-PATTERNS.md | Recovery strategies and patterns |
| PREVENTION.md | Error prevention techniques |
Common Scenarios
Scenario: Rate Limited
Symptom: rate_limit_error after many requests
Solution:
- Wait 60 seconds
- Reduce request frequency
- Batch operations when possible
Scenario: Context Overflow
Symptom: context_length_exceeded error
Solution:
- Run /compact to summarize context
- If still too large, /clear and restart
- Use smaller file reads (with offset/limit)
Scenario: Tool Permission Denied
Symptom: Tool blocked by permissions
Solution:
- Run /permissions
- Allow the specific tool
- Or add to settings.json for persistence
Scenario: MCP Server Disconnected
Symptom: MCP tools return errors
Solution:
- Check /mcp for server status
- Restart MCP server if needed
- Verify .mcp.json configuration
Best Practices
- Fail Fast: Validate early, fail before expensive operations
- Graceful Degradation: Have fallbacks for non-critical features
- Clear Errors: Provide actionable error messages
- Log Everything: Enable debug mode when troubleshooting
- Test Recovery: Verify recovery procedures work before you need them
When to Escalate
Use the /bug command when:
- Error persists after recovery attempts
- Error message is unclear or missing
- Behavior contradicts documentation
- Reproducible crash occurs
Include in report:
- Claude Code version
- Error message (full text)
- Steps to reproduce
- Debug output