| name | recovering-sessions |
| description | Recover from crashed, failed, or interrupted Claude Code sessions. Use this skill when: session crashed during multi-agent parallel execution, need to determine what work was completed vs incomplete, want to generate resumption commands for interrupted tasks, recovering from context window exhaustion, or handling session handoffs. Analyzes agent logs, verifies on-disk state, and creates resumption plans with ready-to-execute Task() commands. |
Session Recovery Skill
Recover from crashed, failed, or interrupted Claude Code sessions with automated analysis and resumption planning.
Why This Skill
Claude Code sessions can crash during complex multi-agent operations. Recovery without this skill is:
- Manual: Requires parsing agent logs by hand
- Error-prone: Easy to miss completed or incomplete work
- Time-consuming: Ad-hoc investigation for each crash
- Undocumented: Knowledge lost between sessions
This skill provides standardized recovery workflows that:
- Discover and analyze agent conversation logs
- Verify on-disk state matches expected deliverables
- Generate recovery reports with clear status per task
- Create ready-to-execute Task() resumption commands
- Document prevention patterns to avoid future crashes
When to Use
| Scenario | Action |
|---|---|
| Session crashed during parallel execution | Full recovery workflow |
| User reports interrupted work | Analyze recent agent logs |
| Need to determine completed vs incomplete | Status assessment |
| Generate resumption commands | Resumption plan |
| Session handoff to continue later | Handoff summary |
| Proactive health check | Scan for incomplete work |
Quick Start
Post-Crash Recovery
User: "Claude crashed while I had multiple agents running. Help me recover."
1. Identify project log directory
2. Discover recent agent logs (last 3 hours)
3. Analyze each log in parallel (haiku subagents)
4. Cross-reference with git status
5. Generate recovery report
6. Offer to commit completed work
7. Provide resumption commands for incomplete tasks
Session Handoff
User: "I need to hand off this session to continue later."
1. Identify active/recent work
2. Capture current git state
3. Document in-progress tasks
4. Update progress tracking with session notes
5. Generate handoff summary with resumption instructions
Proactive Health Check
User: "Check if any recent agent work was incomplete."
1. Scan recent agent logs (last 24 hours)
2. Identify logs without clear completion markers
3. Check if corresponding files exist
4. Report any gaps or incomplete work
5. Suggest remediation if needed
Core Workflow
Phase 1: Gather Context
Get project path and derive log directory:
# Project path: /Users/name/dev/project # Log directory: ~/.claude/projects/-Users-name-dev-project/ LOG_DIR="$HOME/.claude/projects/$(pwd | sed 's/\//-/g' | sed 's/^-//')"Check git status for uncommitted changes:
git status --shortRead implementation plan/PRD (if available) for expected deliverables
See ./techniques/discovering-agent-logs.md for detailed discovery patterns.
Phase 2: Discover Agent Logs
Find recent agent logs using multiple techniques:
| Technique | Command | Use Case |
|---|---|---|
| By recency | find $LOG_DIR -name "agent-*.jsonl" -mmin -180 |
Find agents active in last 3 hours |
| By ID pattern | ls $LOG_DIR | grep -E "(id1|id2)" |
Match known agent IDs |
| By size | find $LOG_DIR -name "*.jsonl" -size +10k -mtime -1 |
Find substantial logs |
| All recent | ls -lt $LOG_DIR/agent-*.jsonl | head -20 |
List by modification time |
See ./techniques/discovering-agent-logs.md for complete reference.
Phase 3: Analyze Each Log
For each discovered log, extract:
- Files created/modified (Write/Edit tool uses)
- Test results (passed/failed counts)
- Completion status (success markers, errors)
- Last activity timestamp
Delegation Strategy (token-efficient):
# Launch parallel analyzers with haiku model
Task("codebase-explorer", {
model: "haiku",
prompt: "Analyze agent log at {path}. Report: task ID, status, files created, test results, errors"
})
See ./techniques/parsing-jsonl-logs.md for extraction patterns.
Phase 4: Verify On-Disk State
Cross-reference agent deliverables against filesystem:
# For each file_path extracted from logs:
if [ -f "$file_path" ]; then
echo "VERIFIED: $file_path exists"
stat -f "%Sm" "$file_path" # Check modification time
else
echo "MISSING: $file_path not found"
fi
Verification Checklist:
- All expected files exist
- Files have content (not empty)
- Modification time aligns with session
- Tests pass (if test files created)
- No syntax errors (quick lint check)
See ./techniques/verifying-on-disk-state.md for complete verification workflow.
Phase 5: Generate Recovery Report
Produce structured report with:
- Agent status summary (COMPLETE/IN_PROGRESS/FAILED/UNKNOWN)
- Completed work inventory
- Interrupted tasks with resumption commands
- Recommended next actions
See ./techniques/generating-resumption-plans.md and ./templates/recovery-report.md.
Phase 6: Execute Recovery
With user approval:
- Commit completed work with comprehensive message
- Update progress tracking (via artifact-tracking skill)
- Resume interrupted tasks using generated Task() commands
Status Determination Logic
IF log contains "COMPLETE" or "successfully" in final messages → COMPLETE
ELIF log contains uncaught errors in final 10 lines → FAILED
ELIF log size > threshold AND recent modification → IN_PROGRESS (likely crashed)
ELIF log is small with no file writes → NOT_STARTED
ELSE → UNKNOWN (requires manual review)
Agent Delegation Strategy
Use haiku subagents for parallel log analysis (cheap, fast):
// Standard parallel execution (2-4 logs)
Task("codebase-explorer", { model: "haiku", prompt: "Analyze log 1..." })
Task("codebase-explorer", { model: "haiku", prompt: "Analyze log 2..." })
// Background execution (5+ logs)
const tasks = logs.map(log =>
Task("codebase-explorer", {
model: "haiku",
run_in_background: true,
prompt: `Analyze ${log.path}...`
})
);
const results = await Promise.all(
tasks.map(t => TaskOutput(t.id, { block: true }))
);
Recovery Report Format
# Session Recovery Report
Generated: {timestamp}
Project: {project_path}
## Agent Status Summary
| Agent ID | Task | Status | Files Created | Tests |
|----------|------|--------|---------------|-------|
| a91845a | P2-T1: MatchAnalyzer | COMPLETE | 2 | 33/33 |
| xyz1234 | P4-T3: RatingDialog | IN_PROGRESS | 1 | - |
## Completed Work
- MatchAnalyzer with 97.67% coverage
- SemanticScorer with embedding provider abstraction
## Interrupted Tasks
### P4-T3: RatingDialog
- Last known state: Component scaffolding created
- Missing: Tests, accessibility, documentation
- Resumption command:
Task("ui-engineer-enhanced", "Complete P4-T3: RatingDialog...")
## Recommended Actions
1. Commit completed work (files: {...})
2. Resume interrupted tasks with provided commands
3. Update progress tracking
See ./templates/recovery-report.md for full template.
Scripts
Node.js utilities in ./scripts/:
| Script | Purpose | Usage |
|---|---|---|
find-recent-agents.js |
Discover agent logs | node find-recent-agents.js --minutes 180 |
analyze-agent-log.js |
Parse single log | node analyze-agent-log.js <log-path> |
generate-recovery-report.js |
Create full report | node generate-recovery-report.js |
All scripts use ESM imports and modern JavaScript (Node.js 20+).
Prevention Recommendations
To prevent future crashes, see ./references/prevention-patterns.md:
- Batch Size Limits: Max 3-4 parallel agents per batch
- Checkpointing: Update progress after each task completion
- Session Tracking: Record agent IDs in progress files
- Heartbeat Pattern: Emit periodic status updates
- Sequential Critical Paths: Don't parallelize dependent tasks
Integration Points
With artifact-tracking Skill
Recovery updates progress tracking:
Task("artifact-tracker", "Update {PRD} phase {N}:
- Mark TASK-1.1 as complete (recovered)
- Mark TASK-1.2 as in_progress (interrupted)
- Add note: Recovered from session crash")
With execute-phase Command
Recovery can resume interrupted phase execution:
/dev:execute-phase {N} --resume
With Git Workflow
Recovery creates comprehensive commit:
git add {recovered_files}
git commit -m "feat: recover work from interrupted session
Completed:
- TASK-1.1: MatchAnalyzer
- TASK-1.2: SemanticScorer
Interrupted (to resume):
- TASK-1.3: RatingDialog
Recovered via session-recovery skill"
Reference Files
| File | Purpose |
|---|---|
./techniques/discovering-agent-logs.md |
Log location and discovery patterns |
./techniques/parsing-jsonl-logs.md |
JSONL extraction techniques |
./techniques/verifying-on-disk-state.md |
Filesystem verification workflow |
./techniques/generating-resumption-plans.md |
Recovery report generation |
./templates/recovery-report.md |
Report template |
./references/prevention-patterns.md |
Crash prevention guidance |
./scripts/ |
Node.js automation utilities |
Best Practices
During Recovery
- Don't rush - Verify state before committing
- Use haiku - Cheap parallel log analysis
- Verify on disk - Don't trust logs alone
- Commit incrementally - One commit per verified task
- Update tracking - Keep progress files current
Prevention
- Smaller batches - 3-4 agents max per batch
- Checkpoint often - After each task completion
- Track agent IDs - In progress file YAML
- Sequential dependencies - Don't parallelize dependent work
- Background execution - Use for large batches with timeouts
Example Recovery Session
> My session crashed with 6 agents running. Help me recover.
1. Finding project log directory...
LOG_DIR: ~/.claude/projects/-Users-miethe-dev-skillmeat/
2. Discovering recent agent logs (last 3 hours)...
Found 6 agent logs
3. Analyzing logs in parallel (haiku subagents)...
a91845a: COMPLETE - MatchAnalyzer (2 files, 33 tests)
abec615: COMPLETE - SemanticScorer (4 files, 17 tests)
adad508: COMPLETE - DiffViewer (3 files, 25 tests)
xyz1234: IN_PROGRESS - RatingDialog (1 file, no tests)
abc5678: FAILED - FormValidator (error in final lines)
def9012: NOT_STARTED - empty log
4. Verifying on-disk state...
All files from COMPLETE agents verified
RatingDialog component exists but incomplete
5. Generating recovery report...
[See ./templates/recovery-report.md]
6. Ready to commit completed work?
Files: [list of 9 files]
7. Resumption commands for interrupted work:
Task("ui-engineer-enhanced", "Complete P4-T3: RatingDialog...")
Task("python-backend-engineer", "Fix FormValidator...")
Summary
This skill provides standardized session recovery with:
- Automated agent log discovery and analysis
- On-disk state verification
- Clear status reporting per task
- Ready-to-execute resumption commands
- Prevention recommendations
Key Files:
- Discovery:
./techniques/discovering-agent-logs.md - Parsing:
./techniques/parsing-jsonl-logs.md - Verification:
./techniques/verifying-on-disk-state.md - Planning:
./techniques/generating-resumption-plans.md - Prevention:
./references/prevention-patterns.md