| name | recover-from-error |
| description | Diagnose task protocol errors and provide targeted recovery procedures |
| allowed-tools | Bash, Read, Write |
Recovery from Error Skill
Purpose: Diagnose errors during task protocol execution and provide step-by-step recovery procedures.
Performance: Reduces recovery time from 15+ minutes to 2-5 minutes through targeted diagnosis.
When to Use This Skill
Use recover-from-error When:
- Build failure occurs during VALIDATION
- Agent fails or times out during REQUIREMENTS/IMPLEMENTATION
- Session interrupted and needs to resume
- State corruption detected (lock file issues, worktree problems)
- Cannot transition to next state (prerequisites not met)
Do NOT Use When:
- Normal protocol execution (no errors)
- User requests different approach (not an error)
- Task completed successfully
Error Classification Decision Tree
Step 1: Identify Error Category
What type of error occurred?
│
├─ Build/Test Failure ────────────► Section A: Build Recovery
│
├─ Agent Incomplete ──────────────► Section B: Agent Recovery
│
├─ Session Interrupted ───────────► Section C: Session Recovery
│
├─ State Mismatch ────────────────► Section D: State Recovery
│
└─ Lock/Worktree Issue ───────────► Section E: Infrastructure Recovery
Section A: Build Failure Recovery {#build-recovery}
A1: Diagnose Build Failure Type
# Run build and capture output
./mvnw verify 2>&1 | tee /tmp/build-output.log
# Classify failure type
if grep -q "COMPILATION ERROR" /tmp/build-output.log; then
echo "TYPE: Compilation Error → See A2"
elif grep -q "There are test failures" /tmp/build-output.log; then
echo "TYPE: Test Failure → See A3"
elif grep -q "Checkstyle violations" /tmp/build-output.log; then
echo "TYPE: Style Violation → See A4"
elif grep -q "PMD Failure" /tmp/build-output.log; then
echo "TYPE: PMD Violation → See A4"
else
echo "TYPE: Unknown → See A5"
fi
A2: Compilation Error Recovery
Quick Fixes (Main Agent Can Fix):
- Missing imports: Add import statements
- Symbol not found (typo): Fix identifier names
- Module not exported: Update module-info.java
Delegation Required (Re-invoke Agent):
- Missing method implementations
- Type mismatches requiring design changes
- Interface contract violations
# Extract compilation errors
grep -A5 "COMPILATION ERROR" /tmp/build-output.log
# If simple fix (import/typo): Fix directly
# If complex (design issue): Re-invoke architect agent
A3: Test Failure Recovery
# List failing tests
grep -E "Tests run:.*Failures: [1-9]" /tmp/build-output.log
# Get failure details
grep -B5 "FAILURE!" /tmp/build-output.log
# Decision:
# - Test logic wrong → Re-invoke tester agent
# - Implementation wrong → Re-invoke architect agent
# - Both → Re-invoke both agents
A4: Style Violation Recovery
# Count violations
CHECKSTYLE_COUNT=$(grep -c "Checkstyle violation" /tmp/build-output.log || echo 0)
PMD_COUNT=$(grep -c "PMD violation" /tmp/build-output.log || echo 0)
echo "Checkstyle: $CHECKSTYLE_COUNT, PMD: $PMD_COUNT"
# Decision tree:
if [ $((CHECKSTYLE_COUNT + PMD_COUNT)) -le 5 ]; then
echo "Small count - Main agent can fix directly"
else
echo "Many violations - Re-invoke formatter agent"
fi
A5: Unknown Build Failure
# Check for common issues
echo "=== Checking common causes ==="
# 1. Module resolution
grep -i "module" /tmp/build-output.log | grep -i "error"
# 2. Dependency issues
grep -i "dependency" /tmp/build-output.log | grep -i "error"
# 3. Resource issues
grep -i "resource" /tmp/build-output.log | grep -i "not found"
# If still unclear: Present full log to user
echo "Unclassified failure - manual investigation required"
cat /tmp/build-output.log | head -100
Section B: Agent Recovery {#agent-recovery}
B1: Diagnose Agent Status
TASK_NAME="${TASK_NAME}" # Set task name
AGENTS=$(jq -r '.required_agents[]' "/workspace/tasks/$TASK_NAME/task.json")
echo "=== Agent Status Check ==="
for agent in $AGENTS; do
STATUS_FILE="/workspace/tasks/$TASK_NAME/agents/$agent/status.json"
if [ -f "$STATUS_FILE" ]; then
STATUS=$(jq -r '.status' "$STATUS_FILE")
DECISION=$(jq -r '.decision // "N/A"' "$STATUS_FILE")
echo "$agent: status=$STATUS, decision=$DECISION"
else
echo "$agent: NO STATUS FILE (not invoked or failed early)"
fi
done
B2: Agent Missing Status File
Cause: Agent never started or crashed before creating status.json
Recovery:
# 1. Check if agent worktree exists
ls -la /workspace/tasks/$TASK_NAME/agents/$agent/code
# 2. If worktree exists but no status: Agent crashed mid-execution
# → Re-invoke agent with same requirements
# 3. If no worktree: Agent was never invoked
# → Create worktree and invoke agent
git worktree add /workspace/tasks/$TASK_NAME/agents/$agent/code -b $TASK_NAME-$agent
B3: Agent Status = ERROR
agent="$AGENT_NAME"
ERROR_MSG=$(jq -r '.error_message // "Unknown"' "/workspace/tasks/$TASK_NAME/agents/$agent/status.json")
echo "Agent $agent ERROR: $ERROR_MSG"
# Common errors and fixes:
# - "Merge conflict" → Resolve conflicts in agent worktree, retry merge
# - "Build failed" → Fix build in agent worktree, retry
# - "Tool error" → Re-invoke with adjusted scope
# - "Timeout" → Check if work partial, resume or restart
B4: Agent Status = WORKING (Stuck)
agent="$AGENT_NAME"
LAST_UPDATE=$(jq -r '.updated_at' "/workspace/tasks/$TASK_NAME/agents/$agent/status.json")
AGE_MINUTES=$(( ($(date +%s) - $(date -d "$LAST_UPDATE" +%s)) / 60 ))
if [ $AGE_MINUTES -gt 60 ]; then
echo "Agent $agent stuck for $AGE_MINUTES minutes"
# Check retry count
RETRIES=$(jq -r '.retry_count // 0' "/workspace/tasks/$TASK_NAME/agents/$agent/status.json")
if [ $RETRIES -ge 3 ]; then
echo "Max retries exceeded - escalate to user"
else
echo "Re-invoking agent (retry $((RETRIES + 1))/3)"
# Re-invoke via Task tool
fi
fi
Section C: Session Recovery {#session-recovery}
C1: Detect Current Session State
SESSION_ID="${SESSION_ID}"
TASK_DIR=$(find /workspace/tasks -maxdepth 2 -name "task.json" -exec grep -l "$SESSION_ID" {} \; | head -1)
if [ -n "$TASK_DIR" ]; then
TASK_NAME=$(dirname "$TASK_DIR" | xargs basename)
CURRENT_STATE=$(jq -r '.state' "$TASK_DIR")
echo "Active task: $TASK_NAME (State: $CURRENT_STATE)"
else
echo "No active task for this session"
fi
C2: Resume From State
Based on current state, resume appropriately:
| State | Resume Action |
|---|---|
| INIT/CLASSIFIED | Safe to restart from beginning |
| REQUIREMENTS | Check agent reports, re-invoke incomplete agents |
| SYNTHESIS | Check if plan exists, present to user if complete |
| IMPLEMENTATION | Check agent status, resume incomplete agents |
| VALIDATION | Re-run build verification |
| AWAITING_USER_APPROVAL | Re-present changes for approval |
| COMPLETE | Proceed to CLEANUP |
case "$CURRENT_STATE" in
"SYNTHESIS")
# Check for approval flag
if [ -f "/workspace/tasks/$TASK_NAME/user-approved-synthesis.flag" ]; then
echo "Plan approved - proceed to IMPLEMENTATION"
else
echo "Re-present plan to user for approval"
fi
;;
"AWAITING_USER_APPROVAL")
COMMIT=$(jq -r '.checkpoint.commit_sha' "/workspace/tasks/$TASK_NAME/task.json")
echo "Re-present changes (commit $COMMIT) for user approval"
;;
*)
echo "Resume from $CURRENT_STATE state"
;;
esac
Section D: State Recovery {#state-recovery}
D1: State Mismatch Detection
TASK_NAME="${TASK_NAME}"
LOCK_STATE=$(jq -r '.state' "/workspace/tasks/$TASK_NAME/task.json")
# Check what work actually exists
HAS_REQUIREMENTS=$(ls /workspace/tasks/$TASK_NAME/*-requirements.md 2>/dev/null | wc -l)
HAS_APPROVAL_FLAG=$(test -f "/workspace/tasks/$TASK_NAME/user-approved-synthesis.flag" && echo 1 || echo 0)
HAS_IMPL_COMMITS=$(git -C /workspace/tasks/$TASK_NAME/code log --oneline -10 | grep -c "\[.*\]" || echo 0)
echo "Lock state: $LOCK_STATE"
echo "Requirements reports: $HAS_REQUIREMENTS"
echo "Approval flag: $HAS_APPROVAL_FLAG"
echo "Implementation commits: $HAS_IMPL_COMMITS"
# Detect mismatches
if [ "$LOCK_STATE" = "INIT" ] && [ $HAS_REQUIREMENTS -gt 0 ]; then
echo "MISMATCH: Lock says INIT but requirements exist"
fi
D2: Fix State Mismatch
# Update lock state to match actual work
CORRECT_STATE="REQUIREMENTS" # Determine correct state from evidence
jq --arg state "$CORRECT_STATE" \
--arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
'.state = $state | .transition_log += [{"type": "recovery", "to": $state, "timestamp": $ts}]' \
/workspace/tasks/$TASK_NAME/task.json > /tmp/task.json.tmp
mv /tmp/task.json.tmp /workspace/tasks/$TASK_NAME/task.json
echo "State corrected to: $CORRECT_STATE"
Section E: Infrastructure Recovery {#infrastructure-recovery}
E1: Lock File Issues
TASK_NAME="${TASK_NAME}"
LOCK_FILE="/workspace/tasks/$TASK_NAME/task.json"
# Check if lock file exists and is valid
if [ ! -f "$LOCK_FILE" ]; then
echo "Lock file missing"
# If worktree exists: May be crashed session
# Ask user whether to clean up or recover
elif ! jq empty "$LOCK_FILE" 2>/dev/null; then
echo "Lock file corrupted (invalid JSON)"
# Backup and recreate
cp "$LOCK_FILE" "${LOCK_FILE}.backup"
echo "Backed up to ${LOCK_FILE}.backup"
fi
E2: Worktree Issues
TASK_NAME="${TASK_NAME}"
# Check task worktree
if [ ! -d "/workspace/tasks/$TASK_NAME/code" ]; then
echo "Task worktree missing - recreating"
git worktree add /workspace/tasks/$TASK_NAME/code -b $TASK_NAME
fi
# Check agent worktrees
for agent in architect formatter tester; do
if [ ! -d "/workspace/tasks/$TASK_NAME/agents/$agent/code" ]; then
echo "Agent worktree missing: $agent - recreating"
mkdir -p /workspace/tasks/$TASK_NAME/agents/$agent
git worktree add /workspace/tasks/$TASK_NAME/agents/$agent/code -b $TASK_NAME-$agent
fi
done
Quick Reference: Error → Recovery
| Error | Section | Quick Action |
|---|---|---|
| Compilation error | A2 | Fix imports/typos or re-invoke architect |
| Test failure | A3 | Re-invoke tester or architect |
| Style violations | A4 | Fix directly (<5) or re-invoke formatter (>5) |
| Agent missing status | B2 | Re-invoke agent |
| Agent ERROR | B3 | Check error, fix, re-invoke |
| Agent stuck | B4 | Re-invoke (max 3 retries) |
| Session interrupted | C | Resume from detected state |
| State mismatch | D | Correct lock file state |
| Lock corrupted | E1 | Backup and recreate |
| Worktree missing | E2 | Recreate worktree |
Related Skills
- state-transition: Safe state transitions with validation
- reinvoke-agent-fixes: Re-invoke agents for specific fixes
- validate-style-complete: Three-component style validation
- build-test-report: Run build and capture results