---
name: multi-agent-autonomous-workflow
description: Orchestrates multi-agent workflow for feature implementation using specialized subagents. Use when implementing features, epics, or complex multi-story tasks that need analyst, architect, developer, tester, reviewer, and security agents.
---

# Multi-Agent Autonomous Workflow

Orchestrates specialized subagents for extended autonomous work (minutes to hours) with minimal human intervention.

## Execution Model: Long-Running Until Complete

This workflow implements the Ralph Wiggum pattern for persistent autonomous execution:

The workflow NEVER exits until the goal is complete or a true blocker requires human intervention. Iterate continuously, using previous failures as context for the next attempt. State persists on disk, and the Stop hook enforces completion before exit.
```
┌─────────────────────────────────────────────────────────────────────────────────┐
│ LONG-RUNNING EXECUTION MODEL │
│ │
│ ┌──────────────────────────────────────────────────────────────────────────┐ │
│ │ OUTER LOOP: PHASES │ │
│ │ │ │
│ │ Phase 0 ──▶ Phase 1 ──▶ Phase 2 ──▶ Phase 3 ──▶ WORKFLOW_COMPLETE │ │
│ │ (Setup) (Analyze) (Execute) (Finalize) │ │
│ │ │ │ │ │
│ │ ▼ ▼ │ │
│ │ CLARIFY ┌───────────────────────────────┐ │ │
│ │ QUESTIONS │ MIDDLE LOOP: STORIES │ │ │
│ │ │ │ │ │
│ │ │ for each story: │ │ │
│ │ │ ┌────────────────────┐ │ │ │
│ │ │ │ INNER LOOP: ITERATE│ │ │ │
│ │ │ │ until ALL verified │ │ │ │
│ │ │ │ or max retries │ │ │ │
│ │ │ └────────────────────┘ │ │ │
│ │ └───────────────────────────────┘ │ │
│ └──────────────────────────────────────────────────────────────────────────┘ │
│ │
│ COMPLETION: Stop hook blocks exit until WORKFLOW_COMPLETE marker is output │
└─────────────────────────────────────────────────────────────────────────────────┘
```
## Phase 0: Setup & Platform Detection

Before any work, set up the execution context. This phase produces visible output.

### Step 0.1: Session Recovery Check

ALWAYS run at the start of every session:

```bash
# Check if resuming an existing workflow
python .claude/core/state.py recover
```
If resuming: Skip to the appropriate phase/story based on the recovered state.
Output:
```markdown
## Session Recovery
| Check | Result |
|-------|--------|
| Existing workflow | [Yes/No] |
| Workflow ID | [ID if resuming] |
| Current phase | [Phase name] |
| Stories completed | [X/Y] |
| Current story | [Story ID and title] |
[If resuming]: Continuing from [story/phase]. Skipping to Phase [N].
```
### Step 0.2: Discover Available Platforms (DYNAMIC)

IMPORTANT: Platform discovery is FULLY DYNAMIC. Never hardcode platform names or markers.

```bash
# Discover all platforms in the Workflows/platforms directory
# Each platform provides its own platform.json with markers
ls -d Workflows/platforms/*/ 2>/dev/null || ls -d ../Workflows/platforms/*/ 2>/dev/null
```

For each discovered platform directory, read its platform.json to get:
- `name`: Platform identifier
- `markers`: Files/patterns that identify this platform
- `matchMode`: "any" (match any marker) or "all" (match all markers)
- `priority`: Resolution priority when multiple platforms match
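For illustration, a minimal platform.json might look like the following; the field names come from the list above, but every value shown is a hypothetical example:

```json
{
  "name": "dotnet",
  "displayName": ".NET",
  "version": "1.0.0",
  "markers": ["*.sln", "*.csproj"],
  "matchMode": "any",
  "priority": 10,
  "commands": { "build": "dotnet build", "test": "dotnet test" },
  "skills": []
}
```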
Output the discovery:
```markdown
## Platform Discovery
Scanning for available platforms...
| Platform | Description | Markers | Match Mode |
|----------|-------------|---------|------------|
| {name} | {displayName} | {markers joined} | {matchMode} |
...
Found {N} platform configurations.
```
### Step 0.3: Match Platform to Codebase (DYNAMIC)

For each discovered platform, check if its markers exist in the codebase:

```python
# Platform matching: the highest-priority platform whose markers match wins
candidates = []
for platform in discovered_platforms:
    if check_markers(platform.markers, platform.matchMode):
        candidates.append((platform, platform.priority))
if not candidates:
    raise RuntimeError("No platform matched - ask the user to specify one")
selected = max(candidates, key=lambda c: c[1])[0]  # highest priority wins
```
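A minimal `check_markers` sketch, assuming markers are glob patterns matched against the repository root:

```python
from pathlib import Path

def check_markers(markers, match_mode, root="."):
    """True if the codebase contains files matching the platform's markers."""
    hits = [next(Path(root).rglob(m), None) is not None for m in markers]
    return any(hits) if match_mode == "any" else all(hits)
```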
Output the matching:
```markdown
## Platform Detection
Scanning codebase for platform markers...
| Platform | Markers Checked | Found | Match Result |
|----------|-----------------|-------|--------------|
| {name} | {marker} | [Yes/No] | [MATCH/NO MATCH] |
...
**Selected Platform:** {platform.name} (priority: {priority})
**Reason:** {why this platform was selected}
```
### Step 0.4: Load Platform Configuration
From selected platform.json, extract and cache:
- Commands (build, test, lint, coverage, etc.)
- Conventions (naming, patterns, commit format)
- Quality gates (coverage thresholds, required checks)
- Project structure patterns
- Skills to load
Output:
```markdown
## Platform Configuration Loaded
**Platform:** {displayName} (v{version})
### Commands
| Action | Command |
|--------|---------|
| {key} | `{command}` |
...
### Conventions
| Convention | Value |
|------------|-------|
| {key} | {value} |
...
### Quality Gates
| Gate | Threshold |
|------|-----------|
| Coverage (S) | {S}% |
| Coverage (M) | {M}% |
| Coverage (L) | {L}% |
| Coverage (XL) | {XL}% |
### Skills Loaded
{For each skill in platform.skills, list it}
---
Platform detection complete. Proceeding with workflow.
```
## Phase 1: Analysis & Planning

### Step 1.1: Pre-Analysis Clarification (REQUIRED CHECKPOINT)

BEFORE invoking the analyst agent, you MUST ask clarifying questions.
Use the AskUserQuestion tool to present questions about the goal:
## Pre-Analysis Clarification Required
Before I begin analyzing your request, I need to clarify:
Questions to ask (pick relevant ones):
- Scope boundaries: What is IN scope vs OUT of scope?
- Priority: If there are multiple features, which is most important?
- Technical constraints: Any specific technologies to use or avoid?
- Integration points: How should this integrate with existing code?
- User requirements: Any specific user-facing requirements mentioned?
- Data handling: How should data be stored/processed?
- Error handling: Any specific error handling requirements?
When to ask fewer questions:
- Goal is very specific and complete
- User provided detailed specifications
- Technical approach is obvious from context
Always offer: "Proceed with my best judgment" as an option.
Question Timeout Behavior:
- If no response within 5 minutes in autonomous mode: proceed with best judgment
- Log the assumed answers:

```bash
python .claude/core/state.py add-clarification "Question" "Assumed: best judgment" --phase pre-analysis --category scope
```

- Always persist clarifications for session recovery
Wait for user response before continuing to Step 1.2.
### Step 1.1.1: Persist Clarifications

After receiving user answers, ALWAYS persist them:

```bash
# For each question answered:
python .claude/core/state.py add-clarification "What payment provider?" "Stripe" --phase pre-analysis --category technical
python .claude/core/state.py add-clarification "Real-time tech?" "SignalR" --phase pre-analysis --category technical
```
This ensures clarifications survive session restarts.
### Step 1.2: Initialize Workflow State

```bash
python .claude/core/state.py init "Goal: {User's goal}"
```
### Step 1.3: Invoke Analyst Subagent
```markdown
## Task for Analyst
{PLATFORM CONTEXT BLOCK}
**Goal:** {User's goal, incorporating clarifications}
Break this down into user stories with:
- Clear title
- Size estimate (S/M/L/XL)
- Acceptance criteria (testable)
- Security sensitivity flag (if applicable)
```
Parse analyst output and add stories to state:
```bash
python .claude/core/state.py add-story "Story title" --size M
python .claude/core/state.py add-story "Story title" --size L --security
# Repeat for each story
```
### Step 1.4: Pre-Plan Clarification (REQUIRED CHECKPOINT)
BEFORE invoking the architect agent, you MUST review stories and ask for confirmation.
Use the AskUserQuestion tool:
## Pre-Plan Clarification Required
I've identified these stories from your request:
| # | Story | Size | Security |
|---|-------|------|----------|
| S1 | {title} | {size} | {Yes/No} |
| S2 | {title} | {size} | {Yes/No} |
...
Before I design the technical approach, please confirm:
Questions to ask:
- Story order: Is this the right priority order? Any changes?
- Technical preferences: Any specific patterns/libraries to use or avoid?
- Existing code concerns: Any areas I should be careful modifying?
- Missing stories: Anything I missed that should be included?
Wait for user response before continuing to Step 1.5.
### Step 1.5: Invoke Architect Subagent
```markdown
## Task for Architect
{PLATFORM CONTEXT BLOCK}
**Stories:** {List from analyst with user's priority adjustments}
Create technical design following:
- Platform project structure patterns
- Platform conventions
- Existing codebase patterns
For each story, identify:
- Files to create/modify
- Dependencies between stories
- Technical risks
```
### Step 1.6: Gate G1 - Verify Design Complete
Check:
- All stories have technical design
- File changes identified per story
- Dependencies mapped
- No open questions blocking implementation
If not complete, iterate with architect.
## Phase 2: Story Execution Loop
This is the main execution loop. It runs until ALL stories are completed.
CRITICAL RULES:
- Every iteration MUST run build and tests before completing
- No story can be marked complete until ALL verification checks pass
- The loop continues until there are no more incomplete stories
- The Stop hook will block exit if any stories remain incomplete
- TDD phases must be followed: RED → GREEN → REFACTOR → VERIFY
- Failures must be categorized for intelligent retry decisions
### TDD Phase Enforcement

Each story MUST follow the TDD cycle. Track phases explicitly:

```bash
# 1. RED: Write failing test FIRST
python .claude/core/state.py tdd-phase S1 red
# Developer writes a test that fails

# 2. GREEN: Minimum code to pass
python .claude/core/state.py tdd-phase S1 green
# Developer implements the minimum code

# 3. REFACTOR: Clean up (optional for small changes)
python .claude/core/state.py tdd-phase S1 refactor
# Developer refactors without changing behavior

# 4. VERIFY: Run all checks
python .claude/core/state.py tdd-phase S1 verify
# All gates must pass
```

Before moving to the GREEN phase, validate:

```bash
python .claude/core/state.py tdd-validate S1 green
# Returns: {"valid": true} or {"valid": false, "expected": "red"}
```

If the developer tries to write implementation before tests:

```
STOP. TDD violation detected.
Current phase: none
Expected: red (write failing test first)
```

The developer must write a failing test before implementation.
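For example, the orchestrator can gate each TDD phase transition on the JSON that tdd-validate prints; a minimal sketch, assuming the response shape shown above:

```python
import json
import subprocess

def can_enter_phase(story_id: str, phase: str) -> bool:
    """Ask state.py whether the story may enter the given TDD phase."""
    result = subprocess.run(
        ["python", ".claude/core/state.py", "tdd-validate", story_id, phase],
        capture_output=True, text=True, check=True,
    )
    verdict = json.loads(result.stdout)
    if not verdict.get("valid", False):
        # Mirrors the violation message above: report the expected phase
        print(f"TDD violation: expected phase '{verdict.get('expected')}'")
    return verdict.get("valid", False)
```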
### Failure Categorization

When a gate fails, categorize the failure for intelligent retry:

```bash
# Auto-categorize based on the error message:
python .claude/core/state.py add-failure S1 "Connection refused: database not available"
# Output: Added failure: F1 [infra]

# Or explicitly categorize:
python .claude/core/state.py add-failure S1 "API key not configured" --category external
```
Categories:

| Category | Description | Retry Strategy |
|---|---|---|
| code | Bug in implementation | Standard retry |
| test | Test itself is wrong | Fix test, retry |
| infra | DB, network, filesystem | Retry with backoff |
| external | API keys, external services | ESCALATE - needs human |
| timeout | Operation timed out | Retry with backoff |
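If state.py's auto-categorization is ever unavailable, a fallback matcher mirroring the table above could look like this; the keyword lists are illustrative assumptions, not the actual rules in state.py:

```python
# Hypothetical keyword map mirroring the categories table above
CATEGORY_KEYWORDS = {
    "infra": ["connection refused", "network", "filesystem", "database"],
    "external": ["api key", "credential", "unauthorized", "external service"],
    "timeout": ["timed out", "timeout"],
    "test": ["assertion", "expected", "flaky"],
}

def categorize_failure(message: str) -> str:
    """Best-effort failure category from an error message."""
    msg = message.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in msg for keyword in keywords):
            return category
    return "code"  # default: treat as an implementation bug
```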
Get a retry recommendation before retrying:

```bash
python .claude/core/state.py retry-recommendation S1
# Returns: {"should_retry": true, "backoff_seconds": 30}
# OR: {"should_retry": false, "escalate": true, "reason": "External service failures"}
```
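A sketch of how the loop might consume this recommendation, assuming the JSON shapes shown above:

```python
import json
import subprocess
import time

def should_retry(story_id: str) -> bool:
    """Honor state.py's retry recommendation, including backoff and escalation."""
    result = subprocess.run(
        ["python", ".claude/core/state.py", "retry-recommendation", story_id],
        capture_output=True, text=True, check=True,
    )
    rec = json.loads(result.stdout)
    if rec.get("escalate"):
        return False  # surface as a blocker instead of retrying
    time.sleep(rec.get("backoff_seconds", 0))
    return rec.get("should_retry", False)
```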
```
┌─────────────────────────────────────────────────────────────────────────────────┐
│ STORY EXECUTION LOOP │
│ │
│ while (incomplete_stories exist): │
│ story = get_next_incomplete_story() │
│ iteration = 0 │
│ previous_failures = [] │
│ │
│ while (story NOT completed AND iteration < MAX_RETRIES): │
│ iteration++ │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ 1. DEVELOP │ │
│ │ Invoke developer agent with previous_failures context │ │
│ │ │ │
│ │ 2. BUILD VERIFICATION (MANDATORY - Gate G2) │ │
│ │ Run: {platform.commands.build} │ │
│ │ If FAIL: add to previous_failures, CONTINUE to next iteration │ │
│ │ │ │
│ │ 3. TEST VERIFICATION (MANDATORY - Gate G3) │ │
│ │ Run: {platform.commands.test} │ │
│ │ If FAIL: add to previous_failures, CONTINUE to next iteration │ │
│ │ │ │
│ │ 4. TESTER AGENT │ │
│ │ Verify acceptance criteria │ │
│ │ If FAIL: add issues to previous_failures, CONTINUE │ │
│ │ │ │
│ │ 5. REVIEWER AGENT │ │
│ │ Code review and architecture compliance │ │
│ │ If CHANGES_REQUESTED: add to previous_failures, CONTINUE │ │
│ │ │ │
│ │ 6. SECURITY AGENT (if story.securitySensitive) │ │
│ │ Security analysis and scanning │ │
│ │ If REMEDIATION_NEEDED: add to previous_failures, CONTINUE │ │
│ │ │ │
│ │ 7. FINAL VERIFICATION (MANDATORY - Gate G5) │ │
│ │ Run: {platform.commands.build} && {platform.commands.test} │ │
│ │ If FAIL: add to previous_failures, CONTINUE │ │
│ │ If PASS: Mark story COMPLETED, commit, break inner loop │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ if (story NOT completed after MAX_RETRIES): │
│ Escalate as blocker, continue to next story │
│ │
│ Every 3 stories OR 30 minutes: Generate progress report │
│ │
└─────────────────────────────────────────────────────────────────────────────────┘
```
### Mandatory Verification Commands

Before ANY loop iteration can complete, run these platform commands:

```bash
# Get commands from platform.json (resolved dynamically)
BUILD_CMD=$(python .claude/core/platform.py get-command build)
TEST_CMD=$(python .claude/core/platform.py get-command test)

# Execute
eval "$BUILD_CMD"
eval "$TEST_CMD"
```
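If the orchestration step is scripted in Python rather than shell, an equivalent helper might look like this; a sketch assuming get-command prints the resolved command string:

```python
import subprocess

def run_gate(action: str) -> bool:
    """Resolve a platform command (e.g. 'build', 'test') and run it."""
    resolved = subprocess.run(
        ["python", ".claude/core/platform.py", "get-command", action],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return subprocess.run(resolved, shell=True).returncode == 0
```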
If either command fails, DO NOT:
- Exit the inner loop
- Mark the story as completed
- Proceed to the next story
- Create a PR
Instead:
- Log the failure
- Add it to the `previous_failures` context
- Retry with the developer agent
### Retry Context Template
When retrying development, ALWAYS provide this context to the developer:
```markdown
## Retry Context
**Story:** {story.id} - {story.title}
**Iteration:** {n}/{MAX_RETRIES}
**Previous Attempts:** {n-1}
### What Failed:
{For each failure:}
- **{failure_type}:** {description}
- File: {file}:{line} (if applicable)
- Error: {error_message}
### What To Fix:
{Specific actionable instructions derived from failures}
### Files Changed So Far:
{List of files modified in previous iterations}
### Verification Status:
| Check | Status |
|-------|--------|
| testsPass | {PASS/PENDING} |
| coverageMet | {PASS/PENDING} |
| reviewApproved | {PASS/PENDING} |
| securityCleared | {PASS/PENDING/N/A} |
### Retry Recommendation:
{Output from: python .claude/core/state.py retry-recommendation {story_id}}
```
### Reviewer Severity Levels
When the Reviewer Agent identifies issues, they MUST be categorized by severity:
```markdown
## Code Review Results
### BLOCKING Issues (Must Fix)
{Issues that block completion - security, correctness, architecture violations}
- **[BLOCKING]** SQL injection vulnerability in UserRepository.cs:45
- **[BLOCKING]** Missing null check causes crash in PaymentService
### SUGGESTIONS (Recommended)
{Improvements that should be made but don't block completion}
- **[SUGGESTION]** Consider using MediatR pipeline for validation
- **[SUGGESTION]** Extract magic number to named constant
### INFO (For Future Reference)
{Non-actionable observations}
- **[INFO]** This pattern could be simplified in future refactoring
- **[INFO]** Consider adding integration tests for edge cases
```
Retry Logic Based on Severity:
- BLOCKING issues found: MUST retry; add them to `previous_failures`
- Only SUGGESTIONS: Mark review approved, note suggestions for the developer
- Only INFO: Mark review approved, continue
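In code, this severity rule reduces to a small decision function; a minimal sketch, where the issue dict shape is an assumption:

```python
def review_gate(issues: list[dict]) -> str:
    """Map reviewer severities to a gate decision per the rules above."""
    if any(issue["severity"] == "BLOCKING" for issue in issues):
        return "retry"     # add BLOCKING issues to previous_failures
    return "approved"      # SUGGESTION and INFO never block completion
```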
### External Dependency Detection

Before development, check for external dependencies that may require special handling:

```bash
# Check if the story requires external services
# Look for keywords in acceptance criteria and design
```
External Dependency Types:
| Type | Detection Keywords | Handling |
|---|---|---|
| Payment APIs | stripe, paypal, payment, checkout | Use mock/sandbox in tests |
| Email Services | email, smtp, sendgrid, mailgun | Mock email sending |
| SMS/Push | sms, twilio, push notification | Mock notification service |
| OAuth/SSO | oauth, sso, google auth, azure ad | Use test credentials |
| Cloud Storage | s3, azure blob, gcs, file upload | Use local storage mock |
| External APIs | api key, third-party, integration | Create mock responses |
When an external dependency is detected:

1. Log the dependency:

```bash
python .claude/core/state.py add-clarification "External dependency: Stripe API" "Using sandbox mode for tests" --phase development --category external
```

2. Add a mock strategy to the developer context:

```markdown
## External Dependencies
This story requires: Stripe Payment API

**Test Strategy:**
- Use Stripe test mode API keys
- Mock Stripe responses for unit tests
- Mark AC requiring live Stripe as "MANUAL_VERIFICATION_REQUIRED"
```

3. If no mock is possible:

```bash
python .claude/core/state.py add-blocker "Stripe API credentials not available for testing" --severity high
# Escalate to user
```
### Progress Reports
Generate a progress report every 3 completed stories OR every 30 minutes:
```markdown
## Progress Report
**Workflow:** {goal}
**Workflow ID:** {workflowId}
**Runtime:** {elapsed_time}
**Iteration:** {current_iteration}
### Stories Progress
| Status | Count |
|--------|-------|
| Completed | {X} |
| In Progress | {Y} |
| Pending | {Z} |
| Blocked | {B} |
### Recently Completed:
{For each recently completed story:}
- [{id}] {title} - {iterations} iterations
### Current Story:
- [{id}] {title}
- Status: {status}
- Iteration: {n}/{MAX_RETRIES}
- Verification: {checks summary}
### Blockers:
{List any blockers or "None"}
### Next Steps:
1. {Current action}
2. {Next story after this}
```
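The cadence itself is a simple either/or trigger; a minimal sketch with illustrative names:

```python
import time

REPORT_EVERY_STORIES = 3
REPORT_EVERY_SECONDS = 30 * 60  # 30 minutes

def report_due(stories_since_last: int, last_report_at: float) -> bool:
    """True when either progress-report trigger above fires."""
    return (stories_since_last >= REPORT_EVERY_STORIES
            or time.time() - last_report_at >= REPORT_EVERY_SECONDS)
```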
## Phase 2.5: Lint Fix Loop (Non-Story Fixes)
When Gate G7 fails on lint but build/tests pass, enter this special loop:
Purpose: Fix lint/formatting issues without creating a new story iteration.
```
┌─────────────────────────────────────────────────────────────────────────────────┐
│ LINT FIX LOOP │
│ │
│ Gate G7 fails on lint (build + test passed) │
│ │
│ while (lint_fails AND attempts < 3): │
│ 1. Get lint errors: │
│ LINT_CMD=$(python .claude/core/platform.py get-command lint) │
│ eval "$LINT_CMD" 2>&1 | tee lint_errors.txt │
│ │
│ 2. Fix lint issues (DO NOT change logic): │
│ - Formatting only │
│ - No new features │
│ - No refactoring │
│ │
│ 3. Re-run verification: │
│ - Build (should still pass) │
│ - Test (should still pass) │
│ - Lint (check if fixed) │
│ │
│ 4. If lint passes: Exit loop, continue to Phase 3 │
│ If lint fails: Increment attempts, retry │
│ │
│ if (attempts >= 3): │
│ Log blocker: "Unable to fix lint issues after 3 attempts" │
│ Continue to Phase 3 anyway (lint is non-blocking for PR) │
│ Note in PR: "Lint issues remain - manual review needed" │
│ │
└─────────────────────────────────────────────────────────────────────────────────┘
```
Key Rules for Lint Fix Loop:
- DO NOT modify any logic or behavior
- DO NOT add new tests or features
- ONLY fix formatting, whitespace, import order, etc.
- Verify tests still pass after each fix
- If tests break: Revert lint fix, continue with lint warnings
```bash
# Run lint fix loop
python .claude/core/state.py update-phase lint-fix
# Get lint command and run
LINT_CMD=$(python .claude/core/platform.py get-command lint)
eval "$LINT_CMD" 2>&1
# If platform has auto-fix:
LINT_FIX=$(python .claude/core/platform.py get-command lintFix 2>/dev/null || echo "")
if [ -n "$LINT_FIX" ]; then
eval "$LINT_FIX"
fi
# Verify tests still pass after lint fix
TEST_CMD=$(python .claude/core/platform.py get-command test)
eval "$TEST_CMD"
```
## Phase 3: Completion & PR Creation

### Step 3.1: Verify All Stories Complete

```bash
python .claude/core/state.py status
```

All stories must have status `completed` with all verification checks passed.

If any story is not completed, return to Phase 2.
### Step 3.2: Final Build & Test Verification (MANDATORY - Gate G7)

Before creating a PR, run a COMPLETE verification cycle:

```bash
# Get commands from platform
BUILD_CMD=$(python .claude/core/platform.py get-command build)
TEST_CMD=$(python .claude/core/platform.py get-command test)
LINT_CMD=$(python .claude/core/platform.py get-command lint 2>/dev/null || echo "")

# Run all checks
eval "$BUILD_CMD"
eval "$TEST_CMD"
if [ -n "$LINT_CMD" ]; then
  eval "$LINT_CMD"
fi
```
Output verification results:
```markdown
## Pre-PR Verification (Gate G7)
| Check | Command | Result |
|-------|---------|--------|
| Build | `{build_cmd}` | {PASS/FAIL} |
| Tests | `{test_cmd}` | {PASS/FAIL} ({test_count} tests) |
| Lint | `{lint_cmd}` | {PASS/FAIL/N/A} |
{If ALL PASS}: All verification checks passed. Proceeding with PR creation.
{If ANY FAIL}: Verification failed. Returning to Phase 2 to fix issues.
```
If ANY check fails:
- DO NOT create the PR
- Return to Phase 2 to fix the issues
- Re-run verification after fixes
### Step 3.3: Invoke DevOps (if needed)

If deployment configuration is required:

```markdown
## Task for DevOps
{PLATFORM CONTEXT BLOCK}
Create/update deployment configuration for implemented features.
```
### Step 3.4: Create Pull Request
```bash
# Ensure clean working state
git status
# Push branch
git push -u origin HEAD
# Create PR with comprehensive body
gh pr create --title "[Feature] {Goal Summary}" --body "$(cat <<'EOF'
## Summary
{Brief description of implementation}
## Stories Implemented
{For each story:}
- **{story.id}:** {story.title}
## Key Changes
{Major files/features changed, grouped logically}
## Testing
- All {test_count} tests passing
- Coverage thresholds met for all stories
## Verification Checklist
- [x] Build passes ({platform.commands.build})
- [x] All tests pass ({platform.commands.test})
- [x] Lint checks pass (if applicable)
- [x] Code review approved by reviewer agent
- [x] Security review passed (if applicable)
- [x] All acceptance criteria verified by tester agent
## Workflow Metrics
- **Workflow ID:** {workflowId}
- **Runtime:** {total_time}
- **Stories:** {completed}/{total}
- **Total Iterations:** {sum of all story iterations}
---
Generated with [Claude Code](https://claude.com/claude-code)
EOF
)"
### Step 3.5: Complete Workflow State

```bash
python .claude/core/state.py complete
```
### Step 3.6: Output Completion Marker (CRITICAL)
This marker is REQUIRED for the Stop hook to allow exit:
```markdown
## WORKFLOW_COMPLETE
**Goal:** {goal}
**Status:** SUCCESS
**PR URL:** {pr_url}
**Workflow ID:** {workflowId}
**Runtime:** {total_time}
**Stories Completed:** {completed}/{total}
All stories verified. Pull request created. Workflow complete.
```
## Architecture
```
ORCHESTRATOR (You) ─── Drives workflow through all phases
│
│ Phase 0
├── Platform Detection (dynamic discovery)
│
│ Phase 1
├── CLARIFY ──────▶ (Pre-Analysis Questions - REQUIRED)
├── analyst ──────▶ Requirements & user stories
├── CLARIFY ──────▶ (Pre-Plan Questions - REQUIRED)
├── architect ────▶ Technical design
│
│ Phase 2 (Loop until all complete)
├── developer ────▶ Implementation (TDD)
├── BUILD ────────▶ {platform.commands.build} (MANDATORY)
├── TEST ─────────▶ {platform.commands.test} (MANDATORY)
├── tester ───────▶ Acceptance verification
├── reviewer ─────▶ Code review
├── security ─────▶ Security audit (if flagged)
│
│ Phase 3
├── VERIFY ───────▶ Final build/test/lint (MANDATORY)
├── devops ───────▶ Infrastructure & deployment
└── PR ───────────▶ Create pull request + WORKFLOW_COMPLETE
```
## State Management

### Persistence Commands
```bash
# Initialize workflow
python .claude/core/state.py init "Goal description"
# Session recovery (ALWAYS run at session start)
python .claude/core/state.py recover
# Story management
python .claude/core/state.py add-story "Title" --size M [--security]
python .claude/core/state.py update-story S1 in_progress
python .claude/core/state.py update-story S1 testing
python .claude/core/state.py update-story S1 review
python .claude/core/state.py update-story S1 completed
# Verification checks (fail-first pattern)
python .claude/core/state.py verify S1 testsPass --passed
python .claude/core/state.py verify S1 coverageMet --passed --details "85%"
python .claude/core/state.py verify S1 reviewApproved --passed
python .claude/core/state.py verify S1 securityCleared --passed
# Progress tracking
python .claude/core/state.py status
python .claude/core/state.py progress --lines 20
python .claude/core/state.py compact-context
# Context management (for long runs)
python .claude/core/state.py trim-progress --lines 100
# Git recovery
python .claude/core/state.py mark-working-state
python .claude/core/state.py rollback-to-checkpoint
# User intervention
python .claude/core/state.py await-user-fix <blocker_id> "description" [--check-command "cmd"]
python .claude/core/state.py check-user-fix
python .claude/core/state.py user-fix-complete [--notes "note"]
python .claude/core/state.py resume-context
# Dependency scanning
python .claude/core/state.py scan-dependencies S1
# Completion
python .claude/core/state.py complete
```
### State File Location

State persists in `.claude/workflow-state.json`. This enables:
- Session recovery after interruptions
- Progress tracking across long runs
- Stop hook completion detection
- Rollback capability per story
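For orientation only, an abridged workflow-state.json might look like this; the actual schema is owned by state.py, and these field names are illustrative:

```json
{
  "workflowId": "abc123",
  "goal": "Implement payment processing",
  "currentPhase": "development",
  "stories": [
    {"id": "S1", "title": "Checkout flow", "size": "M", "status": "completed"},
    {"id": "S2", "title": "Payment integration", "size": "L", "status": "in_progress", "tdd_phase": "green", "attempts": 2}
  ],
  "clarifications": [],
  "blockers": []
}
```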
## Long-Running Execution Strategies

### Stop Hook Behavior

The Stop hook (`stop.py`) implements the Ralph Wiggum pattern:
- On exit attempt: Hook intercepts and checks for completion
- If WORKFLOW_COMPLETE not found: Block exit, provide continuation context
- If WORKFLOW_COMPLETE found: Allow normal exit
- Safety limit: Max iterations (default 100) prevents infinite loops
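Conceptually, the decision order with the safety limit looks like this; a simplified sketch (the fuller hook logic appears later in this document):

```python
MAX_STOP_ITERATIONS = 100  # safety limit (default noted above)

def stop_decision(transcript: str, iteration: int) -> dict:
    """Allow exit on completion or at the safety limit; otherwise block."""
    if "WORKFLOW_COMPLETE" in transcript:
        return {"decision": "allow"}
    if iteration >= MAX_STOP_ITERATIONS:
        return {"decision": "allow",
                "userMessage": "Safety limit reached without completion"}
    return {"decision": "block",
            "continuePrompt": "Workflow incomplete - continue from current state."}
```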
### Context Management for Extended Runs
For Runs Under 1 Hour:
- Update state after each story completion
- Generate progress report every 3 stories
For Runs 1-4 Hours:
- Update state after each phase transition
- Generate progress report every 30 minutes
- Checkpoint review at 5-story intervals
- Trim progress file to 100 lines
For Runs 4+ Hours:
- Aggressive context management
- Keep only: current story (full) + completed stories (summary only)
- Drop: full implementation details, raw test output
- State sync every 15 minutes
- Consider natural breakpoints for session handoff
### Recovery After Interruption

When resuming:
- Run `python .claude/core/state.py recover`
- Check git status for uncommitted changes
- Validate build/test before continuing
- Jump to the current story/phase from state
## Quality Gates
| Gate | When | Check | On Fail |
|---|---|---|---|
| G1 | Before dev | Design complete, AC clear | Clarify with analyst/architect |
| G2 | After dev | `{platform.commands.build}` passes | Retry developer with build error |
| G3 | After build | `{platform.commands.test}` passes | Retry developer with test failures |
| G4 | After test | Coverage threshold met | Add tests, retry |
| G5 | Before story complete | Final build + test | DO NOT complete story, retry |
| G6 | If security-sensitive | Security scan passes | Remediate, retry |
| G7 | Before PR | Full build + test + lint | DO NOT create PR, return to Phase 2 |
## Clarifying Question Guidelines

### Pre-Analysis Questions (Step 1.1)
Always ask when:
- Goal is vague or multi-part without priority
- Technical constraints not specified
- Integration requirements unclear
- User data handling involved but not specified
- Multiple valid interpretations exist
Ask fewer questions when:
- Requirements are explicit and complete
- User provided detailed specifications
- Context makes approach obvious
### Pre-Plan Questions (Step 1.4)
Always ask when:
- Multiple valid technical approaches exist
- Story order could significantly impact development
- Dependencies between stories are complex
- User has shown strong opinions about technology
Ask fewer questions when:
- Standard patterns clearly apply
- Single obvious approach exists
- User explicitly requested autonomous execution
### How to Ask
- Be specific - ask about concrete decisions
- Offer options when possible (2-4 choices)
- Provide your recommendation with rationale
- Always include "Proceed with your judgment" as an option
- Use the AskUserQuestion tool for structured questions
## Escalation Triggers
| Trigger | Action |
|---|---|
| 3 failed iterations on same story | Log blocker, output ESCALATION_REQUIRED |
| Security vulnerability found | Immediate escalation |
| 5 stories completed | Optional checkpoint |
| External dependency unavailable | Log blocker, escalate |
| User intervention requested | Output HUMAN_INTERVENTION_NEEDED |
## User Intervention Patterns

### Mid-Workflow User Intervention (await_user_fix)

When the workflow encounters a blocker that requires manual user action (e.g., missing API keys, environment configuration, external service setup), use the `await_user_fix` pattern:
```
┌─────────────────────────────────────────────────────────────────────────────────┐
│ MID-WORKFLOW USER INTERVENTION │
│ │
│ 1. Workflow encounters blocker requiring user action │
│ 2. Call await_user_fix() to pause workflow │
│ 3. Stop hook allows exit with instructions for user │
│ 4. User fixes issue and signals completion │
│ 5. Workflow resumes from where it left off │
│ │
│ WORKFLOW USER WORKFLOW │
│ │ │ │ │
│ ├──▶ await_user_fix │ │ │
│ │ (exit allowed) │ │ │
│ │ │ │ │
│ │ ├──▶ Fix issue │ │
│ │ │ │ │
│ │ ├──▶ user-fix-complete │ │
│ │ │ │ │
│ │ │ (restart claude) │ │
│ │ │ ──────────────▶├──▶ Resume workflow │
│ │ │ │ │
└─────────────────────────────────────────────────────────────────────────────────┘
```
Invoking User Intervention:
```bash
# 1. Log the blocker first
python .claude/core/state.py add-blocker "Stripe API key not configured in .env" --severity high
# 2. Request user intervention with optional verification command
python .claude/core/state.py await-user-fix 0 "Add Stripe API key to .env file" --check-command "dotnet build" --timeout 60
# Output includes instructions for the user:
# {
# "status": "awaiting_user",
# "description": "Add Stripe API key to .env file",
# "check_command": "dotnet build",
# "instructions": [
# "1. Fix the issue: Add Stripe API key to .env file",
# "2. Run verification: dotnet build",
# "3. Resume workflow: python .claude/core/state.py user-fix-complete"
# ]
# }
```
User Signals Fix Complete:
```bash
# User runs this after fixing the issue
python .claude/core/state.py user-fix-complete --notes "Added test API key from Stripe dashboard"
```
Workflow Resumes:
On next session start, the workflow automatically detects the fix and continues:
```bash
# Session recovery detects resolved blocker
python .claude/core/state.py recover

# Get context for resuming
python .claude/core/state.py resume-context
```
### Automated External Dependency Scanning
Before starting development on each story, scan for external dependencies:
```bash
# Scan story for external dependency keywords
python .claude/core/state.py scan-dependencies S1
# Output:
# Detected 2 external dependencies:
# [payment] payment_processing
# Keyword: stripe
# Mock: Use test/sandbox API keys or mock payment service
# Requires secrets: True
#
# [email] email_service
# Keyword: sendgrid
# Mock: Use local mailhog/papercut or mock email sender
# Requires secrets: True
```
Dependency Categories and Mock Strategies:
| Type | Keywords | Mock Strategy |
|---|---|---|
| payment | stripe, paypal, braintree | Test/sandbox API keys |
| email | sendgrid, mailgun, smtp | Local mailhog/papercut |
| oauth | oauth, google auth, facebook login | Test OAuth credentials |
| storage | s3, azure blob, cloudinary | Local minio/azurite |
| sms | twilio, nexmo, vonage | SMS service test mode |
| ai_ml | openai, gpt-4, anthropic | Recorded responses |
| database_cloud | rds, cosmosdb, atlas | Local database container |
| messaging | rabbitmq, kafka, azure service bus | Local container |
When Dependencies Are Detected:
```markdown
## External Dependencies Detected for Story {story_id}
| Type | Category | Requires Secrets |
|------|----------|------------------|
| {type} | {category} | {yes/no} |
### Mock Strategy for Development/Testing:
{For each dependency:}
- **{type}:** {mock_strategy}
### If Secrets Required:
1. Check if test credentials exist in environment
2. If not available: Call await-user-fix to request setup
3. Never hardcode credentials in code
```
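A sketch of step 1's credential check, assuming test credentials arrive as environment variables; the variable name and helper are hypothetical:

```python
import os
import subprocess

def ensure_test_secret(env_var: str, description: str) -> bool:
    """Log a high-severity blocker when a required test credential is missing."""
    if os.environ.get(env_var):
        return True
    subprocess.run(
        ["python", ".claude/core/state.py", "add-blocker",
         f"{description} ({env_var} not set)", "--severity", "high"],
        check=True,
    )
    return False  # then pause via await-user-fix as described above
```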
### Resume-After-Blocker Protocol
When resuming after a user-fixed blocker, follow this protocol:
```bash
# 1. Get full resume context
python .claude/core/state.py resume-context
# Returns:
# {
# "workflow_id": "abc123",
# "goal": "Implement payment processing",
# "current_phase": "development",
# "recently_resolved_blockers": [
# {"index": 0, "description": "Stripe API key", "resolvedAt": "..."}
# ],
# "current_story": {
# "id": "S2",
# "title": "Payment integration",
# "status": "in_progress",
# "tdd_phase": "green",
# "attempts": 2
# },
# "resume_instructions": [...]
# }
# 2. Verify the fix if a check command was specified
python .claude/core/state.py check-user-fix
# Returns: {"is_fixed": true, "verified": true, "can_resume": true}
# 3. Resume from current TDD phase
python .claude/core/state.py tdd-status S2
# Returns: "green" (was in middle of making tests pass)
# 4. Re-run tests to verify fix didn't break anything
BUILD_CMD=$(python .claude/core/platform.py get-command build)
TEST_CMD=$(python .claude/core/platform.py get-command test)
eval "$BUILD_CMD" && eval "$TEST_CMD"
# 5. Continue with normal story execution loop
```
Resume Context in Developer Prompt:
When resuming, include this context in the developer agent prompt:
```markdown
## RESUMING AFTER BLOCKER
**Resolved Blocker:** {description}
**Resolution Notes:** {user's notes from user-fix-complete}
**Verification:** {PASSED/FAILED}
**Picking Up Where We Left Off:**
- Story: [{story_id}] {story_title}
- TDD Phase: {tdd_phase} (continue from here)
- Attempts: {attempts}
- Previous Failures: {summary of previous failures}
**First Step After Resume:**
1. Verify the fix works by running build/tests
2. If verified, continue from current TDD phase
3. If not verified, report new blocker
---
Continue implementing story {story_id}...
```
### Stop Hook User Intervention Handling

The Stop hook (`stop.py`) handles user intervention specially:
- Before checking completion: Check for `awaiting_user` status
- If awaiting user: Allow exit with clear instructions
- User message includes:
- What needs to be fixed
- How to verify the fix
- Command to signal completion
- How to restart workflow
```python
# Stop hook logic (simplified):
def main():
    # Priority 1: Check for user intervention
    is_awaiting, intervention = check_user_intervention(input_data)
    if is_awaiting:
        # Allow exit but provide instructions
        return {'decision': 'allow', 'userMessage': get_intervention_instructions()}

    # Priority 2: Check for completion
    is_complete, reason = check_completion(input_data)
    if is_complete:
        return {'decision': 'allow'}

    # Priority 3: Block exit, continue workflow
    return {'decision': 'block', 'continuePrompt': get_continuation_context()}
```
## Commands Reference

- `/workflow [goal]` - Start full autonomous workflow
- `/status` - Check current progress
- `/implement [story]` - Implement single story with iteration loop
- `/review [files]` - Run code review
## Summary: Key Behavioral Rules
- Never exit without WORKFLOW_COMPLETE marker (Stop hook enforces this)
- Always ask clarifying questions before analysis and before planning
- Never skip build/test verification - mandatory at every iteration
- Platform detection is dynamic - never hardcode platform names or commands
- Iterate on failures - use previous_failures context for retries
- Persist state - enable recovery from any interruption
- Progress reports - visibility every 3 stories or 30 minutes
- PR only after full verification - Gate G7 must pass
- Scan for external dependencies before developing each story
- Use await_user_fix for blockers requiring manual intervention
- Resume gracefully after user-fixed blockers with full context