| Field | Value |
|---|---|
| name | autonomous-tasks |
| description | Execute tasks autonomously from a task queue with multi-context window support. Use when user requests autonomous mode, batch task execution, or needs to complete multiple tasks systematically. Handles task loading, execution, verification, and state persistence across context windows. |
| allowed-tools | Read, Write, Bash, Task |
Autonomous Tasks Skill
Purpose
This skill enables fully autonomous task execution where Claude:
- Loads tasks from the .claude/tasks/ folder
- Executes them one by one
- Physically verifies each implementation
- Saves state to memory before context refresh
- Continues across multiple context windows
- Only signs off when tasks are truly complete
Core Autonomous Behavior
Default to action, not suggestions. When in autonomous mode:
- Implement changes rather than only suggesting them
- Infer the most useful likely action and proceed
- Use tools to discover missing details instead of guessing
- Be persistent and complete tasks fully
- Save progress to memory frequently
Task File Format
Tasks are stored in .claude/tasks/ as markdown files:
```markdown
# Task: [Brief title]

## Description
[What needs to be done]

## Acceptance Criteria
- [ ] Criterion 1
- [ ] Criterion 2
- [ ] Criterion 3

## Verification Required
- [ ] Physical verification with screenshots
- [ ] UI interaction testing
- [ ] Build succeeds
- [ ] No regressions

## Context
[Any additional context, file paths, or requirements]

## Priority
high | normal | low
```
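Malformed task files can be caught before the queue runs. This sketch assumes that the headings in the template above are the required ones. validate_task_file, REQUIRED_SECTIONS and VALID_PRIORITIES are hypothetical names and are not part of any existing tool.

```python
import re

# Hypothetical: the section names taken from the task template above.
REQUIRED_SECTIONS = [
    "Description",
    "Acceptance Criteria",
    "Verification Required",
    "Context",
    "Priority",
]
VALID_PRIORITIES = {"high", "normal", "low"}


def validate_task_file(content: str) -> list[str]:
    """Return the problems found in a task file (an empty list means valid)."""
    problems = []
    if not re.search(r"^#\s*Task:\s*\S", content, re.MULTILINE):
        problems.append("missing '# Task: <title>' line")
    # Collect every '## Heading' name in the file.
    headings = set(re.findall(r"^##\s*(.+?)\s*$", content, re.MULTILINE))
    for name in REQUIRED_SECTIONS:
        if name not in headings:
            problems.append(f"missing '## {name}' section")
    # The first word after '## Priority' must be one of the valid values.
    m = re.search(r"^##\s*Priority\s*\n\s*(\S+)", content, re.MULTILINE)
    if m and m.group(1).lower() not in VALID_PRIORITIES:
        problems.append(f"unknown priority {m.group(1)!r}")
    return problems
```

When a file returns a non-empty list, its problems can be reported before execution. This avoids running the task with missing acceptance criteria.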
Example Task File
.claude/tasks/001-cost-dashboard-minimize.md:
```markdown
# Task: Make Cost Dashboard Minimizable

## Description
The cost dashboard button in the bottom-right corner should be minimizable.
It currently obscures text and cannot be minimized by clicking it.

## Acceptance Criteria
- [ ] Button shows $ icon
- [ ] Button is in bottom-right corner above connectivity button
- [ ] Clicking button toggles dashboard visibility
- [ ] Dashboard hidden by default
- [ ] Button looks professional and HD quality

## Verification Required
- [ ] Screenshot shows button exists
- [ ] Screenshot before/after click shows toggle works
- [ ] UI audit confirms professional appearance
- [ ] Build succeeds with no errors

## Context
Files: src/components/CostDashboard.tsx, src/components/buttons.css
Style: Professional, clean, smooth, high-definition

## Priority
high
```
Autonomous Execution Workflow
Copy this checklist and track progress:
```markdown
Autonomous Task Execution:
- [ ] Step 1: Load task queue from .claude/tasks/
- [ ] Step 2: Sort by priority (high > normal > low)
- [ ] Step 3: For each task:
  - [ ] 3a. Read task file
  - [ ] 3b. Understand requirements
  - [ ] 3c. Implement solution
  - [ ] 3d. Run build verification
  - [ ] 3e. Physical verification with screenshots
  - [ ] 3f. Mark acceptance criteria complete
  - [ ] 3g. Save state to memory
- [ ] Step 4: Generate completion report
- [ ] Step 5: Final comprehensive audit of all changes
```
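The checklist's control flow can be sketched as a loop that does not advance until both the build and the verification pass. The callables implement, build, verify and save_state are hypothetical stand-ins for steps 3c-3g. The workflow itself says to keep fixing until the build passes. max_attempts is an added assumption, so that a stuck task surfaces instead of looping forever.

```python
from typing import Callable


def run_queue(
    tasks: list[dict],
    implement: Callable[[dict], None],
    build: Callable[[dict], bool],
    verify: Callable[[dict], bool],
    save_state: Callable[[dict, int], None],
    max_attempts: int = 5,
) -> list[dict]:
    """Run tasks in order; a task only counts as done once build AND verify pass."""
    completed = []
    for index, task in enumerate(tasks):
        for _ in range(max_attempts):
            implement(task)  # step 3c (re-run after a failure to apply fixes)
            if build(task) and verify(task):  # steps 3d and 3e
                task["status"] = "verified"  # step 3f
                break
        else:
            # Retry budget exhausted: stop instead of silently skipping ahead.
            task["status"] = "failed"
            save_state(task, index)
            break
        completed.append(task)
        save_state(task, index)  # step 3g: checkpoint after every task
    return completed
```

Stopping the whole queue on a failed task is a design choice. It keeps the "do not skip to the next task" rule from Error Handling intact.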
Step 1: Load Task Queue
```python
import glob
import os  # used later by step 5
from pathlib import Path

tasks_dir = Path(".claude/tasks")
task_files = sorted(glob.glob(str(tasks_dir / "*.md")))

tasks = []
for task_file in task_files:
    with open(task_file, encoding="utf-8") as f:
        content = f.read()
    # Sections are parsed per task in step 3a
    tasks.append({
        "file": task_file,
        "content": content,
        "status": "pending",
    })

print(f"Loaded {len(tasks)} tasks")
```
Step 2: Sort by Priority
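```python
import re

PRIORITY_ORDER = {"high": 0, "normal": 1, "low": 2}


def get_priority(task_content):
    """Read the value under '## Priority'; default to normal if absent or unknown."""
    # Match only the Priority section, so words such as "high-definition"
    # elsewhere in the task do not change the result.
    match = re.search(r"^##\s*Priority\s*\n\s*(\w+)", task_content, re.MULTILINE)
    if match:
        return PRIORITY_ORDER.get(match.group(1).lower(), 1)
    return 1  # default to normal


# list.sort is stable: tasks with equal priority keep their filename order
tasks.sort(key=lambda t: get_priority(t["content"]))
```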
Step 3: Execute Each Task
For each task in the queue:
3a. Read and Parse Task
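```python
import re

task = tasks[0]  # Current task
content = task["content"]
CHECKBOX = re.compile(r"^- \[[ xX]\] (.+)$", re.MULTILINE)

def section(md, name):
    """Return the body of a '## <name>' section, or '' if it is absent."""
    m = re.search(rf"^##\s*{re.escape(name)}\s*\n(.*?)(?=^##\s|\Z)", md, re.MULTILINE | re.DOTALL)
    return m.group(1).strip() if m else ""

# Extract key sections
title_match = re.search(r"^#\s*Task:\s*(.+)$", content, re.MULTILINE)
title = title_match.group(1).strip() if title_match else task["file"]
description = section(content, "Description")
criteria = CHECKBOX.findall(section(content, "Acceptance Criteria"))
verification = CHECKBOX.findall(section(content, "Verification Required"))
context = section(content, "Context")
```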
3b. Understand Requirements
Use extended thinking to fully understand:
- What needs to be built
- Why it's needed
- How to implement it
- What physical verification is required
- How to verify it actually works
3c. Implement Solution
Default to implementation:
- Read relevant files
- Make necessary changes
- Follow existing patterns
- Write clean, production-ready code
- Add comments where helpful
Use tools proactively:
- Read files to understand structure
- Grep for patterns to maintain consistency
- Edit files with exact replacements
- Write new files when needed
- Run bash commands for verification
3d. Build Verification
After implementation, verify that the build succeeds:
```bash
cd /home/runner/workspace && npm run build
```
If build fails:
- Read error messages carefully
- Fix the issues
- Rebuild
- Repeat until build succeeds
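Reading the error messages requires capturing the build output. The following is a minimal sketch that uses Python's standard subprocess module. run_build is a hypothetical helper name. The command is passed in as a parameter, so the same helper can run the npm build above or any other check.

```python
import subprocess


def run_build(cmd: list[str], cwd: str = ".", timeout: int = 600) -> tuple[bool, str]:
    """Run a build command and return (succeeded, combined stdout and stderr)."""
    try:
        proc = subprocess.run(
            cmd, cwd=cwd, capture_output=True, text=True, timeout=timeout
        )
    except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
        # Missing executable or hung build: report it as a failure with the reason.
        return False, str(exc)
    return proc.returncode == 0, proc.stdout + proc.stderr


# Example: the project build from step 3d.
# ok, log = run_build(["npm", "run", "build"], cwd="/home/runner/workspace")
# if not ok:
#     print(log)  # read the errors, fix them, then call run_build again
```

The command is passed as a list, which avoids shell quoting. The cd in the shell version becomes the cwd argument.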
3e. Physical Verification
CRITICAL: Use the physical-verification skill:
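```python
from services.chatkit_backend.app.vision import register_and_verify_ui_change

# Runs inside an async function. task_id and files_changed are tracked while
# implementing (step 3c); expected_states is a hypothetical dict mapping each
# acceptance criterion to the UI state that should be observed.
result = await register_and_verify_ui_change(
    change_id=task_id,
    description=title,
    files_modified=files_changed,
    verification_criteria=[
        {"element": criterion, "expected_state": expected_states[criterion]}
        for criterion in criteria
    ],
    priority="high",
    wait_for_verification=True,
)

if not result["success"]:
    # Fix the issues and retry.
    # Do NOT mark the task complete until verification passes.
    ...
```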
3f. Mark Acceptance Criteria
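Update the task file with completion status:
```python
# Tick each acceptance criterion ("- [ ] X" becomes "- [x] X"),
# only after build and physical verification have passed.
updated_content = content
for criterion in criteria:
    updated_content = updated_content.replace(f"- [ ] {criterion}", f"- [x] {criterion}")
with open(task["file"], "w", encoding="utf-8") as f:
    f.write(updated_content)
```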
3g. Save State to Memory
IMPORTANT: Before moving to the next task, save state:
```python
from services.chatkit_backend.app.llm.memory_tool import upsert_memory

# Runs inside an async function; assumes the current task is tasks[0]
await upsert_memory(
    content=f"""
Autonomous Task Progress:
Completed: {task['file']}
Status: VERIFIED
Files Modified: {files_changed}
Verification: {result}
Next Task: {tasks[1]['file'] if len(tasks) > 1 else 'None'}
Remaining: {len(tasks) - 1} tasks
""",
    context="autonomous_execution",
)
```
This ensures that progress is not lost if the context window refreshes.
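The memory snippet for this step treats tasks[0] as the current task and tasks[1] as the next one. When iterating over the whole queue, the note can be built from the current index instead. progress_note is a hypothetical helper, and its text layout mirrors that memory content.

```python
def progress_note(tasks: list[dict], index: int, files_changed: list[str], result: object) -> str:
    """Build the 'Autonomous Task Progress' memory text for tasks[index]."""
    remaining = len(tasks) - index - 1
    next_task = tasks[index + 1]["file"] if remaining > 0 else "None"
    return (
        "Autonomous Task Progress:\n"
        f"Completed: {tasks[index]['file']}\n"
        "Status: VERIFIED\n"
        f"Files Modified: {files_changed}\n"
        f"Verification: {result}\n"
        f"Next Task: {next_task}\n"
        f"Remaining: {remaining} tasks\n"
    )


# Usage inside the task loop (async context):
# await upsert_memory(
#     content=progress_note(tasks, i, files_changed, result),
#     context="autonomous_execution",
# )
```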
Step 4: Generate Completion Report
After all tasks are complete, create a report:
```markdown
# Autonomous Execution Report

## Summary
Completed X tasks with full physical verification

## Tasks Completed
1. Task 1 - VERIFIED
   - Files: [list]
   - Verification: [screenshot paths]
2. Task 2 - VERIFIED
   - Files: [list]
   - Verification: [screenshot paths]

## Build Status
✓ All builds successful

## Verification Status
✓ All features physically verified with computer vision

## Evidence
[Screenshot paths for all verifications]

## Next Steps
[Any follow-up tasks or recommendations]
```
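The report can be assembled from the per-task records collected during execution. This sketch assumes each completed task record carries hypothetical files and screenshots keys. It only prints a ✓ status line when the data backs it, instead of hard-coding the checkmarks. completion_report is likewise a hypothetical name.

```python
def completion_report(completed: list[dict], build_ok: bool, next_steps: str = "") -> str:
    """Render the Autonomous Execution Report as markdown from task records."""
    verified = [t for t in completed if t.get("screenshots")]
    lines = [
        "# Autonomous Execution Report",
        "## Summary",
        f"Completed {len(completed)} tasks; {len(verified)} with screenshot evidence",
        "## Tasks Completed",
    ]
    for n, task in enumerate(completed, start=1):
        status = "VERIFIED" if task.get("screenshots") else "UNVERIFIED"
        lines.append(f"{n}. {task['file']} - {status}")
        lines.append(f"   - Files: {', '.join(task.get('files', []))}")
        lines.append(f"   - Verification: {', '.join(task.get('screenshots', []))}")
    lines += ["## Build Status",
              "✓ All builds successful" if build_ok else "✗ Build failures present"]
    all_verified = len(verified) == len(completed)
    lines += ["## Verification Status",
              "✓ All features physically verified" if all_verified
              else "✗ Some features lack visual evidence"]
    # Evidence section: every screenshot path, in task order
    lines += ["## Evidence"] + [s for t in completed for s in t.get("screenshots", [])]
    lines += ["## Next Steps", next_steps or "None"]
    return "\n".join(lines)
```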
Step 5: Final Comprehensive Audit
Run a comprehensive audit of all changes:
```python
import os

from services.chatkit_backend.app.vision import AuditorAgent

auditor = AuditorAgent(api_key=os.getenv("ANTHROPIC_API_KEY"))

# Assumes each completed task record kept the verification_criteria list
# that was passed to register_and_verify_ui_change in step 3e.
all_test_cases = []
for task in completed_tasks:
    all_test_cases.extend(task["verification_criteria"])

# Runs inside an async function
result = await auditor.comprehensive_ui_audit(
    feature_name="Autonomous Task Batch",
    test_cases=all_test_cases,
    max_iterations=30,
)
```
Multi-Context Window Support
When approaching the context limit, save state and prepare for the refresh:
Before Context Refresh
```python
# Save detailed state to memory (inside an async function)
await upsert_memory(
    content=f"""
# Autonomous Task State Checkpoint
## Current Task
{current_task_file}
## Progress
Completed: {len(completed_tasks)}
Remaining: {len(remaining_tasks)}
## Files Modified
{all_modified_files}
## Verification Results
{verification_summary}
## Next Actions
1. Continue with task: {next_task_file}
2. Load state from this memory
3. Resume execution
""",
    context="autonomous_checkpoint",
)

# Move completed tasks to the archive folder. Only the file name is reused,
# so a name containing "tasks" is not mangled, and the folder is created if missing.
completed_dir = tasks_dir / "completed"
completed_dir.mkdir(parents=True, exist_ok=True)
for task in completed_tasks:
    src = Path(task["file"])
    src.rename(completed_dir / src.name)
```
After Context Refresh
```python
# Load state from memory (inside an async function)
from services.chatkit_backend.app.llm.memory_tool import search_memory

results = await search_memory(
    query="Autonomous task state checkpoint",
    context="autonomous_checkpoint",
)

# Parse state and resume:
# - load the remaining tasks
# - continue from where we left off
```
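To resume, the next task has to be recovered from the checkpoint text. Below is a minimal parser for the checkpoint layout saved before the refresh. It assumes that search_memory hands back the stored text unchanged, which is an assumption about that project function. parse_checkpoint is a hypothetical name.

```python
import re
from typing import Optional


def parse_checkpoint(text: str) -> dict:
    """Recover the resume position from an 'Autonomous Task State Checkpoint' memory."""

    def grab(pattern: str) -> Optional[str]:
        # First capture group of the first match, or None if the field is missing
        m = re.search(pattern, text, re.MULTILINE)
        return m.group(1).strip() if m else None

    completed = grab(r"^\s*Completed:\s*(\d+)")
    remaining = grab(r"^\s*Remaining:\s*(\d+)")
    return {
        "current_task": grab(r"^\s*##\s*Current Task\s*\n\s*(\S.*)$"),
        "completed": int(completed) if completed else 0,
        "remaining": int(remaining) if remaining else 0,
        "next_task": grab(r"Continue with task:\s*(\S+)"),
    }
```

If next_task comes back as None, the checkpoint is missing or malformed. In that case, it is safer to reload the queue from .claude/tasks/ than to guess a position.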
User Interface Integration
The autonomous mode should be controllable from the UI:
Controls needed:
- Toggle autonomous mode ON/OFF
- Select task folder (.claude/tasks/)
- Configure verification strictness (always | optional | never)
- View task queue and progress
- Pause/resume execution
- Manual intervention button
Status display:
- Current task being executed
- Progress bar (X of Y tasks)
- Last verification result
- Build status
- Error messages if any
See the implementation in src/components/AutonomousPanel.tsx.
System Prompt for Autonomous Mode
When autonomous mode is active, the meta agent should have this system prompt:
```text
<default_to_action>
By default, implement changes rather than only suggesting them. If the user's intent is unclear, infer the most useful likely action and proceed, using tools to discover any missing details instead of guessing.
</default_to_action>

<persistence>
Your context window will be automatically compacted as it approaches its limit, allowing you to continue working indefinitely. Save your current progress and state to memory before the context window refreshes. Always be as persistent and autonomous as possible and complete tasks fully.
</persistence>

<physical_verification>
NEVER claim a feature works without physical verification. Use computer vision to take screenshots and verify UI changes actually exist and function as expected. Code inspection and test passing are not sufficient.
</physical_verification>

<task_execution>
Execute tasks from .claude/tasks/ folder systematically:
1. Load all tasks
2. Sort by priority
3. For each task: implement, verify build, physically verify, mark complete
4. Save state to memory after each task
5. Generate completion report with evidence
</task_execution>
```
Error Handling
Build Failures
- Read error messages carefully
- Fix the specific issues mentioned
- Rebuild and verify
- Do NOT skip to next task until build passes
Verification Failures
- Take screenshot to see actual state
- Compare to expected state
- Identify root cause
- Fix the implementation
- Re-verify with computer vision
- Do NOT claim feature works without visual proof
Context Window Approaching Limit
- Save detailed state to memory immediately
- Mark current position in task queue
- Note which files were modified
- Allow context to refresh
- Load state from memory
- Resume from exact position
Anti-Patterns to Avoid
❌ Suggesting Instead of Implementing
"You could implement this by adding a button..." # NO
✓ Actually Implementing
[Reads files, makes changes, verifies build]
"Button implemented and verified" # YES
❌ Claiming Done Without Verification
"I've added the feature, it should work now" # NO
✓ Verification Before Claiming Done
[Takes screenshots, verifies with computer vision]
"Feature verified with visual evidence: [screenshot paths]" # YES
❌ Stopping at First Error
"Build failed, need help" # NO
✓ Fixing Errors Autonomously
[Reads error, identifies issue, fixes it, rebuilds]
"Build issue resolved, continuing" # YES
Integration with Meta Agent
The meta agent should:
- Detect when user requests autonomous mode
- Load this skill automatically
- Execute task queue systematically
- Use physical-verification skill for all UI changes
- Save state to memory throughout execution
- Handle context refreshes seamlessly
- Generate final report with evidence
Success Criteria
Autonomous execution is successful when:
- All tasks in queue are completed
- All builds pass
- All features physically verified with screenshots
- Evidence (screenshots) available for every claim
- State saved to memory at checkpoints
- Completion report generated
- User can see progress and results in UI
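Before signing off, the criteria that can be derived from task records can be checked mechanically. The record keys (status, build_ok, screenshots) and the function name are hypothetical, so adapt them to whatever the execution loop actually stores. The memory-checkpoint and UI criteria are not covered by this sketch.

```python
def unmet_success_criteria(tasks: list[dict], report_written: bool) -> list[str]:
    """Return the success criteria that are not yet satisfied (empty list = done)."""
    unmet = []
    if any(t.get("status") != "verified" for t in tasks):
        unmet.append("not all tasks are completed and verified")
    if not all(t.get("build_ok") for t in tasks):
        unmet.append("not all builds pass")
    if not all(t.get("screenshots") for t in tasks):
        unmet.append("some tasks have no screenshot evidence")
    if not report_written:
        unmet.append("completion report not generated")
    return unmet
```

Signing off only when this returns an empty list matches the rule that evidence must exist for every claim.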
Next Steps
After completing autonomous task execution:
- Review completion report
- Check all screenshot evidence
- Run final comprehensive UI audit
- Update documentation
- Commit changes to git (if requested)
- Archive completed tasks
- Prepare for the next batch of tasks