name	autonomous-tasks
description	Execute tasks autonomously from a task queue with multi-context window support. Use when user requests autonomous mode, batch task execution, or needs to complete multiple tasks systematically. Handles task loading, execution, verification, and state persistence across context windows.
allowed-tools	Read, Write, Bash, Task

Autonomous Tasks Skill

Purpose

This skill enables fully autonomous task execution where Claude:

Loads tasks from .claude/tasks/ folder
Executes them one by one
Physically verifies each implementation
Saves state to memory before context refresh
Continues across multiple context windows
Only signs off when tasks are truly complete

Core Autonomous Behavior

Default to action, not suggestions. When in autonomous mode:

Implement changes rather than only suggesting them
Infer the most useful likely action and proceed
Use tools to discover missing details instead of guessing
Be persistent and complete tasks fully
Save progress to memory frequently

Task File Format

Tasks are stored in .claude/tasks/ as markdown files:

# Task: [Brief title]

## Description
[What needs to be done]

## Acceptance Criteria
- [ ] Criterion 1
- [ ] Criterion 2
- [ ] Criterion 3

## Verification Required
- [ ] Physical verification with screenshots
- [ ] UI interaction testing
- [ ] Build succeeds
- [ ] No regressions

## Context
[Any additional context, file paths, or requirements]

## Priority
high | normal | low
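
A quick structural check can catch malformed task files before execution starts. The sketch below assumes the section headings shown in the template above and treats all of them as required; `validate_task`, `REQUIRED_SECTIONS`, and `VALID_PRIORITIES` are hypothetical names, not part of the skill.

```python
import re

# Section headings taken from the task template; treating all of them as
# required is an assumption of this sketch.
REQUIRED_SECTIONS = [
    "Description",
    "Acceptance Criteria",
    "Verification Required",
    "Context",
    "Priority",
]
VALID_PRIORITIES = {"high", "normal", "low"}


def validate_task(content: str) -> list[str]:
    """Return a list of problems found in a task file (empty if none)."""
    problems = []
    if not re.search(r"^# Task: \S", content, flags=re.MULTILINE):
        problems.append("missing '# Task: <title>' line")

    headings = set(re.findall(r"^## (.+?)\s*$", content, flags=re.MULTILINE))
    for section in REQUIRED_SECTIONS:
        if section not in headings:
            problems.append(f"missing '## {section}' section")

    # The priority value is the first word after the '## Priority' heading.
    match = re.search(r"^## Priority[ \t]*\n\s*(\S+)", content, flags=re.MULTILINE)
    if match and match.group(1).lower() not in VALID_PRIORITIES:
        problems.append(f"unknown priority {match.group(1)!r}")
    return problems
```

Running this over every file in the queue before Step 1 would let a bad task be reported up front rather than failing halfway through a batch.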

Example Task File

.claude/tasks/001-cost-dashboard-minimize.md:

# Task: Make Cost Dashboard Minimizable

## Description
The cost dashboard button in bottom-right corner should be minimizable.
Currently it blocks text and cannot be minimized by clicking it.

## Acceptance Criteria
- [ ] Button shows $ icon
- [ ] Button is in bottom-right corner above connectivity button
- [ ] Clicking button toggles dashboard visibility
- [ ] Dashboard hidden by default
- [ ] Button looks professional and HD quality

## Verification Required
- [ ] Screenshot shows button exists
- [ ] Screenshot before/after click shows toggle works
- [ ] UI audit confirms professional appearance
- [ ] Build succeeds with no errors

## Context
Files: src/components/CostDashboard.tsx, src/components/buttons.css
Style: Professional, clean, smooth, high-definition

## Priority
high

Autonomous Execution Workflow

Copy this checklist and track progress:

Autonomous Task Execution:
- [ ] Step 1: Load task queue from .claude/tasks/
- [ ] Step 2: Sort by priority (high > normal > low)
- [ ] Step 3: For each task:
  - [ ] 3a. Read task file
  - [ ] 3b. Understand requirements
  - [ ] 3c. Implement solution
  - [ ] 3d. Run build verification
  - [ ] 3e. Physical verification with screenshots
  - [ ] 3f. Mark acceptance criteria complete
  - [ ] 3g. Save state to memory
- [ ] Step 4: Generate completion report
- [ ] Step 5: Run final comprehensive audit of all changes
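
The checklist can be read as a loop in which a task is only marked verified once both the build and the physical verification pass. The sketch below is an illustration under assumptions: `run_queue` and the three callables (`implement`, `build`, `verify`) are hypothetical stand-ins for Steps 3c to 3e, and the `max_attempts` limit is an addition of this sketch, not something the skill defines.

```python
from typing import Callable


def run_queue(
    tasks: list[dict],
    implement: Callable[[dict], None],
    build: Callable[[], bool],
    verify: Callable[[dict], bool],
    max_attempts: int = 3,
) -> list[dict]:
    """Run each task until build and verification both pass, or give up."""
    for task in tasks:
        task["status"] = "failed"  # overwritten below once verification passes
        for attempt in range(1, max_attempts + 1):
            implement(task)        # Step 3c
            if not build():        # Step 3d: never verify a broken build
                continue
            if verify(task):       # Step 3e: physical verification
                task["status"] = "verified"
                task["attempts"] = attempt
                break
    return tasks
```

Keeping the status at "failed" until verification succeeds means an interrupted or exhausted task can never be mistaken for a completed one.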

Step 1: Load Task Queue

import os  # used by later steps (os.getenv, os.rename)
from pathlib import Path

tasks_dir = Path(".claude/tasks")
# Top-level *.md files only, in filename order
task_files = sorted(tasks_dir.glob("*.md"))

tasks = []
for task_file in task_files:
    tasks.append({
        "file": str(task_file),
        "content": task_file.read_text(encoding="utf-8"),
        "status": "pending",
    })

print(f"Loaded {len(tasks)} tasks")

Step 2: Sort by Priority

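import re

def get_priority(task_content):
    """Extract priority from the line that follows '## Priority'."""
    # Look only inside the Priority section; a bare substring search would
    # also match words such as "high-definition" elsewhere in the task.
    match = re.search(r"^## Priority\s*\n\s*(\w+)", task_content, re.MULTILINE)
    value = match.group(1).lower() if match else "normal"
    return {"high": 0, "normal": 1, "low": 2}.get(value, 1)  # default to normal

tasks.sort(key=lambda t: get_priority(t["content"]))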

Step 3: Execute Each Task

For each task in the queue:

3a. Read and Parse Task

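import re

task = tasks[0]  # Current task
content = task["content"]

def section(name):
    """Return the body of a '## name' section, or '' if it is missing."""
    match = re.search(rf"^## {re.escape(name)}\s*\n(.*?)(?=^## |\Z)", content, re.M | re.S)
    return match.group(1).strip() if match else ""

# Extract key sections
title_match = re.search(r"^# Task:\s*(.+)$", content, re.M)
title = title_match.group(1).strip() if title_match else ""
description = section("Description")
criteria = re.findall(r"^- \[[ x]\] (.+)$", section("Acceptance Criteria"), re.M)
verification = re.findall(r"^- \[[ x]\] (.+)$", section("Verification Required"), re.M)
context = section("Context")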

3b. Understand Requirements

Use extended thinking to fully understand:

What needs to be built
Why it's needed
How to implement it
What physical verification is required
How to verify it actually works

3c. Implement Solution

Default to implementation:

Read relevant files
Make necessary changes
Follow existing patterns
Write clean, production-ready code
Add comments where helpful

Use tools proactively:

Read files to understand structure
Grep for patterns to maintain consistency
Edit files with exact replacements
Write new files when needed
Run bash commands for verification

3d. Build Verification

After implementation, verify build succeeds:

cd /home/runner/workspace && npm run build

If build fails:

Read error messages carefully
Fix the issues
Rebuild
Repeat until build succeeds
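
The rebuild loop above can be scripted so that the error output is captured for inspection. This is a minimal sketch: `run_build` is a hypothetical helper, and its default command simply mirrors the `npm run build` call shown earlier.

```python
import subprocess
from typing import Optional, Sequence


def run_build(
    command: Sequence[str] = ("npm", "run", "build"),
    cwd: Optional[str] = None,
) -> tuple[bool, str]:
    """Run the build once and return (succeeded, combined output)."""
    proc = subprocess.run(
        list(command),
        cwd=cwd,
        capture_output=True,
        text=True,
    )
    # Build tools write errors to either stream, so keep both.
    return proc.returncode == 0, proc.stdout + proc.stderr
```

A caller would invoke `run_build(cwd=...)` after each fix, read the returned output when the first element is `False`, and only move on to physical verification once it is `True`.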

3e. Physical Verification

CRITICAL: Use the physical-verification skill:

from services.chatkit_backend.app.vision import register_and_verify_ui_change

result = await register_and_verify_ui_change(
    change_id=task_id,
    description=title,
    files_modified=files_changed,
    verification_criteria=[
        {"element": criterion, "expected_state": expected}
        for criterion in criteria
    ],
    priority="high",
    wait_for_verification=True
)

if not result["success"]:
    # Fix issues and retry
    # Do NOT mark task complete until verification passes
    pass

3f. Mark Acceptance Criteria

Update task file with completion status:

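# Flip only the checkboxes under "## Acceptance Criteria" from [ ] to [x],
# and only after verification has passed
head, sep, rest = content.partition("## Acceptance Criteria")
criteria_block, next_heading, tail = rest.partition("\n## ")
updated_content = head + sep + criteria_block.replace("- [ ]", "- [x]") + next_heading + tail

with open(task["file"], "w") as f:
    f.write(updated_content)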

3g. Save State to Memory

IMPORTANT: Before moving to next task, save state:

from services.chatkit_backend.app.llm.memory_tool import upsert_memory

await upsert_memory(
    content=f"""
    Autonomous Task Progress:

    Completed: {task["file"]}
    Status: VERIFIED
    Files Modified: {files_changed}
    Verification: {result}

    Next Task: {tasks[1]["file"] if len(tasks) > 1 else "None"}
    Remaining: {len(tasks) - 1} tasks
    """,
    context="autonomous_execution"
)

This ensures progress is not lost if the context window refreshes.

Step 4: Generate Completion Report

After all tasks complete, create report:

# Autonomous Execution Report

## Summary
Completed X tasks with full physical verification

## Tasks Completed
1. Task 1 - VERIFIED
   - Files: [list]
   - Verification: [screenshot paths]

2. Task 2 - VERIFIED
   - Files: [list]
   - Verification: [screenshot paths]

## Build Status
✓ All builds successful

## Verification Status
✓ All features physically verified with computer vision

## Evidence
[Screenshot paths for all verifications]

## Next Steps
[Any follow-up tasks or recommendations]
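
A report in this shape can be rendered from per-task results. The sketch below covers the summary, task list, and evidence sections of the template above; it assumes each result is a dict with hypothetical `title`, `files`, and `screenshots` keys, none of which come from the skill itself.

```python
def render_report(results: list[dict]) -> str:
    """Render a completion report in the layout of the template above."""
    lines = [
        "# Autonomous Execution Report",
        "",
        "## Summary",
        f"Completed {len(results)} tasks with full physical verification",
        "",
        "## Tasks Completed",
    ]
    for number, result in enumerate(results, start=1):
        lines.append(f"{number}. {result['title']} - VERIFIED")
        lines.append(f"   - Files: {', '.join(result['files'])}")
        lines.append(f"   - Verification: {', '.join(result['screenshots'])}")
        lines.append("")
    lines.append("## Evidence")
    # Collect every screenshot path so each claim has visible evidence.
    for result in results:
        lines.extend(result["screenshots"])
    return "\n".join(lines)
```

Only tasks that actually passed physical verification should be passed in, since every entry is labelled VERIFIED.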

Step 5: Final Comprehensive Audit

Run comprehensive audit of all changes:

from services.chatkit_backend.app.vision import AuditorAgent

auditor = AuditorAgent(api_key=os.getenv("ANTHROPIC_API_KEY"))

all_test_cases = []
for task in completed_tasks:
    all_test_cases.extend(task["verification_criteria"])

result = await auditor.comprehensive_ui_audit(
    feature_name="Autonomous Task Batch",
    test_cases=all_test_cases,
    max_iterations=30
)

Multi-Context Window Support

When approaching context limit, save state and prepare for refresh:

Before Context Refresh

# Save detailed state to memory
await upsert_memory(
    content=f"""
    # Autonomous Task State Checkpoint

    ## Current Task
    {current_task_file}

    ## Progress
    Completed: {len(completed_tasks)}
    Remaining: {len(remaining_tasks)}

    ## Files Modified
    {all_modified_files}

    ## Verification Results
    {verification_summary}

    ## Next Actions
    1. Continue with task: {next_task_file}
    2. Load state from this memory
    3. Resume execution
    """,
    context="autonomous_checkpoint"
)

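# Move completed tasks to the archive folder
from pathlib import Path

for task in completed_tasks:
    src = Path(task["file"])
    archive_dir = src.parent / "completed"
    archive_dir.mkdir(parents=True, exist_ok=True)  # a rename fails if the folder is missing
    src.rename(archive_dir / src.name)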

After Context Refresh

# Load state from memory
from services.chatkit_backend.app.llm.memory_tool import search_memory

results = await search_memory(
    query="Autonomous task state checkpoint",
    context="autonomous_checkpoint"
)

# Parse state and resume
# Load remaining tasks
# Continue from where we left off
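
Resuming requires turning the checkpoint text back into structured state. The sketch below parses the `##` sections of the checkpoint format written before the refresh; `parse_checkpoint` and `progress_counts` are hypothetical helpers, and how the text is pulled out of the `search_memory` result is left open because that return shape is not shown here.

```python
import re


def parse_checkpoint(text: str) -> dict:
    """Split checkpoint text into {section heading: stripped body}."""
    sections = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()  # the checkpoint f-string is indented
        heading = re.match(r"^## (.+)$", line)
        if heading:
            current = heading.group(1)
            sections[current] = []
        elif current is not None and line:
            sections[current].append(line)
    return {name: "\n".join(body) for name, body in sections.items()}


def progress_counts(sections: dict) -> dict:
    """Read the 'Completed: N' / 'Remaining: N' lines from '## Progress'."""
    pairs = re.findall(
        r"^(Completed|Remaining): (\d+)$",
        sections.get("Progress", ""),
        flags=re.MULTILINE,
    )
    return {key.lower(): int(value) for key, value in pairs}
```

With the sections recovered, the `Current Task` entry identifies where to resume, and the counts can be compared against the files still present in `.claude/tasks/` as a sanity check.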

User Interface Integration

The autonomous mode should be controllable from UI:

Controls needed:

Toggle autonomous mode ON/OFF
Select task folder (.claude/tasks/)
Configure verification strictness (always | optional | never)
View task queue and progress
Pause/resume execution
Manual intervention button

Status display:

Current task being executed
Progress bar (X of Y tasks)
Last verification result
Build status
Error messages if any

See implementation in src/components/AutonomousPanel.tsx

System Prompt for Autonomous Mode

When autonomous mode is active, the meta agent should have this system prompt:

<default_to_action>
By default, implement changes rather than only suggesting them. If the user's intent is unclear, infer the most useful likely action and proceed, using tools to discover any missing details instead of guessing.
</default_to_action>

<persistence>
Your context window will be automatically compacted as it approaches its limit, allowing you to continue working indefinitely. Save your current progress and state to memory before the context window refreshes. Always be as persistent and autonomous as possible and complete tasks fully.
</persistence>

<physical_verification>
NEVER claim a feature works without physical verification. Use computer vision to take screenshots and verify UI changes actually exist and function as expected. Code inspection and test passing are not sufficient.
</physical_verification>

<task_execution>
Execute tasks from .claude/tasks/ folder systematically:
1. Load all tasks
2. Sort by priority
3. For each task: implement, verify build, physically verify, mark complete
4. Save state to memory after each task
5. Generate completion report with evidence
</task_execution>

Error Handling

Build Failures

Read error messages carefully
Fix the specific issues mentioned
Rebuild and verify
Do NOT skip to next task until build passes

Verification Failures

Take screenshot to see actual state
Compare to expected state
Identify root cause
Fix the implementation
Re-verify with computer vision
Do NOT claim feature works without visual proof

Context Window Approaching Limit

Save detailed state to memory immediately
Mark current position in task queue
Note which files were modified
Allow context to refresh
Load state from memory
Resume from exact position

Anti-Patterns to Avoid

❌ Suggesting Instead of Implementing

"You could implement this by adding a button..."  # NO

✓ Actually Implementing

[Reads files, makes changes, verifies build]
"Button implemented and verified"  # YES

❌ Claiming Done Without Verification

"I've added the feature, it should work now"  # NO

✓ Verification Before Claiming Done

[Takes screenshots, verifies with computer vision]
"Feature verified with visual evidence: [screenshot paths]"  # YES

❌ Stopping at First Error

"Build failed, need help"  # NO

✓ Fixing Errors Autonomously

[Reads error, identifies issue, fixes it, rebuilds]
"Build issue resolved, continuing"  # YES

Integration with Meta Agent

The meta agent should:

Detect when user requests autonomous mode
Load this skill automatically
Execute task queue systematically
Use physical-verification skill for all UI changes
Save state to memory throughout execution
Handle context refreshes seamlessly
Generate final report with evidence

Success Criteria

Autonomous execution is successful when:

All tasks in queue are completed
All builds pass
All features physically verified with screenshots
Evidence (screenshots) available for every claim
State saved to memory at checkpoints
Completion report generated
User can see progress and results in UI

Next Steps

After completing autonomous task execution:

Review completion report
Check all screenshot evidence
Run final comprehensive UI audit
Update documentation
Commit changes to git (if requested)
Archive completed tasks
Prepare for the next batch of tasks
