name	agent-improvement
description	Self-improvement loop for multi-agent workflows. Diagnose failures, improve tool descriptions, and learn from success/failure patterns.
type	meta
priority	low

Agent Self-Improvement

Purpose

Enable continuous improvement of multi-agent workflows through:

Failure pattern analysis
Tool description optimization
Success pattern recognition
Performance benchmarking

Reference: Anthropic achieved 40% faster task completion through LLM-based tool description improvements.

Improvement Cycle

┌─────────────────────────────────────────────────┐
│                                                 │
│   1. COLLECT                                    │
│   └── Gather traces from completed sessions     │
│                                                 │
│   2. ANALYZE                                    │
│   └── Identify failure patterns & bottlenecks  │
│                                                 │
│   3. DIAGNOSE                                   │
│   └── Use LLM to understand root causes        │
│                                                 │
│   4. IMPROVE                                    │
│   └── Update tool descriptions & agent prompts │
│                                                 │
│   5. VALIDATE                                   │
│   └── Test improvements on similar tasks       │
│                                                 │
│   6. DEPLOY                                     │
│   └── Roll out to all agents                   │
│                                                 │
└─────────────────────────────────────────────────┘

Data Collection

Success/Failure Patterns

Store in .temp/improvement/patterns/:

{
  "pattern_id": "pat_001",
  "type": "failure|success",
  "frequency": 5,
  "context": {
    "task_type": "ui_component_creation",
    "agent": "mobile-ui-specialist",
    "phase": "implementation"
  },
  "description": "Agent often misses accessibility labels",
  "examples": [
    {
      "session_id": "sess_abc",
      "file": "StationCard.tsx",
      "issue": "Missing accessibilityLabel on TouchableOpacity"
    }
  ],
  "proposed_fix": "Add explicit reminder in agent prompt",
  "status": "identified|proposed|implemented|validated"
}

Tool Usage Patterns

{
  "tool": "read",
  "usage_count": 1523,
  "success_rate": 0.98,
  "avg_duration_ms": 45,
  "common_errors": [
    {
      "error": "File not found",
      "frequency": 23,
      "cause": "Path alias not resolved"
    }
  ],
  "improvement_opportunities": [
    "Add path alias resolution hint to tool description"
  ]
}

Analysis Operations

1. Failure Analysis

Input: Session traces with failures Output: Categorized failure patterns

## Failure Analysis Report

### Category 1: Agent Boundary Violations
- Frequency: 12 occurrences
- Pattern: UI agent attempting to modify services
- Root Cause: Task boundaries not clear in delegation
- Fix: Add explicit "DO NOT" list to delegation template

### Category 2: Missing Dependencies
- Frequency: 8 occurrences
- Pattern: UI agent starts before types available
- Root Cause: Dependency order not enforced
- Fix: Add dependency check before spawning

### Category 3: Tool Misuse
- Frequency: 5 occurrences
- Pattern: Using grep instead of read for known files
- Root Cause: Tool descriptions don't clarify when to use each
- Fix: Update tool descriptions with decision criteria

2. Bottleneck Analysis

Input: Session metrics Output: Performance bottlenecks

## Bottleneck Analysis

### Bottleneck 1: Sequential Agent Spawning
- Impact: 40% time overhead
- Pattern: Agents spawned one at a time
- Fix: Spawn independent agents in parallel

### Bottleneck 2: Excessive Iterations
- Impact: 2x token usage
- Pattern: Average 3.2 iterations per task
- Fix: Improve initial task decomposition

### Bottleneck 3: Quality Gate Failures
- Impact: 25% rework
- Pattern: TypeScript errors on first integration
- Fix: Add pre-integration type check

Improvement Actions

Tool Description Updates

Before:

Read: Reads a file from the filesystem

After:

Read: Reads a file from the filesystem.
- Use when you know the exact file path
- Prefer over grep for reading specific known files
- Use path aliases (@components, @services)
- Returns line-numbered content

Agent Prompt Updates

Before:

You are a mobile UI specialist...

After:

You are a mobile UI specialist...

CRITICAL REMINDERS:
- Always add accessibilityLabel to interactive elements
- Use memo() for components with complex props
- Check LINE_COLORS constant for subway line colors

Delegation Template Updates

Before:

### Task Boundaries
- DO NOT modify services

After:

### Task Boundaries (EXPLICIT)
Files you CAN modify:
- src/components/**
- src/screens/**

Files you CANNOT modify:
- src/services/** (backend agent)
- src/models/** (shared types)
- **/__tests__/** (test agent)

STOP if you need to modify excluded files.

Validation Protocol

Before Deployment

Identify test cases
- Find similar past tasks
- Create synthetic test scenarios
Run A/B comparison
- Original prompts vs improved prompts
- Measure: success rate, iterations, tokens, time
Quality threshold
- Must improve at least one metric
- Must not regress any metric by >5%

Validation Report

## Improvement Validation

### Change: Added accessibility reminder to mobile-ui-specialist

### Test Results
| Metric | Before | After | Change |
|--------|--------|-------|--------|
| Accessibility issues | 12% | 2% | -83% |
| Success rate | 88% | 96% | +9% |
| Token usage | 45K | 47K | +4% |

### Verdict: APPROVE
Accessibility issues reduced significantly with minimal token overhead.

Storage Structure

.temp/improvement/
├── patterns/
│   ├── failures/
│   │   └── pat_{id}.json
│   └── successes/
│       └── pat_{id}.json
├── proposals/
│   └── prop_{id}.md
├── validations/
│   └── val_{id}.json
└── history/
    └── {date}/
        └── changes.json

Integration with Workflow

Periodic Review (Weekly)

1. Aggregate traces from past week
2. Run failure analysis
3. Generate improvement proposals
4. Prioritize by impact × frequency
5. Implement top 3 improvements
6. Validate before merge

Continuous Learning (Per Session)

1. After each session:
   - If failed: Add to failure patterns
   - If succeeded but slow: Add to bottleneck analysis
   - If succeeded optimally: Add to success patterns

2. Check pattern thresholds:
   - If failure pattern frequency > 5: Trigger improvement proposal

Metrics to Track

Agent Performance

Metric	Target	Current	Trend
Success rate	>95%	92%	↑
Avg iterations	<2	2.3	→
Token efficiency	<80K	75K	↓
Time to complete	<10min	12min	↑

Improvement Impact

Change	Implemented	Impact
Accessibility reminder	2025-01-01	-83% issues
Tool description update	2025-01-02	+5% success
Delegation template	2025-01-03	-20% iterations

Best Practices

1. Small, Targeted Changes

One improvement at a time
Clear before/after comparison
Rollback plan ready

2. Data-Driven Decisions

Require frequency > 5 before acting
Validate with real tasks
Measure actual impact

3. Preserve What Works

Don't change successful patterns
Document why changes were made
Keep history for rollback

4. Human Review

Major changes require approval
Edge cases need human judgment
Balance automation with oversight

Quick Commands

# View failure patterns
cat .temp/improvement/patterns/failures/*.json | jq '.description'

# Count patterns by type
ls .temp/improvement/patterns/failures/ | wc -l

# View pending proposals
cat .temp/improvement/proposals/*.md

# Check improvement history
cat .temp/improvement/history/*/changes.json | jq '.'

Version: 1.0 | Last Updated: 2025-01-04

agent-improvement

Install Skill

SKILL.md

Agent Self-Improvement

Purpose

Improvement Cycle

Data Collection

Success/Failure Patterns

Tool Usage Patterns

Analysis Operations

1. Failure Analysis

2. Bottleneck Analysis

Improvement Actions

Tool Description Updates

Agent Prompt Updates

Delegation Template Updates

Validation Protocol

Before Deployment

Validation Report

Storage Structure

Integration with Workflow

Periodic Review (Weekly)

Continuous Learning (Per Session)

Metrics to Track

Agent Performance

Improvement Impact

Best Practices

1. Small, Targeted Changes

2. Data-Driven Decisions

3. Preserve What Works

4. Human Review

Quick Commands