Claude Code Plugins

Community-maintained marketplace

Feedback

agent-improvement

@k002bill2/LiveMetro
0
0

Self-improvement loop for multi-agent workflows. Diagnose failures, improve tool descriptions, and learn from success/failure patterns.

Install Skill

1Download skill
2Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify skill by going through its instructions before using it.

SKILL.md

name agent-improvement
description Self-improvement loop for multi-agent workflows. Diagnose failures, improve tool descriptions, and learn from success/failure patterns.
type meta
priority low

Agent Self-Improvement

Purpose

Enable continuous improvement of multi-agent workflows through:

  • Failure pattern analysis
  • Tool description optimization
  • Success pattern recognition
  • Performance benchmarking

Reference: Anthropic achieved 40% faster task completion through LLM-based tool description improvements.

Improvement Cycle

┌─────────────────────────────────────────────────┐
│                                                 │
│   1. COLLECT                                    │
│   └── Gather traces from completed sessions     │
│                                                 │
│   2. ANALYZE                                    │
│   └── Identify failure patterns & bottlenecks  │
│                                                 │
│   3. DIAGNOSE                                   │
│   └── Use LLM to understand root causes        │
│                                                 │
│   4. IMPROVE                                    │
│   └── Update tool descriptions & agent prompts │
│                                                 │
│   5. VALIDATE                                   │
│   └── Test improvements on similar tasks       │
│                                                 │
│   6. DEPLOY                                     │
│   └── Roll out to all agents                   │
│                                                 │
└─────────────────────────────────────────────────┘

Data Collection

Success/Failure Patterns

Store in .temp/improvement/patterns/:

{
  "pattern_id": "pat_001",
  "type": "failure|success",
  "frequency": 5,
  "context": {
    "task_type": "ui_component_creation",
    "agent": "mobile-ui-specialist",
    "phase": "implementation"
  },
  "description": "Agent often misses accessibility labels",
  "examples": [
    {
      "session_id": "sess_abc",
      "file": "StationCard.tsx",
      "issue": "Missing accessibilityLabel on TouchableOpacity"
    }
  ],
  "proposed_fix": "Add explicit reminder in agent prompt",
  "status": "identified|proposed|implemented|validated"
}

Tool Usage Patterns

{
  "tool": "read",
  "usage_count": 1523,
  "success_rate": 0.98,
  "avg_duration_ms": 45,
  "common_errors": [
    {
      "error": "File not found",
      "frequency": 23,
      "cause": "Path alias not resolved"
    }
  ],
  "improvement_opportunities": [
    "Add path alias resolution hint to tool description"
  ]
}

Analysis Operations

1. Failure Analysis

Input: Session traces with failures Output: Categorized failure patterns

## Failure Analysis Report

### Category 1: Agent Boundary Violations
- Frequency: 12 occurrences
- Pattern: UI agent attempting to modify services
- Root Cause: Task boundaries not clear in delegation
- Fix: Add explicit "DO NOT" list to delegation template

### Category 2: Missing Dependencies
- Frequency: 8 occurrences
- Pattern: UI agent starts before types available
- Root Cause: Dependency order not enforced
- Fix: Add dependency check before spawning

### Category 3: Tool Misuse
- Frequency: 5 occurrences
- Pattern: Using grep instead of read for known files
- Root Cause: Tool descriptions don't clarify when to use each
- Fix: Update tool descriptions with decision criteria

2. Bottleneck Analysis

Input: Session metrics Output: Performance bottlenecks

## Bottleneck Analysis

### Bottleneck 1: Sequential Agent Spawning
- Impact: 40% time overhead
- Pattern: Agents spawned one at a time
- Fix: Spawn independent agents in parallel

### Bottleneck 2: Excessive Iterations
- Impact: 2x token usage
- Pattern: Average 3.2 iterations per task
- Fix: Improve initial task decomposition

### Bottleneck 3: Quality Gate Failures
- Impact: 25% rework
- Pattern: TypeScript errors on first integration
- Fix: Add pre-integration type check

Improvement Actions

Tool Description Updates

Before:

Read: Reads a file from the filesystem

After:

Read: Reads a file from the filesystem.
- Use when you know the exact file path
- Prefer over grep for reading specific known files
- Use path aliases (@components, @services)
- Returns line-numbered content

Agent Prompt Updates

Before:

You are a mobile UI specialist...

After:

You are a mobile UI specialist...

CRITICAL REMINDERS:
- Always add accessibilityLabel to interactive elements
- Use memo() for components with complex props
- Check LINE_COLORS constant for subway line colors

Delegation Template Updates

Before:

### Task Boundaries
- DO NOT modify services

After:

### Task Boundaries (EXPLICIT)
Files you CAN modify:
- src/components/**
- src/screens/**

Files you CANNOT modify:
- src/services/** (backend agent)
- src/models/** (shared types)
- **/__tests__/** (test agent)

STOP if you need to modify excluded files.

Validation Protocol

Before Deployment

  1. Identify test cases

    • Find similar past tasks
    • Create synthetic test scenarios
  2. Run A/B comparison

    • Original prompts vs improved prompts
    • Measure: success rate, iterations, tokens, time
  3. Quality threshold

    • Must improve at least one metric
    • Must not regress any metric by >5%

Validation Report

## Improvement Validation

### Change: Added accessibility reminder to mobile-ui-specialist

### Test Results
| Metric | Before | After | Change |
|--------|--------|-------|--------|
| Accessibility issues | 12% | 2% | -83% |
| Success rate | 88% | 96% | +9% |
| Token usage | 45K | 47K | +4% |

### Verdict: APPROVE
Accessibility issues reduced significantly with minimal token overhead.

Storage Structure

.temp/improvement/
├── patterns/
│   ├── failures/
│   │   └── pat_{id}.json
│   └── successes/
│       └── pat_{id}.json
├── proposals/
│   └── prop_{id}.md
├── validations/
│   └── val_{id}.json
└── history/
    └── {date}/
        └── changes.json

Integration with Workflow

Periodic Review (Weekly)

1. Aggregate traces from past week
2. Run failure analysis
3. Generate improvement proposals
4. Prioritize by impact × frequency
5. Implement top 3 improvements
6. Validate before merge

Continuous Learning (Per Session)

1. After each session:
   - If failed: Add to failure patterns
   - If succeeded but slow: Add to bottleneck analysis
   - If succeeded optimally: Add to success patterns

2. Check pattern thresholds:
   - If failure pattern frequency > 5: Trigger improvement proposal

Metrics to Track

Agent Performance

Metric Target Current Trend
Success rate >95% 92%
Avg iterations <2 2.3
Token efficiency <80K 75K
Time to complete <10min 12min

Improvement Impact

Change Implemented Impact
Accessibility reminder 2025-01-01 -83% issues
Tool description update 2025-01-02 +5% success
Delegation template 2025-01-03 -20% iterations

Best Practices

1. Small, Targeted Changes

  • One improvement at a time
  • Clear before/after comparison
  • Rollback plan ready

2. Data-Driven Decisions

  • Require frequency > 5 before acting
  • Validate with real tasks
  • Measure actual impact

3. Preserve What Works

  • Don't change successful patterns
  • Document why changes were made
  • Keep history for rollback

4. Human Review

  • Major changes require approval
  • Edge cases need human judgment
  • Balance automation with oversight

Quick Commands

# View failure patterns
cat .temp/improvement/patterns/failures/*.json | jq '.description'

# Count patterns by type
ls .temp/improvement/patterns/failures/ | wc -l

# View pending proposals
cat .temp/improvement/proposals/*.md

# Check improvement history
cat .temp/improvement/history/*/changes.json | jq '.'

Version: 1.0 | Last Updated: 2025-01-04