Claude Code Plugins

Community-maintained marketplace

Agent Prompt Evolution

@yaleh/meta-cc

Track and optimize agent specialization during methodology development. Use when agent specialization emerges (generic agents show >5x performance gap), multi-experiment comparison needed, or methodology transferability analysis required. Captures agent set evolution (Aₙ tracking), meta-agent evolution (Mₙ tracking), specialization decisions (when/why to create specialized agents), and reusability assessment (universal vs domain-specific vs task-specific). Enables systematic cross-experiment learning and optimized M₀ evolution. 2-3 hours overhead per experiment.

Install Skill

1. Download skill
2. Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3. Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify the skill by reviewing its instructions before using it.

SKILL.md

name: Agent Prompt Evolution
description: Track and optimize agent specialization during methodology development. Use when agent specialization emerges (generic agents show >5x performance gap), multi-experiment comparison needed, or methodology transferability analysis required. Captures agent set evolution (Aₙ tracking), meta-agent evolution (Mₙ tracking), specialization decisions (when/why to create specialized agents), and reusability assessment (universal vs domain-specific vs task-specific). Enables systematic cross-experiment learning and optimized M₀ evolution. 2-3 hours overhead per experiment.
allowed-tools: Read, Grep, Glob, Edit, Write

Agent Prompt Evolution

Systematically track how agents specialize during methodology development.

Specialized agents emerge from need, not prediction. Track their evolution to understand when specialization adds value.


When to Use This Skill

Use this skill when:

  • 🔄 Agent specialization emerges: Generic agents show >5x performance gap
  • 📊 Multi-experiment comparison: Want to learn across experiments
  • 🧩 Methodology transferability: Analyzing what's reusable vs domain-specific
  • 📈 M₀ optimization: Want to evolve base Meta-Agent capabilities
  • 🎯 Specialization decisions: Deciding when to create new agents
  • 📚 Agent library: Building reusable agent catalog

Don't use when:

  • ❌ Single experiment with no specialization
  • ❌ Generic agents sufficient throughout
  • ❌ No cross-experiment learning goals
  • ❌ Tracking overhead not worth insights

Quick Start (10 minutes per iteration)

Track Agent Evolution in Each Iteration

iteration-N.md template:

## Agent Set Evolution

### Current Agent Set (Aₙ)
1. **coder** (generic) - Write code, implement features
2. **doc-writer** (generic) - Documentation
3. **data-analyst** (generic) - Data analysis
4. **coverage-analyzer** (specialized, created iteration 3) - Analyze test coverage gaps

### Changes from Previous Iteration
- Added: coverage-analyzer (10x speedup for coverage analysis)
- Removed: None
- Modified: None

### Specialization Decision
**Why coverage-analyzer?**
- Generic data-analyst took 45 min for coverage analysis
- Identified 10x performance gap
- Coverage analysis is recurring task (every iteration)
- Domain knowledge: Go coverage tools, gap identification patterns
- **ROI**: 3 hours creation cost, saves 40 min/iteration × 3 remaining iterations = 2 hours saved

### Agent Reusability Assessment
- **coder**: Universal (100% transferable)
- **doc-writer**: Universal (100% transferable)
- **data-analyst**: Universal (100% transferable)
- **coverage-analyzer**: Domain-specific (testing methodology, 70% transferable to other languages)

### System State
- Aₙ ≠ Aₙ₋₁ (new agent added)
- System UNSTABLE (need iteration N+1 to confirm stability)

Four Tracking Dimensions

1. Agent Set Evolution (Aₙ)

Track changes iteration-to-iteration:

A₀ = {coder, doc-writer, data-analyst}
A₁ = {coder, doc-writer, data-analyst} (unchanged)
A₂ = {coder, doc-writer, data-analyst} (unchanged)
A₃ = {coder, doc-writer, data-analyst, coverage-analyzer} (new specialist)
A₄ = {coder, doc-writer, data-analyst, coverage-analyzer, test-generator} (new specialist)
A₅ = {coder, doc-writer, data-analyst, coverage-analyzer, test-generator} (stable)

Stability: Aₙ == Aₙ₋₁ for convergence
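The stability check is mechanical enough to script. Below is a minimal sketch (not part of the skill) that diffs consecutive agent sets and reports when Aₙ == Aₙ₋₁; the agent names mirror the example above.

```python
from typing import List, Set

# Agent set A_n recorded at each iteration (names mirror the example above).
agent_sets: List[Set[str]] = [
    {"coder", "doc-writer", "data-analyst"},                                         # A0
    {"coder", "doc-writer", "data-analyst"},                                         # A1
    {"coder", "doc-writer", "data-analyst"},                                         # A2
    {"coder", "doc-writer", "data-analyst", "coverage-analyzer"},                    # A3
    {"coder", "doc-writer", "data-analyst", "coverage-analyzer", "test-generator"},  # A4
    {"coder", "doc-writer", "data-analyst", "coverage-analyzer", "test-generator"},  # A5
]

def report_evolution(sets: List[Set[str]]) -> None:
    """Print additions/removals per iteration and whether A_n == A_(n-1)."""
    for n in range(1, len(sets)):
        added = sorted(sets[n] - sets[n - 1])
        removed = sorted(sets[n - 1] - sets[n])
        stable = not added and not removed
        print(f"A{n}: +{added or '[]'} -{removed or '[]'} -> {'STABLE' if stable else 'UNSTABLE'}")

report_evolution(agent_sets)
```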

2. Meta-Agent Evolution (Mₙ)

Standard M₀ capabilities:

  1. observe: Pattern observation
  2. plan: Iteration planning
  3. execute: Agent orchestration
  4. reflect: Value assessment
  5. evolve: System evolution

Track enhancements:

M₀ = {observe, plan, execute, reflect, evolve}
M₁ = {observe, plan, execute, reflect, evolve, gap-identify} (new capability)
M₂ = {observe, plan, execute, reflect, evolve, gap-identify} (stable)

Finding (from 8 experiments): M₀ sufficient in all cases (no evolution needed)
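A full convergence check covers both layers: the agent set and the Meta-Agent capability set must each be unchanged from the previous iteration. A minimal sketch, reusing the same set-diff idea as above:

```python
from typing import Set

def system_converged(prev_agents: Set[str], curr_agents: Set[str],
                     prev_meta: Set[str], curr_meta: Set[str]) -> bool:
    """System is stable only when A_n == A_(n-1) and M_n == M_(n-1)."""
    return prev_agents == curr_agents and prev_meta == curr_meta

m0 = {"observe", "plan", "execute", "reflect", "evolve"}
a4 = {"coder", "doc-writer", "data-analyst", "coverage-analyzer", "test-generator"}
a5 = set(a4)  # no change in iteration 5

print(system_converged(a4, a5, m0, m0))  # True: both layers unchanged
```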

3. Specialization Decision Tree

When to create specialized agent:

Decision tree:
1. Is generic agent sufficient? (performance within 2x)
   YES → No specialization
   NO → Continue

2. Is task recurring? (happens ≥3 times)
   NO → One-off, tolerate slowness
   YES → Continue

3. Is performance gap >5x?
   NO → Tolerate moderate slowness
   YES → Continue

4. Is the ROI positive?
   Creation cost < (Time saved per use × Remaining uses)?
   NO → Not worth it
   YES → Create specialized agent

Example (Bootstrap-002):

Task: Test coverage gap analysis
Generic agent (data-analyst): 45 min
Potential specialist (coverage-analyzer): 4.5 min (10x faster)

Recurring: YES (every iteration, 3 remaining)
Performance gap: 10x (>5x threshold)
Creation cost: 3 hours
ROI: (45 - 4.5) min × 3 = 121.5 min ≈ 2 hours saved in-experiment
Decision: CREATE (in-experiment savings nearly offset the 3-hour creation cost, and the agent's ~70% reusability in future experiments tips the ROI positive)
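As an illustration, the four checks above can be expressed as a small function. This is a sketch, not part of the skill; the thresholds (2x, 3 uses, 5x) come from the decision tree and the numbers mirror the Bootstrap-002 example.

```python
def should_specialize(generic_min: float, specialist_min: float,
                      remaining_uses: int, creation_cost_min: float) -> bool:
    """Apply the four-step specialization decision tree."""
    gap = generic_min / specialist_min
    if gap <= 2:                 # 1. generic agent is sufficient (within 2x)
        return False
    if remaining_uses < 3:       # 2. task is not recurring (needs >=3 uses)
        return False
    if gap <= 5:                 # 3. performance gap below the 5x threshold
        return False
    savings = (generic_min - specialist_min) * remaining_uses
    return savings > creation_cost_min  # 4. positive ROI

# Bootstrap-002 coverage-analyzer numbers: 45 min generic, 4.5 min specialist,
# 3 remaining uses, 3 hours (180 min) creation cost.
print(should_specialize(45, 4.5, remaining_uses=3, creation_cost_min=180))  # False on in-experiment savings alone
# The example's CREATE decision additionally counts expected reuse in later
# experiments (the agent is ~70% transferable), which pushes the ROI positive.
```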

4. Reusability Assessment

Three categories:

Universal (90-100% transferable):

  • Generic agents (coder, doc-writer, data-analyst)
  • No domain knowledge required
  • Applicable across all domains

Domain-Specific (60-80% transferable):

  • Requires domain knowledge (testing, CI/CD, error handling)
  • Patterns apply within domain
  • Needs adaptation for other domains

Task-Specific (10-30% transferable):

  • Highly specialized for particular task
  • One-off creation
  • Unlikely to reuse

Examples:

Agent: coverage-analyzer
Domain: Testing methodology
Transferability: 70%
- Go coverage tools (language-specific, ~30% transferable)
- Gap identification patterns (universal, 100%)
- Overall: 70% transferable to Python/Rust/TypeScript testing

Agent: test-generator
Domain: Testing methodology
Transferability: 40%
- Go test syntax (language-specific, 0% to other languages)
- Test pattern templates (moderately transferable, 60%)
- Overall: 40% transferable

Agent: log-analyzer
Domain: Observability
Transferability: 85%
- Log parsing (universal, 95%)
- Pattern recognition (universal, 100%)
- Structured logging concepts (universal, 100%)
- Go slog specifics (language-specific, 20%)
- Overall: 85% transferable
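One way to make the overall percentages reproducible is to compute them as a weighted average of per-component scores. The sketch below is illustrative only; the component weights are assumptions, not something the skill prescribes.

```python
from typing import Dict, Tuple

def overall_transferability(components: Dict[str, Tuple[float, float]]) -> float:
    """components maps name -> (transferability 0..1, weight); returns the weighted average."""
    total_weight = sum(weight for _, weight in components.values())
    return sum(score * weight for score, weight in components.values()) / total_weight

# coverage-analyzer from the example above; equal weights are an assumption.
coverage_analyzer = {
    "go-coverage-tooling":         (0.30, 1.0),  # language-specific
    "gap-identification-patterns": (1.00, 1.0),  # universal
}
print(f"{overall_transferability(coverage_analyzer):.0%}")  # 65%, close to the ~70% recorded in the log
```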

Evolution Log Template

Create agents/EVOLUTION-LOG.md:

# Agent Evolution Log

## Experiment Overview
- Domain: Testing Strategy
- Baseline agents: 3 (coder, doc-writer, data-analyst)
- Final agents: 5 (+coverage-analyzer, +test-generator)
- Specialization count: 2

---

## Iteration-by-Iteration Evolution

### Iteration 0
**Agent Set**: {coder, doc-writer, data-analyst}
**Changes**: None (baseline)
**Observations**: Generic agents sufficient for baseline establishment

### Iteration 3
**Agent Set**: {coder, doc-writer, data-analyst, coverage-analyzer}
**Changes**: +coverage-analyzer
**Reason**: 10x performance gap (45 min → 4.5 min)
**Creation Cost**: 3 hours
**ROI**: Positive once cross-experiment reuse is counted (≈2 hours saved over 3 iterations in-experiment)
**Reusability**: 70% (domain-specific, testing)

### Iteration 4
**Agent Set**: {coder, doc-writer, data-analyst, coverage-analyzer, test-generator}
**Changes**: +test-generator
**Reason**: 200x performance gap (manual test writing too slow)
**Creation Cost**: 4 hours
**ROI**: Massive (saved 10+ hours)
**Reusability**: 40% (task-specific, Go testing)

### Iteration 5
**Agent Set**: {coder, doc-writer, data-analyst, coverage-analyzer, test-generator}
**Changes**: None
**System**: STABLE (Aₙ == Aₙ₋₁)

---

## Specialization Analysis

### coverage-analyzer
**Purpose**: Analyze test coverage, identify gaps
**Performance**: 10x faster than generic data-analyst
**Domain**: Testing methodology
**Transferability**: 70%
**Lessons**: Coverage gap identification patterns are universal; tool integration is language-specific

### test-generator
**Purpose**: Generate test boilerplate from coverage gaps
**Performance**: 200x faster than manual
**Domain**: Testing methodology (Go-specific)
**Transferability**: 40%
**Lessons**: The high speedup justified the low transferability; patterns are reusable but syntax is not

---

## Cross-Experiment Reuse

### From Previous Experiments
- **validation-builder** (from API design experiment) → Used for smoke test validation
- Reusability: Excellent (validation patterns are universal)
- Adaptation: Minimal (10 min to adapt from API to CI/CD context)

### To Future Experiments
- **coverage-analyzer** → Reusable for Python/Rust/TypeScript testing (70% transferable)
- **test-generator** → Less reusable (40% transferable, needs rewrite for other languages)

---

## Meta-Agent Evolution

### M₀ Capabilities
{observe, plan, execute, reflect, evolve}

### Changes
None (M₀ sufficient throughout)

### Observations
- M₀'s "evolve" capability successfully identified need for specialization
- No Meta-Agent evolution required
- Convergence: Mₙ == M₀ for all iterations

---

## Lessons Learned

### Specialization Decisions
- A **10x performance gap** is a good threshold (<5x is not worth it; >10x is a clear win)
- **Positive ROI required**: Creation cost must be justified by time savings
- **Recurring tasks only**: One-off tasks don't justify specialization

### Reusability Patterns
- **Generic agents always reusable**: coder, doc-writer, data-analyst (100%)
- **Domain agents moderately reusable**: coverage-analyzer (70%)
- **Task agents rarely reusable**: test-generator (40%)

### When NOT to Specialize
- Performance gap <5x (tolerable slowness)
- Task is one-off (no recurring benefit)
- Creation cost exceeds expected savings (not worth the time investment)
- Generic agent will improve with practice (learning curve)

Cross-Experiment Analysis

After 3+ experiments, create agents/CROSS-EXPERIMENT-ANALYSIS.md:

# Cross-Experiment Agent Analysis

## Agent Reuse Matrix

| Agent | Exp1 | Exp2 | Exp3 | Reuse Rate | Transferability |
|-------|------|------|------|------------|-----------------|
| coder | ✓ | ✓ | ✓ | 100% | Universal |
| doc-writer | ✓ | ✓ | ✓ | 100% | Universal |
| data-analyst | ✓ | ✓ | ✓ | 100% | Universal |
| coverage-analyzer | ✓ | - | ✓ | 67% | Domain (testing) |
| test-generator | ✓ | - | - | 33% | Task-specific |
| validation-builder | - | ✓ | ✓ | 67% | Domain (validation) |
| log-analyzer | - | - | ✓ | 33% | Domain (observability) |
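
The reuse-rate column is simply usage count divided by the number of experiments. A minimal sketch, assuming a usage record keyed by agent name (the data format is this sketch's assumption, not something the skill defines):

```python
usage = {
    "coder":              {"Exp1", "Exp2", "Exp3"},
    "doc-writer":         {"Exp1", "Exp2", "Exp3"},
    "data-analyst":       {"Exp1", "Exp2", "Exp3"},
    "coverage-analyzer":  {"Exp1", "Exp3"},
    "test-generator":     {"Exp1"},
    "validation-builder": {"Exp2", "Exp3"},
    "log-analyzer":       {"Exp3"},
}
total_experiments = 3

# Reuse rate = experiments an agent appeared in / total experiments.
for agent, experiments in usage.items():
    print(f"{agent:20s} {len(experiments) / total_experiments:.0%}")
```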

## Specialization Patterns

### Universal Agents (100% reuse)
- Generic capabilities (coder, doc-writer, data-analyst)
- No domain knowledge
- Always included in A₀

### Domain Agents (50-80% reuse)
- Require domain knowledge (testing, CI/CD, observability)
- Reusable within domain
- Examples: coverage-analyzer, validation-builder, log-analyzer

### Task Agents (10-40% reuse)
- Highly specialized
- One-off or rare reuse
- Examples: test-generator (Go-specific)

## M₀ Sufficiency

**Finding**: M₀ = {observe, plan, execute, reflect, evolve} sufficient in ALL experiments

**Implications**:
- No Meta-Agent evolution needed
- Base capabilities handle all domains
- Specialization occurs at Agent layer, not Meta-Agent layer

## Specialization Threshold

**Data** (from 3 experiments):
- Average performance gap for specialization: 15x (range: 5x-200x)
- Average creation cost: 3.5 hours (range: 2-5 hours)
- Average ROI: Positive in 8/9 cases (89% success rate)

**Recommendation**: Use 5x performance gap as threshold

---

**Updated**: After each new experiment

Success Criteria

Agent evolution tracking succeeded when:

  1. Complete tracking: All agent changes documented each iteration
  2. Specialization justified: Each specialized agent has clear ROI
  3. Reusability assessed: Each agent categorized (universal/domain/task)
  4. Cross-experiment learning: Patterns identified across 2+ experiments
  5. M₀ stability documented: Meta-Agent evolution (or lack thereof) tracked

Related Skills

Parent framework:

Complementary:


References

Core guide:

Examples:


Status: ✅ Formalized | 2-3 hours overhead | Enables systematic learning