---
name: code-evolution
description: Autonomous multi-agent code evolution system for optimization problems. Use when solving complex optimization problems (packing, geometry, scheduling, search) through evolutionary approaches with multiple independent AI agents. Multi-start hybrid heuristic+SLSQP methods significantly outperform single approaches. Triggers include genetic algorithms, evolutionary optimization, multi-agent problem solving, parameter tuning at scale, AlphaEvolve-style research, or evolving code solutions across generations.
---
# Code Evolution
## Architecture

```
orchestrator (you)
├── spawn agents (Task tool, subagent_type='general-purpose')
├── evaluate solutions (run evaluate.py)
├── manage archive (best solutions per generation)
└── plan next generation
```
## Critical Principle: Agent Autonomy

**NEVER write solution code yourself.** You (the orchestrator) ONLY:
- Create the fixed evaluation harness (read-only for agents)
- Spawn autonomous subagents via Task tool
- Evaluate results using the harness
- Plan next generation based on results
Agents have full autonomy to implement their assigned approach. You don't guide their code; you guide their problem-solving strategy.
## Workflow

### Phase 0: Setup (Orchestrator Only)

Create the immutable harness; agents can ONLY use it, never alter it:

- `problems/<name>/problem.md` - problem definition (READ-ONLY for agents)
- `problems/<name>/evaluation/evaluate.py` - evaluation function (FROZEN, not modifiable by agents)
- `problems/<name>/config.json` - benchmark, constraints, metadata
Agents receive paths to these files but cannot modify them.
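A minimal sketch of what the frozen evaluator can look like, assuming a hypothetical circle-packing problem where a solution is JSON of the form `{"circles": [[x, y, r], ...]}` inside the unit square and the objective is total radius packed; the real objective, format, and benchmark belong in `problem.md` and `config.json`:

```python
# evaluation/evaluate.py (illustrative; the real harness is problem-specific)
import json
import math
import sys

EPS = 1e-9  # numerical tolerance for boundary and overlap checks

def validate(circles):
    """Every circle must lie inside the unit square and overlap no other."""
    for x, y, r in circles:
        if r <= 0 or x - r < -EPS or y - r < -EPS or x + r > 1 + EPS or y + r > 1 + EPS:
            return False
    for i in range(len(circles)):
        for j in range(i + 1, len(circles)):
            xi, yi, ri = circles[i]
            xj, yj, rj = circles[j]
            if math.hypot(xi - xj, yi - yj) < ri + rj - EPS:
                return False
    return True

def score(circles):
    """Objective: total radius packed (higher is better)."""
    return sum(r for _, _, r in circles)

def evaluate(path):
    """Score one agent's JSON output; invalid solutions score 0."""
    with open(path) as f:
        solution = json.load(f)
    circles = solution.get("circles", [])
    return score(circles) if validate(circles) else 0.0

if __name__ == "__main__":
    print(json.dumps({"score": evaluate(sys.argv[1])}))
```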
### Phase 1: Generation Loop (3-7 generations)

- Plan Strategies: Design 2-4 different approaches for agents to explore
- Spawn Agents: Use the Task tool with `subagent_type='general-purpose'` (15s timeout per agent)
  - Each agent gets the problem description, its specific approach, and the path to the evaluator
  - Agents write solutions to `generations/gen{N}/agent_{id}.py`
  - Agents run themselves: `subprocess.run([sys.executable, agent_file])`
  - Output: JSON with `"score"` and `"circles"` (see the agent sketch after this list)
- Evaluate: You run the evaluator on agent outputs (agents cannot run it)
- Cross-Inspiration: Share winning ideas with the next generation's agents
- Prune: Keep only the best 1-2 approaches from the previous generation
- Archive: Store the best solution in `generations/archive/`
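A sketch of the shape of one agent file, matching the evaluator above. The strategy here is a deliberately trivial placeholder: the actual optimization approach is each agent's autonomous choice, not the orchestrator's.

```python
# generations/gen1/agent_0.py (illustrative shape of an agent's solution file)
import json

def solve():
    # Placeholder: one centered circle filling the unit square. A real agent
    # would implement its assigned strategy (heuristic seeding, SLSQP
    # refinement, etc.) here.
    return [[0.5, 0.5, 0.5]]

if __name__ == "__main__":
    circles = solve()
    # Self-reported score; the orchestrator re-scores with the frozen evaluator.
    print(json.dumps({"score": sum(r for _, _, r in circles), "circles": circles}))
```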
### Phase 2: Cross-Inspiration & Pruning
Between generations:
- Reference winners: Show agents the best previous solution's strategy
- Prune dead approaches: Stop testing approaches that underperform
- Mix winning ideas: Combine best techniques from multiple agents
- Diversify within winners: Vary parameters (seeds, iteration counts, thresholds)
## File Structure

```
problems/<name>/
├── problem.md
├── config.json
├── evaluation/evaluate.py
└── generations/
    ├── gen1/agent_*.py
    └── archive/best_solution.py
```
## Core Design Principles

### Separation of Concerns
- Orchestrator role: Strategy planning, harness building, result evaluation, pruning
- Agent role: Implementation autonomy within their assigned strategy
- Harness: Frozen, read-only, immutable contract between them
### Evolution Mechanics
- Diverse exploration (Gen 1-3): Different approaches find different optima
- Cross-inspiration (Gen 2+): Winning ideas inspire next generation
- Pruning (Gen 3+): Kill weak approaches, double down on winners
- Multi-start within winners: Vary parameters of proven strategies (+2-5% improvement; sketched after this list)
- Validation first: Invalid solutions score 0; the harness is the source of truth
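A minimal sketch of the multi-start point above, assuming SciPy is available and the winning approach ends in an SLSQP refinement; `objective` is a stand-in for the negative packing score of a parameterized layout:

```python
import numpy as np
from scipy.optimize import minimize

def objective(v):
    # Stand-in objective; a real agent would minimize the negative score of
    # its parameterized layout instead.
    return (v[0] - 0.3) ** 2 + (v[1] - 0.7) ** 2

best = None
for seed in range(20):  # vary only the seed; keep the proven method fixed
    x0 = np.random.default_rng(seed).uniform(0.0, 1.0, size=2)  # random restart
    res = minimize(objective, x0, method="SLSQP", bounds=[(0, 1), (0, 1)])
    if res.success and (best is None or res.fun < best.fun):
        best = res
print(best.x, best.fun)
```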
## Evolution Strategy
| Phase | Generations | Orchestrator Action |
|---|---|---|
| Explore | 1-3 | Spawn 3-4 agents with diverse strategies. Find winners. |
| Prune | After Gen 2-3 | Kill underperforming approaches. Keep 1-2 best. |
| Cross-Inspire | Before Gen 4+ | Share winning solution code/strategy with next agents. |
| Exploit | 4-5 | Spawn agents that refine/combine winning approaches. Vary seeds/params. |
| Polish | 6-7 | Multi-start within best approach. Push toward benchmark. |
## Orchestrator Responsibilities

### What YOU Do (Never Delegate)
- Create immutable evaluation harness (problem definition, evaluator, config)
- Spawn agents with Task tool
- Analyze results and plan next generation
- Prune: Decide which approaches to continue, which to kill
- Cross-inspire: Extract winning ideas and share with next agents
- Archive best solutions
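Evaluation in particular is never delegated. A sketch of that step, with a hypothetical problem name and paths, honoring the 15s agent timeout above:

```python
import json
import subprocess
import sys
from pathlib import Path

GEN_DIR = Path("problems/circle_packing/generations/gen1")          # hypothetical
EVALUATOR = Path("problems/circle_packing/evaluation/evaluate.py")  # hypothetical

results = {}
for agent_file in sorted(GEN_DIR.glob("agent_*.py")):
    try:
        # 1. Run the agent; it prints {"score": ..., "circles": [...]} to stdout.
        run = subprocess.run([sys.executable, str(agent_file)],
                             capture_output=True, text=True, timeout=15)
    except subprocess.TimeoutExpired:
        results[agent_file.name] = 0.0
        continue
    if run.returncode != 0:
        results[agent_file.name] = 0.0
        continue
    # 2. Re-score with the frozen evaluator: the only score that counts.
    solution = agent_file.with_suffix(".json")
    solution.write_text(run.stdout)
    ev = subprocess.run([sys.executable, str(EVALUATOR), str(solution)],
                        capture_output=True, text=True)
    results[agent_file.name] = json.loads(ev.stdout)["score"] if ev.returncode == 0 else 0.0

print(json.dumps(results, indent=2))
```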
### What Agents Do (Full Autonomy)
- Implement their assigned strategy
- Write solution code
- Self-validate before output
- Run themselves and produce JSON output
## Cross-Inspiration Strategy
After each generation, extract and communicate:
```markdown
## What Worked
- Agent X achieved Y% with [strategy description]
- Key insight: [what made it work]
- Code reference: [location or snippet]

## What Failed
- Agent Z's [strategy] only achieved W%
- Likely issue: [root cause analysis]
- Don't repeat: [specific thing to avoid]

## Recommended Evolution
- Agents should build on: [winning strategy]
- Vary these parameters: [list of what to try]
- Combine techniques: [which ideas from multiple winners]
```
Agents use this to:
- Understand what works (cross-inspiration)
- Avoid dead ends (prune knowledge)
- Focus effort on proven directions
## References

- Agent spawning: see `references/agent-prompts.md`
- Evaluator template: see `references/evaluator-template.md`
## Adding New Problems

- Create `problems/<name>/problem.md` (objective, constraints, benchmark, format)
- Create `problems/<name>/config.json` (benchmark value, metadata; see the example below)
- Create `problems/<name>/evaluation/evaluate.py` (validate, score, and evaluate functions)
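An illustrative `config.json` for a hypothetical circle-packing problem; the field names and values are assumptions, not a fixed schema:

```json
{
  "name": "circle-packing",
  "description": "Pack circles in the unit square, maximizing total radius",
  "benchmark": 2.6,
  "constraints": { "n_circles": 26, "container": "unit_square" },
  "agent_timeout_seconds": 15
}
```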