| name | bootstrap-loop |
| description | Coordinate the recursive improvement loop between Prompt Forge and Skill Forge with a frozen evaluation harness and auditable checkpoints. |
| allowed-tools | Read, Write, Edit, Bash, Glob, Grep, Task, TodoWrite |
| model | claude-3-5-sonnet |
| x-version | 3.2.0 |
| x-category | tooling |
| x-vcl-compliance | v3.1.1 |
| x-cognitive-frames | HON, MOR, COM, CLS, EVD, ASP, SPC |
Improvement
- Converted the bootstrap loop into an English-first SOP that mirrors the Prompt Architect + Skill Forge cadence.
- Added structure-first guardrails (SKILL.md + README + examples/tests placeholders) and explicit eval-harness boundaries.
- Made confidence ceilings and memory tagging mandatory to prevent silent drift.
STANDARD OPERATING PROCEDURE
Purpose
Run the recursive improvement cycle where Prompt Forge analyzes, Skill Forge rebuilds, and the frozen eval harness gates changes. Keep the harness immutable to avoid Goodhart drift.
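A minimal sketch of one gated cycle, assuming the agents are injected as plain callables and metrics come back as a flat name-to-score dict. The names `analyze`, `build`, and `evaluate` are illustrative stand-ins for Task dispatches to Prompt Forge, Skill Forge, and the eval harness, not a real API.

```python
from typing import Callable, Dict

Metrics = Dict[str, float]

def run_cycle(
    analyze: Callable[[], dict],         # Prompt Forge: weaknesses -> proposal
    build: Callable[[dict], str],        # Skill Forge: proposal -> artifact id
    evaluate: Callable[[str], Metrics],  # frozen harness: artifact -> scores
    baseline: Metrics,
) -> str:
    proposal = analyze()
    artifact = build(proposal)
    metrics = evaluate(artifact)
    # Gate: every tracked metric must meet or beat its baseline.
    accepted = all(metrics.get(name, 0.0) >= floor
                   for name, floor in baseline.items())
    return "ACCEPT" if accepted else "ROLLBACK"
```

The harness callable is passed in, never constructed or modified inside the loop; that separation is what keeps it frozen.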
Trigger Conditions
- Positive: recursive improvement request, "run bootstrap loop", "cross-improve prompt/skill forge", "gate changes through eval harness".
- Negative: single-skill edits without evaluation (route to skill-forge or prompt-architect individually).
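A sketch of the routing check these triggers imply; the phrase list is illustrative, and a real dispatcher would keep its rules elsewhere.

```python
POSITIVE_TRIGGERS = (
    "run bootstrap loop",
    "cross-improve",
    "recursive improvement",
    "gate changes through eval harness",
)

def route(request: str) -> str:
    text = request.lower()
    if any(phrase in text for phrase in POSITIVE_TRIGGERS):
        return "bootstrap-loop"
    # Single-skill edits without evaluation: hand off to the individual skill.
    return "skill-forge/prompt-architect"
```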
Guardrails (Skill Forge-aligned)
- Structure-first: keep `SKILL.md` + `README.md` + `examples/` + `tests/` up to date; log deviations.
- Eval harness is frozen: no self-improvement of benchmarks or scoring.
- Confidence ceilings required on every decision: inference/report 0.70, research 0.85, observation/definition 0.95.
- Do not bypass auditors or regression gates; rollback if metrics regress.
- Tag MCP memory with `WHO=bootstrap-loop-{session}`, `WHY=skill-execution`, `PROJECT=<name>`, and `WHEN=<iso>` (see the tagging sketch after this list).
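A minimal helper for the required tag set, assuming ISO-8601 UTC timestamps; the function name is hypothetical.

```python
from datetime import datetime, timezone

def memory_tags(session: str, project: str) -> dict:
    # All four tags are mandatory; a missing key should fail the checklist.
    return {
        "WHO": f"bootstrap-loop-{session}",
        "WHY": "skill-execution",
        "PROJECT": project,
        "WHEN": datetime.now(timezone.utc).isoformat(),
    }
```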
Execution Phases
- Intent & Scope
  - Confirm target (`prompt-forge`, `skill-forge`, or `both`), success criteria, and allowed blast radius.
  - Load prior runs from memory for regression comparables.
- Preparation
  - Pull the latest Prompt Architect and Skill Forge SOPs.
  - Freeze eval suites and baselines; record their hash (see the hashing and gating sketch after this list).
- Cycle Design
  - Map phases: analyze → propose → build → test → decide → archive.
  - Assign agents: Prompt Architect for constraint mining, Prompt Forge for proposal drafting, Skill Forge for the build, and the eval harness for gating.
- Execution
  - Analyze weaknesses and constraints; document evidence.
  - Generate proposals with deltas and risk notes.
  - Build/apply via Skill Forge; capture artifacts and diffs.
  - Run the frozen eval plus regressions; store metrics and decision logs.
- Decision & Rollback
  - Accept only if metrics improve and safeguards stay intact; otherwise roll back and note blockers.
  - Record confidence with ceiling syntax and rationale.
- Delivery & Memory
  - Archive reports, diffs, and evaluation artifacts.
  - Update README/examples/tests pointers; store memory vectors for future runs.
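The Preparation and Decision steps lean on two mechanics worth pinning down: proving the harness stayed frozen, and gating on regression deltas. A minimal sketch, assuming a file-based eval suite and flat metric dicts (both layout assumptions, not mandated by this SOP):

```python
import hashlib
from pathlib import Path

def harness_hash(suite_dir: str) -> str:
    """Digest every file in the eval suite so a later re-hash can prove
    the benchmarks were not edited mid-loop (the Goodhart guard)."""
    digest = hashlib.sha256()
    for path in sorted(Path(suite_dir).rglob("*")):
        if path.is_file():
            digest.update(str(path.relative_to(suite_dir)).encode())
            digest.update(path.read_bytes())
    return digest.hexdigest()

def gate(baseline: dict, candidate: dict,
         hash_before: str, hash_after: str) -> str:
    if hash_before != hash_after:
        return "REJECT"  # harness changed mid-run: results are void
    regressed = [k for k, v in baseline.items()
                 if candidate.get(k, float("-inf")) < v]
    return "ROLLBACK" if regressed else "ACCEPT"
```

Hashing both file paths and contents means renames, deletions, and edits all change the digest, so any tampering with the frozen suite is detectable at decision time.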
Output Format
- Cycle ID, target, scope, and baselines referenced.
- Proposals applied/rejected with evidence and regression deltas.
- Eval results (benchmark + regression) and decision (ACCEPT/REJECT/ROLLBACK).
- Risks, follow-ups, and memory keys used.
- Confidence: X.XX (ceiling: TYPE Y.YY) with short rationale.
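A sketch of the ceiling syntax in code; the helper name and ceiling table are illustrative, drawn from the guardrail values above. The point is that the reported score is capped at the ceiling for its evidence type, never above it.

```python
CEILINGS = {"inference": 0.70, "report": 0.70, "research": 0.85,
            "observation": 0.95, "definition": 0.95}

def confidence_line(raw: float, evidence_type: str, rationale: str) -> str:
    ceiling = CEILINGS[evidence_type]
    capped = min(raw, ceiling)  # a claim never outruns its evidence type
    return (f"Confidence: {capped:.2f} "
            f"(ceiling: {evidence_type} {ceiling:.2f}) - {rationale}")

# confidence_line(0.82, "inference", "metrics improved on a small sample")
# -> "Confidence: 0.70 (ceiling: inference 0.70) - metrics improved on a small sample"
```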
Validation Checklist
- Trigger matched; reroutes handled.
- Eval harness hash recorded; no edits to frozen suites.
- Proposals traced to evidence; regressions checked.
- Memory tags applied and artifacts archived.
- Confidence ceiling stated.
Integration Notes
- Memory MCP (required): store `executions`, `decisions`, and `patterns` under `skills/tooling/bootstrap-loop/{project}/{timestamp}` (see the sketch below).
- Hooks: pre/post targets in Skill Forge style (`pre_hook_target_ms: 20`, `post_hook_target_ms: 1000`) to keep loops responsive.
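A sketch of the key layout and hook-budget check; `within_hook_budget` is a hypothetical helper, and the budgets echo the targets above.

```python
def memory_key(project: str, timestamp: str) -> str:
    # Fixed segments per the note above; only project and timestamp vary.
    return f"skills/tooling/bootstrap-loop/{project}/{timestamp}"

def within_hook_budget(elapsed_ms: float, target_ms: int) -> bool:
    # pre-hook target: 20 ms; post-hook target: 1000 ms
    return elapsed_ms <= target_ms
```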
Confidence: 0.70 (ceiling: inference 0.70) - English-first rewrite aligns the bootstrap loop with Prompt Architect + Skill Forge guardrails and keeps the eval harness frozen.