| name | bootstrap-loop |
| description | Coordinate the recursive improvement loop between Prompt Forge and Skill Forge with a frozen evaluation harness and auditable checkpoints. |
| allowed-tools | Read, Write, Edit, Bash, Glob, Grep, Task, TodoWrite |
| model | claude-3-5-sonnet |
| x-version | 3.2.0 |
| x-category | tooling |
| x-vcl-compliance | v3.1.1 |
| x-cognitive-frames | HON, MOR, COM, CLS, EVD, ASP, SPC |
Improvement
- Converted the bootstrap loop into an English-first SOP that mirrors the Prompt Architect + Skill Forge cadence.
- Added structure-first guardrails (SKILL.md + README + examples/tests placeholders) and explicit eval-harness boundaries.
- Made confidence ceilings and memory tagging mandatory to prevent silent drift.
STANDARD OPERATING PROCEDURE
Purpose
Run the recursive improvement cycle where Prompt Forge analyzes, Skill Forge rebuilds, and the frozen eval harness gates changes. Keep the harness immutable to avoid Goodhart drift.
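A minimal sketch of one gated cycle, assuming the agents are injected as plain callables and metrics come back as a flat name-to-score dict. The names `analyze`, `build`, and `evaluate` are illustrative stand-ins for Task dispatches to Prompt Forge, Skill Forge, and the eval harness, not a real API.

```python
from typing import Callable, Dict

Metrics = Dict[str, float]

def run_cycle(
    analyze: Callable[[], dict],         # Prompt Forge: weaknesses -> proposal
    build: Callable[[dict], str],        # Skill Forge: proposal -> artifact id
    evaluate: Callable[[str], Metrics],  # frozen harness: artifact -> scores
    baseline: Metrics,
) -> str:
    proposal = analyze()
    artifact = build(proposal)
    metrics = evaluate(artifact)
    # Gate: every tracked metric must meet or beat its baseline.
    accepted = all(metrics.get(name, 0.0) >= floor
                   for name, floor in baseline.items())
    return "ACCEPT" if accepted else "ROLLBACK"
```

The harness callable is passed in, never constructed or modified inside the loop; that separation is what keeps it frozen.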
Trigger Conditions
- Positive: recursive improvement request, "run bootstrap loop", "cross-improve prompt/skill forge", "gate changes through eval harness".
- Negative: single-skill edits without evaluation (route to skill-forge or prompt-architect individually).
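A sketch of the routing check these triggers imply; the phrase list is illustrative, and a real dispatcher would keep its rules elsewhere.

```python
POSITIVE_TRIGGERS = (
    "run bootstrap loop",
    "cross-improve",
    "recursive improvement",
    "gate changes through eval harness",
)

def route(request: str) -> str:
    text = request.lower()
    if any(phrase in text for phrase in POSITIVE_TRIGGERS):
        return "bootstrap-loop"
    # Single-skill edits without evaluation: hand off to the individual skill.
    return "skill-forge/prompt-architect"
```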
Guardrails (Skill Forge-aligned)
- Structure-first: keep `SKILL.md` + `README.md` + `examples/` + `tests/` up to date; log deviations.
- Eval harness is frozen: no self-improvement of benchmarks or scoring.
- Confidence ceilings required on every decision: inference/report 0.70, research 0.85, observation/definition 0.95.
- Do not bypass auditors or regression gates; rollback if metrics regress.
- Tag MCP memory with `WHO=bootstrap-loop-{session}`, `WHY=skill-execution`, `PROJECT=<name>`, and `WHEN=<iso>` (see the tagging sketch after this list).
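A minimal helper for the required tag set, assuming ISO-8601 UTC timestamps; the function name is hypothetical.

```python
from datetime import datetime, timezone

def memory_tags(session: str, project: str) -> dict:
    # All four tags are mandatory; a missing key should fail the checklist.
    return {
        "WHO": f"bootstrap-loop-{session}",
        "WHY": "skill-execution",
        "PROJECT": project,
        "WHEN": datetime.now(timezone.utc).isoformat(),
    }
```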
Execution Phases
- Intent & Scope
  - Confirm target (`prompt-forge`, `skill-forge`, or `both`), success criteria, and allowed blast radius.
  - Load prior runs from memory for regression comparables.
- Preparation
  - Pull the latest Prompt Architect and Skill Forge SOPs.
  - Freeze eval suites and baselines; record their hash (see the hashing and gating sketch after this list).
- Cycle Design
  - Map phases: analyze → propose → build → test → decide → archive.
  - Assign agents: Prompt Architect for constraint mining, Prompt Forge for proposal drafting, Skill Forge for the build, and the eval harness for gating.
- Execution
  - Analyze weaknesses and constraints; document evidence.
  - Generate proposals with deltas and risk notes.
  - Build/apply via Skill Forge; capture artifacts and diffs.
  - Run the frozen eval plus regressions; store metrics and decision logs.
- Decision & Rollback
  - Accept only if metrics improve and safeguards stay intact; otherwise roll back and note blockers.
  - Record confidence with ceiling syntax and rationale.
- Delivery & Memory
  - Archive reports, diffs, and evaluation artifacts.
  - Update README/examples/tests pointers; store memory vectors for future runs.
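The Preparation and Decision steps lean on two mechanics worth pinning down: proving the harness stayed frozen, and gating on regression deltas. A minimal sketch, assuming a file-based eval suite and flat metric dicts (both layout assumptions, not mandated by this SOP):

```python
import hashlib
from pathlib import Path

def harness_hash(suite_dir: str) -> str:
    """Digest every file in the eval suite so a later re-hash can prove
    the benchmarks were not edited mid-loop (the Goodhart guard)."""
    digest = hashlib.sha256()
    for path in sorted(Path(suite_dir).rglob("*")):
        if path.is_file():
            digest.update(str(path.relative_to(suite_dir)).encode())
            digest.update(path.read_bytes())
    return digest.hexdigest()

def gate(baseline: dict, candidate: dict,
         hash_before: str, hash_after: str) -> str:
    if hash_before != hash_after:
        return "REJECT"  # harness changed mid-run: results are void
    regressed = [k for k, v in baseline.items()
                 if candidate.get(k, float("-inf")) < v]
    return "ROLLBACK" if regressed else "ACCEPT"
```

Hashing both file paths and contents means renames, deletions, and edits all change the digest, so any tampering with the frozen suite is detectable at decision time.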
Output Format
- Cycle ID, target, scope, and baselines referenced.
- Proposals applied/rejected with evidence and regression deltas.
- Eval results (benchmark + regression) and decision (ACCEPT/REJECT/ROLLBACK).
- Risks, follow-ups, and memory keys used.
- Confidence: X.XX (ceiling: TYPE Y.YY) with short rationale.
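A sketch of the ceiling syntax in code; the helper name and ceiling table are illustrative, drawn from the guardrail values above. The point is that the reported score is capped at the ceiling for its evidence type, never above it.

```python
CEILINGS = {"inference": 0.70, "report": 0.70, "research": 0.85,
            "observation": 0.95, "definition": 0.95}

def confidence_line(raw: float, evidence_type: str, rationale: str) -> str:
    ceiling = CEILINGS[evidence_type]
    capped = min(raw, ceiling)  # a claim never outruns its evidence type
    return (f"Confidence: {capped:.2f} "
            f"(ceiling: {evidence_type} {ceiling:.2f}) - {rationale}")

# confidence_line(0.82, "inference", "metrics improved on a small sample")
# -> "Confidence: 0.70 (ceiling: inference 0.70) - metrics improved on a small sample"
```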
Validation Checklist
- Trigger matched; reroutes handled.
- Eval harness hash recorded; no edits to frozen suites.
- Proposals traced to evidence; regressions checked.
- Memory tags applied and artifacts archived.
- Confidence ceiling stated.
Integration Notes
- Memory MCP (required): store `executions`, `decisions`, and `patterns` under `skills/tooling/bootstrap-loop/{project}/{timestamp}` (see the sketch below).
- Hooks: pre/post targets in Skill Forge style (`pre_hook_target_ms: 20`, `post_hook_target_ms: 1000`) to keep loops responsive.
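A sketch of the key layout and hook-budget check; `within_hook_budget` is a hypothetical helper, and the budgets echo the targets above.

```python
def memory_key(project: str, timestamp: str) -> str:
    # Fixed segments per the note above; only project and timestamp vary.
    return f"skills/tooling/bootstrap-loop/{project}/{timestamp}"

def within_hook_budget(elapsed_ms: float, target_ms: int) -> bool:
    # pre-hook target: 20 ms; post-hook target: 1000 ms
    return elapsed_ms <= target_ms
```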
Confidence: 0.70 (ceiling: inference 0.70) - English-first rewrite aligns the bootstrap loop with Prompt Architect + Skill Forge guardrails and keeps the eval harness frozen.