---
name: Prompt Manager
description: Optimize and manage AILANG teaching prompts for maximum conciseness and accuracy. Use when the user asks to create/update prompts, optimize prompt length, or verify prompt accuracy.
---
# Prompt Manager
**Mission**: Create concise, accurate teaching prompts with maximum information density.
## Core Principle: Token Efficiency

- **Target**: ~4000 tokens per prompt (currently ~8000+)
- **Strategy**: Reference external docs, use tables, consolidate examples
- **Validation**: Maintain eval success rates while reducing prompt size
## When to Use This Skill

Invoke when the user mentions:
- "Create new prompt" / "update prompt" / "optimize prompt"
- "Make prompt more concise" / "reduce prompt length"
- "Fix prompt documentation" / "prompt-code mismatch"
- After implementing language features (keep prompt synchronized)
- Before eval baselines (verify accuracy)
## Quick Reference Scripts
### Create New Version

```bash
.claude/skills/prompt-manager/scripts/create_prompt_version.sh <new_version> <base_version> "<description>"
```

Creates a versioned prompt file, computes its hash, and updates versions.json.
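The script's internals are roughly the following (a sketch under assumptions: the copy/hash/JSON steps and the versions.json schema shown here are illustrative guesses, not the script's actual code):

```shell
# Assumed internals of create_prompt_version.sh (illustrative sketch):
# copy the base prompt, hash it, record it in versions.json.
set -eu
work=$(mktemp -d)
mkdir -p "$work/prompts"
echo "# AILANG teaching prompt" > "$work/prompts/v0.3.16.md"

new=v0.3.17 base=v0.3.16 desc="Optimize for conciseness"

# 1. Copy the base prompt to the new version
cp "$work/prompts/$base.md" "$work/prompts/$new.md"

# 2. Compute its SHA256 (sha256sum on Linux, shasum on macOS)
hash=$( (sha256sum "$work/prompts/$new.md" 2>/dev/null \
  || shasum -a 256 "$work/prompts/$new.md") | awk '{print $1}')

# 3. Record version, hash, and description (schema is a guess)
printf '{"%s": {"hash": "%s", "description": "%s"}}\n' \
  "$new" "$hash" "$desc" > "$work/prompts/versions.json"

cat "$work/prompts/versions.json"
```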
### Update Hash

```bash
.claude/skills/prompt-manager/scripts/update_hash.sh <version>
```

Recomputes the SHA256 hash after edits.
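The reason this step exists can be demonstrated directly (assuming standard `sha256sum`/`shasum` semantics; the real script may also rewrite versions.json):

```shell
# Why recomputing matters: any edit to the prompt file changes its
# SHA256, so the recorded hash goes stale until update_hash.sh is rerun.
set -eu
f=$(mktemp)
sha() { (sha256sum "$1" 2>/dev/null || shasum -a 256 "$1") | awk '{print $1}'; }

echo "prompt v1" > "$f"
before=$(sha "$f")

echo "edited line" >> "$f"
after=$(sha "$f")

if [ "$before" != "$after" ]; then
  echo "hash changed: rerun update_hash.sh"
fi
```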
### Verify Accuracy

```bash
.claude/skills/eval-analyzer/scripts/verify_prompt_accuracy.sh <version>
```

Catches prompt-code mismatches and false limitations.
### Check Examples Coverage

```bash
.claude/skills/prompt-manager/scripts/check_examples_coverage.sh <version>
```

Verifies that features used in working examples are documented in the prompt.
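The check can be approximated with `grep` (an assumed sketch of the logic, using a made-up two-keyword feature list; the real script's feature detection may differ):

```shell
# Assumed sketch of a coverage check: flag feature keywords that appear
# in working examples but not in the prompt (keyword list is made up).
set -eu
work=$(mktemp -d)
mkdir -p "$work/examples"
printf 'match x { Some(n) => n, None => 0 }\n' > "$work/examples/adt.ail"
printf 'let xs = [1, 2, 3]\n' > "$work/examples/lists.ail"
printf 'Pattern matching: match expressions are supported.\n' > "$work/prompt.md"

missing=0
for kw in match let; do
  if grep -rqw "$kw" "$work/examples" && ! grep -qw "$kw" "$work/prompt.md"; then
    echo "MISSING from prompt: $kw"
    missing=$((missing + 1))
  fi
done
echo "missing features: $missing"
```

Here `let` is used in an example but never documented, so it is flagged.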
### Analyze Size & Optimization Opportunities

```bash
.claude/skills/prompt-manager/scripts/analyze_prompt_size.sh prompts/v0.3.17.md
```

Shows word count, section sizes, code blocks, tables, and optimization opportunities.
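A minimal sketch of this kind of counting (illustrative only; the real script also reports per-section sizes and targets):

```shell
# Illustrative counting pass over a tiny prompt file.
set -eu
f=$(mktemp)
{
  echo '# Prompt'
  echo 'Some prose here.'
  echo '```ailang'
  echo 'let x = 1'
  echo '```'
  echo '| len | list -> int | number of elements |'
  echo '| map | (a -> b) -> [a] -> [b] | apply f to each element |'
} > "$f"

words=$(wc -w < "$f")
fences=$(grep -c '^```' "$f")
blocks=$((fences / 2))          # two fence lines per code block
rows=$(grep -c '^|' "$f")

echo "words=$words blocks=$blocks table_rows=$rows"
```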
### Test Prompt Effectiveness

```bash
.claude/skills/prompt-manager/scripts/test_prompt.sh v0.3.18
```

Runs an AILANG-only eval (no Python baseline) with dev models to test prompt effectiveness.
## Optimization Workflow
### 1. Analyze Current Prompt

```bash
.claude/skills/prompt-manager/scripts/analyze_prompt_size.sh prompts/v0.3.16.md
```
Sample output:

```
Total words: 4358 (target: <4000)
Total lines: 1214 (target: <200)
⚠️  OVER TARGET by 358 words (8%)

Code blocks: 60 (target: 5-10 comprehensive)
Table rows: 0 (target: 10+ tables)

Top sections by size:
  719 words - Effect System
  435 words - List Operations
  368 words - Algebraic Data Types
```
High-ROI optimization areas identified by the script:

- 60 code blocks → consolidate into 5-10 comprehensive examples
- 0 tables → convert builtin/syntax docs to tables
- Large sections → link details to external docs
### 2. Create Optimized Version

```bash
.claude/skills/prompt-manager/scripts/create_prompt_version.sh v0.3.17 v0.3.16 "Optimize for conciseness (iteration 1: -15% tokens)"
```
### 3. Apply Optimization Strategies

Reference resources/prompt_optimization.md for:

- Tables vs prose (builtin docs)
- Consolidating examples
- Linking to external docs
- Progressive disclosure patterns
Key techniques:

- **Replace prose with tables**: builtin functions, syntax rules
- **Consolidate examples**: 8 comprehensive > 24 scattered
- **Link to docs**: type system details → docs/guides/types.md
- **Quick reference**: 1-screen summary at top
- **Remove redundancy**: historical notes → CHANGELOG.md
### 4. Validate Optimization

⚠️ **CRITICAL**: Must validate AFTER each optimization step!
```bash
# 1. CHECK ALL CODE EXAMPLES (NEW REQUIREMENT!)
#    Extract and test every AILANG code block in the prompt.
#    This catches syntax errors that cause regressions.
.claude/skills/prompt-manager/scripts/validate_all_code.sh prompts/v0.3.17.md

# 2. Check new size
.claude/skills/prompt-manager/scripts/analyze_prompt_size.sh prompts/v0.3.17.md

# 3. Verify accuracy (no false limitations)
.claude/skills/eval-analyzer/scripts/verify_prompt_accuracy.sh v0.3.17

# 4. Check examples coverage (NEW - v0.4.1+)
#    Ensures working examples are documented in the prompt.
.claude/skills/prompt-manager/scripts/check_examples_coverage.sh v0.3.17

# 5. Update hash
.claude/skills/prompt-manager/scripts/update_hash.sh v0.3.17

# 6. TEST PROMPT EFFECTIVENESS (CRITICAL!)
#    Runs AILANG-only eval (no Python baseline) with dev models.
#    Target: >40% AILANG success rate.
.claude/skills/prompt-manager/scripts/test_prompt.sh v0.3.17
```
Success criteria:

- ✅ Token reduction: 10-20% per iteration (NOT >50% in one step!)
- ✅ AILANG success rate: >40% (if <40%, revert and try a smaller optimization)
- ✅ All external links resolve
- ✅ No increase in compilation errors
- ✅ Examples still work in REPL

⚠️ If the success rate drops >10%, REVERT and try a smaller optimization.
### 5. Document Optimization

Add a header to the optimized prompt:

```
---
Version: v0.3.17
Optimized: 2025-10-22
Token reduction: -15% (8200 → 7000 tokens, iteration 1 of 3)
Baseline: v0.3.16 → v0.3.17 success rate maintained
---
```
### 6. Commit

```bash
git add prompts/v0.3.17.md prompts/versions.json
git commit -m "feat: Optimize v0.3.17 prompt for conciseness

- Reduced tokens: 8200 → 7000 (-15%, iteration 1 of 3)
- Builtin docs: prose → tables + reference ailang builtins list
- Examples: 24 scattered → 8 consolidated comprehensive
- Type system: moved details to docs/guides/types.md
- Added quick reference section at top
- Validated: eval success rate maintained"
```
## Optimization Strategies (Summary)

Full guide: resources/prompt_optimization.md
### Quick Wins

- **Tables > prose**: builtin docs, syntax rules (-67% tokens)
- **Consolidate examples**: 8 comprehensive > 24 scattered (-56% tokens)
- **Link to docs**: move detailed explanations to external docs (-76% tokens)
- **Quick reference**: 1-screen summary at top
- **Remove redundancy**: historical notes → CHANGELOG.md, implementation details → code links
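The percentages above depend on the specific text being converted; for any given conversion, the saving can be sanity-checked by comparing word counts (sample text below is made up for illustration):

```shell
# Made-up sample: the same builtin documented as prose vs as a table row.
set -eu
prose='The builtin function len takes a list as its argument and returns
the number of elements contained in that list as an integer.'
table='| len | list -> int | number of elements |'

p=$(echo "$prose" | wc -w)
t=$(echo "$table" | wc -w)
echo "prose: $p words, table: $t words"
echo "saving: $(( (p - t) * 100 / p ))%"
```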
### Anti-Patterns
- ❌ Explaining "why" (move to design docs)
- ❌ Historical context (move to changelog)
- ❌ Implementation details (link to code)
- ❌ Verbose examples (show, don't tell)
- ❌ Apologetic limitations (be direct)
## Optimization Checklist

- [ ] Word count <4000 (proxy for the ~4000-token target)
- [ ] All external links resolve
- [ ] Examples work in REPL
- [ ] Eval baseline success rate maintained
- [ ] Hash updated in versions.json
- [ ] Optimization metrics documented in prompt header
## Common Tasks

Detailed workflows: resources/workflow_guide.md
### Fix False Limitation

Create version → Remove "❌ NO X" → Add "✅ X" with examples → Verify → Commit

### Add Feature

Create version → Add to capabilities table → Add consolidated example → Verify → Commit

### Optimize for Conciseness

Analyze size → Identify high-ROI sections → Apply techniques → Validate success rate → Document metrics → Commit
## Progressive Disclosure

- **Always loaded**: skill.md (this file: workflow + optimization principles)
- **Load for optimization**: resources/prompt_optimization.md (detailed strategies)
- **Load for workflows**: resources/workflow_guide.md (detailed examples)
- **Execute as needed**: scripts (create_prompt_version.sh, update_hash.sh)
## Integration

- **eval-analyzer**: verify_prompt_accuracy.sh catches mismatches
- **post-release**: run baselines after optimization
- **`ailang builtins list`**: reference instead of duplicating
- **docs/guides/**: link to instead of explaining
## Success Metrics

Target prompt profile:

- Tokens: <4000 (~30-40% reduction from current, over 3 iterations)
- Lines: <300 (currently 500+)
- Examples: 40-50 (not <30!)
- Tables: 10+ for reference data
- AILANG success rate: >40%
## ⚠️ Lessons from v0.3.18 and v0.3.20 Failures

### v0.3.18 Failure: Over-Optimization
**What happened**: Optimized v0.3.17 → v0.3.18 with a -59% token reduction (5189 → 2126 words).

**Result**: AILANG success rate collapsed to 4.8% (from an expected ~40-60%).
Root causes:

- **Too aggressive**: removed >50% of content in one step
- **Over-consolidated**: 64 → 21 examples (lost pattern variety)
- **Tables replaced prose**: lost explanatory context for syntax rules
- **Removed negatives**: "what NOT to do" examples are critical
- **No incremental validation**: didn't test after each change
### v0.3.20 Failure: Incorrect Syntax in Examples

**What happened**: The prompt had three syntax errors: (1) `match { | pattern =>` (wrong), (2) `import "std/io"` (wrong), (3) `let (x, y) = tuple` (wrong).

**Result**: -4.8% regression (40.0% → 35.2%); 18 benchmarks failed with PAR_001 compile errors.

**Root cause**: No validation that the code examples in the prompt actually work with the AILANG parser.
Critical lessons:
- ❌ DON'T optimize >20% per iteration
- ❌ DON'T reduce examples below 40 total
- ❌ DON'T replace all syntax prose with tables
- ❌ DON'T link critical syntax to external docs (AIs can't follow links)
- ❌ DON'T skip eval testing between iterations
- ❌ DON'T trust code examples without testing them (NEW!)
- ✅ DO optimize incrementally (3 iterations of 10-15% each)
- ✅ DO keep negative examples ("what NOT to do")
- ✅ DO validate with test_prompt.sh after EACH change
- ✅ DO maintain pattern repetition (models need to see things 3-5 times)
- ✅ DO extract and test ALL code blocks in prompt (NEW!)
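The "extract and test all code blocks" lesson can be sketched as follows. The extraction is real `awk`; the checker invocation is stubbed with `cat` because the actual command (e.g. `ailang check`) is an assumption. The second sample block deliberately contains the kind of bad `match` syntax that caused the v0.3.20 regression.

```shell
# Sketch: pull every ```ailang fence out of a prompt file and feed each
# extracted block to a checker.
set -eu
cd "$(mktemp -d)"
{
  echo '# Prompt'
  echo '```ailang'
  echo 'let x = 1'
  echo '```'
  echo 'prose'
  echo '```ailang'
  echo 'match { | p =>'
  echo '```'
} > prompt.md

# Split each fenced ailang block into block_1, block_2, ...
awk '/^```ailang/{n++; inb=1; next} /^```$/{inb=0} inb{print > ("block_" n)}' prompt.md

count=$(ls block_* | wc -l)
echo "extracted $count blocks"
for b in block_*; do
  # Real version: ailang check "$b" || { echo "FAIL: $b"; exit 1; }
  cat "$b" > /dev/null
done
```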
Full analysis: OPTIMIZATION_FAILURE_ANALYSIS.md