---
name: skill-builder
description: Create, evaluate, and improve Agent skills to production quality (100/100). Use when the user wants to create a new skill, review an existing skill, score a skill against best practices, or improve a skill's quality. Also use when the user mentions skill development, skill templates, or skill optimization.
---
# Skill Builder Workflow
Create, evaluate, and improve Agent skills to production quality.
## Quick Start
| Mode | When to Use | Starting Step |
|---|---|---|
| Create | Building a new skill from scratch | Step 1 |
| Evaluate | Scoring an existing skill | Step 4 |
| Improve | Upgrading a skill to 100/100 | Step 5 |
## Skill Files
| File | Purpose |
|---|---|
| SKILL.md | This workflow |
| SCORING.md | Structure + Efficacy rubrics (MUST READ before scoring) |
| TEMPLATES.md | Starter templates and patterns (MUST READ before creating) |
| EXAMPLES.md | Before/after improvement examples |
| CHECKLIST.md | 50-point validation checklist |
## Mode 1: Create a New Skill

### Step 1: Gather Requirements
Ask the user:
- What does the skill do? (core capability)
- When should it activate? (trigger contexts)
- What tools/scripts are needed? (dependencies)
- What's the expected output? (deliverables)
- What input quality issues are common? (see Input Decomposition below)
- What does this assume the user knows? (see User Capability Assumptions below)
#### Input Decomposition
[!IMPORTANT] Most real-world inputs are messy. If the domain typically has vague, incomplete, or poorly-structured input, the skill MUST include a transformation step.
Ask: "What does bad input look like in this domain?"
| Input Quality | Skill Must Include |
|---|---|
| Usually clean and structured | No transformation needed |
| Sometimes vague or incomplete | Validation step that asks for clarification |
| Often messy or ambiguous | Decomposition step with probing questions to transform input |
Decomposition step pattern:

```markdown
### Step N: Decompose Input

Transform raw input into structured form using these probes:

| Probe | Purpose |
|-------|---------|
| "What specifically happened?" | Extract concrete actions |
| "What was the outcome?" | Capture measurable results |
| "How often does this occur?" | Establish patterns |
```
#### User Capability Assumptions
List what the skill assumes the user can do. For each assumption, either:
- (a) Remove it by adding a compensating step, OR
- (b) Document it as a prerequisite
| Assumption | Compensation Strategy |
|---|---|
| User can provide structured input | Add decomposition step |
| User knows domain terminology | Add glossary or explain inline |
| User can make judgment calls | Add decision logic with explicit criteria |
| User knows quality standards | Add validation checklist |
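For example, the terminology assumption above can be compensated with an inline glossary step. A minimal sketch; the bracketed entries are placeholders in the same style as the other patterns in this workflow:

```markdown
### Step N: Define Terms First

Before gathering input, explain the domain terms the user will encounter:

| Term | Plain-language meaning |
|------|------------------------|
| [Domain term 1] | [One-sentence definition] |
| [Domain term 2] | [One-sentence definition] |
```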
### Step 1.5: Identify the Hardest Parts
[!CRITICAL] State-of-the-art skills solve the hard problems, not just the easy ones. Before designing the workflow, identify where experts struggle and novices get stuck.
Ask: "What are the 2-3 hardest judgment calls in this domain?"
Signs of a hard judgment call:
- Experts disagree on the right answer
- Multiple valid options exist
- Context determines the best choice
- Novices consistently get it wrong
For each hard part, the skill MUST include:
| Hard Part Type | Required Solution |
|---|---|
| Ambiguous categorization | Disambiguation logic with explicit criteria |
| Quality/intensity judgment | Calibration guidance with thresholds |
| Context-dependent choice | Decision matrix or if/then rules |
| Subjective evaluation | Rubric with concrete examples |
Example pattern for disambiguation:
| If X could be A or B... | Ask this to disambiguate |
|-------------------------|--------------------------|
| [Ambiguous situation 1] | Was the emphasis on [criterion]? → A. On [other criterion]? → B |
| [Ambiguous situation 2] | Did it primarily [test for A] or [test for B]? |
[!WARNING] A lookup table is not disambiguation. If your skill has a reference table but no logic for handling cases that match multiple entries, it's incomplete.
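Calibration guidance for quality/intensity judgments can follow the same table-driven pattern. A sketch with illustrative thresholds, assuming a hypothetical severity call; a real skill must substitute domain-specific criteria:

```markdown
| If the issue... | Calibrate intensity as... |
|-----------------|---------------------------|
| Blocks the user's primary goal outright | High |
| Slows the user, but a workaround exists | Medium |
| Is cosmetic or rarely encountered | Low |
```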
### Step 2: Assess Complexity & Choose Structure
[!CAUTION] Default to Simple. Only upgrade complexity if the skill genuinely needs it. Ask: "Would this skill work without this file?" If yes, don't add it.
Complexity Assessment:
| If the skill... | Then it's... |
|---|---|
| Does ONE thing, linear flow, no scripts, <5 decision points | Simple |
| Multi-step workflow, needs reference tables, moderate domain knowledge | Standard |
| Many conditionals, requires scripts, extensive domain expertise, high failure modes | Complex |
Structure by Complexity:
| Complexity | Structure |
|---|---|
| Simple | SKILL.md only |
| Standard | SKILL.md + REFERENCE.md or EXAMPLES.md |
| Complex | Above + TESTING.md + scripts/ |
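One possible on-disk layout across the tiers (the folder name `my-skill/` is illustrative):

```
my-skill/
├── SKILL.md         # always required; a Simple skill stops here
├── REFERENCE.md     # Standard: lookup tables (or EXAMPLES.md)
├── EXAMPLES.md      # Standard: before/after samples
├── TESTING.md       # Complex only: failure-mode scenarios
└── scripts/         # Complex only: helper scripts
```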
[!TIP] Signs you're over-engineering:
- Adding TESTING.md with obvious scenarios ("it should work")
- Creating REFERENCE.md that repeats the workflow
- Writing EXAMPLES.md when 2 inline examples suffice
Read TEMPLATES.md for starter templates.
### Step 3: Write the SKILL.md
Use templates from TEMPLATES.md. Ensure:
- Frontmatter — valid YAML with `name` (must match folder name) and `description`
- Description — includes BOTH what it does AND when to use it
- "Why?" line — one sentence after title explaining the problem this solves
- Workflow — clear, numbered steps
- Progressive disclosure — link to supporting files (only if needed)
[!TIP] Description is critical for discovery. Include multiple trigger keywords.
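Putting these requirements together, a minimal sketch of a new SKILL.md. The skill name `changelog-writer`, its description, and the workflow steps are all hypothetical, for illustration only:

```markdown
---
name: changelog-writer  # hypothetical; must match the skill's folder name
description: Generate release changelogs from commit history. Use when the user asks for a changelog, release notes, or a summary of changes between versions.
---

# Changelog Writer

Why? Raw commit logs are too noisy to hand to users; this skill turns them into readable release notes.

## Workflow

1. Collect commits since the last release tag
2. Group them by type (feature, fix, docs)
3. Draft entries and confirm wording with the user
```

Note that the description covers both the what and the when, and "changelog" and "release notes" double as trigger keywords.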
### Step 3.a: Register the Skill
[!CRITICAL] Do NOT edit AGENTS.md manually.
- Run the `skills-index-updater` skill or script: `python3 ~/.claude/skills/skills-index-updater/scripts/update_skill_index.py`
- Verify `AGENTS.md` contains your new/updated skill.
After creating, proceed to Step 4 to evaluate.
## Mode 2: Evaluate an Existing Skill

### Step 4: Score the Skill
[!CRITICAL] Read SCORING.md completely before scoring. It contains both rubrics and scoring worksheets.
Process:
1. Read all skill files (SKILL.md + supporting files)
2. Score Structure (0-100): 9 categories — documentation completeness
3. Score Efficacy (0-100): 6 categories — actual effectiveness
4. Use the Combined Score Matrix in SCORING.md for the verdict
5. Identify gaps in both dimensions
Present results using the format in SCORING.md.
If either score < 90, proceed to Step 5.
## Mode 3: Improve to 100/100

### Step 5: Plan Improvements
Based on evaluation, prioritize:
| Priority | Fixes | Target |
|---|---|---|
| P1 Critical | Missing frontmatter, invalid YAML, empty description | Required to function |
| P2 Important | Missing triggers, no examples, no progressive disclosure | Required for 95+ |
| P3 Polish | Missing troubleshooting, no quick start, terminology issues | Required for 100 |
### Step 6: Execute Improvements
[!CAUTION] Get user approval before making changes. Present the plan and wait for confirmation.
Work systematically:
1. Fix frontmatter first (skill won't load without valid YAML)
2. Enhance description with trigger keywords
3. Add progressive disclosure if SKILL.md > 200 lines (see the sketch below)
4. Create supporting files as needed
5. Add quality sections (Troubleshooting, Quick Start)
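The progressive-disclosure pointer in step 3 can be a single line that moves detail out of SKILL.md. A sketch; the file and section names are hypothetical:

```markdown
## Edge Cases

Rare input formats are covered in [REFERENCE.md](REFERENCE.md). Read it only when the standard probes fail.
```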
### Step 7: Verify Final Score
1. Re-read all skill files
2. Re-score against both rubrics
3. Confirm scores meet target
4. Present final structure and summary
## Validation Checklist (Quick)
Before declaring complete:
- `name` in frontmatter matches folder name
- `description` includes what AND when
- "Why?" line present after title
- SKILL.md under 500 lines
- Structure matches complexity (not over-engineered)
- Examples show concrete input/output
- Consistent terminology throughout
Full checklist: CHECKLIST.md
## Troubleshooting
| Problem | Solution |
|---|---|
| Skill not discovered | Check description has trigger keywords (see the before/after below) |
| Low Structure score | Add missing sections per SCORING.md rubric |
| Low Efficacy score | Simplify — skill may be doing too many things |
| Frontmatter errors | Validate YAML syntax, check for reserved words |
| User confused by skill | Add Quick Start, improve decision density |
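For the most common failure (skill not discovered), compare a weak description with a fixed one; both are invented for illustration:

```yaml
# Before: no "when", no trigger keywords
description: Helps with changelogs.

# After: what it does AND when to use it, with trigger keywords
description: Generate release changelogs from commit history. Use when the
  user asks for a changelog, release notes, or version summaries.
```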
## Reference
- SCORING.md — Structure + Efficacy rubrics with worksheets
- TEMPLATES.md — Starter templates and common patterns
- EXAMPLES.md — Before/after improvement examples
- CHECKLIST.md — 50-point validation checklist