| name | skill-authoring |
| description | Use when creating new skills, editing existing skills, initializing skill structure, packaging skills for distribution, or verifying skills work before deployment. This skill applies TDD methodology to documentation. |
Skill Authoring
Overview
Writing skills IS Test-Driven Development applied to process documentation.
Write test cases (pressure scenarios with subagents), watch them fail (baseline behavior), write the skill (documentation), watch tests pass (agents comply), and refactor (close loopholes).
Core principle: If you didn't watch an agent fail without the skill, you don't know if the skill teaches the right thing.
Official guidance: For Anthropic's official skill authoring best practices, see anthropic-best-practices.md.
What is a Skill?
A skill is a modular, self-contained package that extends Claude's capabilities with specialized knowledge, workflows, and tools. Skills transform Claude from a general-purpose agent into a specialized agent equipped with procedural knowledge.
Skills are: Reusable techniques, patterns, tools, reference guides Skills are NOT: Narratives about how you solved a problem once
Skill Structure
skill-name/
├── SKILL.md (required)
│ ├── YAML frontmatter (name, description)
│ └── Markdown instructions
└── Bundled Resources (optional)
├── scripts/ - Executable code
├── references/ - Documentation loaded as needed
└── assets/ - Files used in output
Progressive Disclosure
Skills use a three-level loading system:
- Metadata (name + description) - Always in context (~100 words)
- SKILL.md body - When skill triggers (<5k words)
- Bundled resources - As needed by Claude
TDD Mapping for Skills
| TDD Concept | Skill Creation |
|---|---|
| Test case | Pressure scenario with subagent |
| Production code | Skill document (SKILL.md) |
| Test fails (RED) | Agent violates rule without skill |
| Test passes (GREEN) | Agent complies with skill present |
| Refactor | Close loopholes while maintaining compliance |
When to Create a Skill
Create when:
- Technique wasn't intuitively obvious
- You'd reference this again across projects
- Pattern applies broadly (not project-specific)
- Others would benefit
Don't create for:
- One-off solutions
- Standard practices well-documented elsewhere
- Project-specific conventions (put in CLAUDE.md)
Skill Creation Process
Step 1: Understand with Concrete Examples
Understand how the skill will be used through examples:
- "What functionality should this skill support?"
- "What would a user say that should trigger this skill?"
- "Can you give examples of how this skill would be used?"
Step 2: Plan Reusable Contents
Analyze each example to identify:
- Scripts that would be rewritten repeatedly
- References that would need rediscovery each time
- Assets that would be copied or modified
Step 3: Initialize the Skill
Run the initialization script:
python scripts/init_skill.py <skill-name> --path <output-directory>
The script creates:
- SKILL.md template with frontmatter
- Example directories: scripts/, references/, assets/
Step 4: Write with TDD (RED-GREEN-REFACTOR)
RED Phase - Baseline Testing:
- Run pressure scenarios WITHOUT the skill
- Document exact failures and rationalizations verbatim
- Identify which pressures trigger violations
GREEN Phase - Write Minimal Skill:
- Address specific baseline failures observed
- Run same scenarios WITH skill
- Verify agent now complies
REFACTOR Phase - Close Loopholes:
- Capture new rationalizations verbatim
- Add explicit counters for each loophole
- Build rationalization table
- Re-test until bulletproof
See @testing-skills-with-subagents.md for complete testing methodology.
Step 5: Package for Distribution
python scripts/package_skill.py <path/to/skill-folder> [output-directory]
The script validates and creates a distributable .skill file.
Step 6: Iterate
After testing on real tasks:
- Notice struggles or inefficiencies
- Identify needed updates
- Apply TDD cycle again
SKILL.md Structure
Frontmatter (YAML):
name: Hyphen-case, max 64 charactersdescription: Third-person, max 1024 characters, starts with "Use when..."
CRITICAL: Description = When to Use, NOT What the Skill Does
# ❌ BAD: Summarizes workflow - Claude may follow this instead of reading skill
description: Use when executing plans - dispatches subagent per task with code review
# ✅ GOOD: Just triggering conditions
description: Use when executing implementation plans with independent tasks
Body Structure:
# Skill Name
## Overview
Core principle in 1-2 sentences.
## When to Use
Symptoms and use cases (bullets)
## Core Pattern
Before/after comparison or workflow
## Quick Reference
Table for scanning common operations
## Common Mistakes
What goes wrong + fixes
Claude Search Optimization (CSO)
Rich Description Field
- Start with "Use when..."
- Include concrete triggers, symptoms, situations
- Write in third person
- NEVER summarize the skill's workflow
Keyword Coverage
- Error messages: "Hook timed out", "race condition"
- Symptoms: "flaky", "hanging", "zombie"
- Tools: Actual commands, library names
Token Efficiency
Target word counts:
- Frequently-loaded skills: <200 words
- Other skills: <500 words
Techniques:
- Reference --help for tool flags
- Cross-reference other skills
- Compress examples
- Eliminate redundancy
The Iron Law
NO SKILL WITHOUT A FAILING TEST FIRST
This applies to NEW skills AND EDITS to existing skills.
No exceptions:
- Not for "simple additions"
- Not for "just adding a section"
- Not for "documentation updates"
Bundled Resources
scripts/
Executable code for deterministic reliability or repeatedly rewritten tasks.
Available scripts:
scripts/init_skill.py- Initialize new skill from templatescripts/package_skill.py- Package skill for distributionscripts/quick_validate.py- Validate skill structure
references/
Documentation loaded as needed into context.
Available references:
references/validation-checklist.md- Complete validation checklistreferences/workflows.md- Workflow design patternsreferences/output-patterns.md- Output format patterns
assets/
Files used in output (templates, images, fonts).
Meta-Skill Derivatives
Meta-skills provide foundational patterns that language or domain-specific skills extend. This pattern enables knowledge reuse while allowing specialization.
Meta-Skill Hierarchy
meta-<pattern>-<focus> # Foundational patterns
└── <domain>-<pattern>-<focus> # Specialized derivative
Examples:
| Meta-Skill | Derivatives |
|---|---|
meta-sdk-patterns-eng |
aws-sdk-dev, stripe-sdk-dev |
meta-library-dev |
rust-lib-dev, python-lib-dev |
meta-cli-patterns-dev |
rust-cli-dev, go-cli-dev |
Creating a Derivative Skill
Step 1: Reference the Meta-Skill
In the derivative's SKILL.md, explicitly reference the parent:
## This Skill Extends
This skill extends `meta-library-dev` with Rust-specific patterns.
For foundational concepts (API design, versioning, testing strategies),
see the meta-skill first.
## This Skill Adds
- Rust-specific: Cargo.toml configuration, feature flags
- Rust idioms: Ownership patterns in public APIs
- Rust tooling: rustdoc, cargo publish, crates.io
Step 2: Determine What to Add
Derivatives should add domain-specific content in these categories:
| Category | What to Add |
|---|---|
| Tooling | Package managers, build tools, linters |
| Idioms | Language-specific patterns and conventions |
| Ecosystem | Common libraries, registries, community practices |
| Examples | Code samples in the target language |
| Gotchas | Language-specific pitfalls and workarounds |
Step 3: Avoid Duplication
Do NOT repeat content from the meta-skill. Instead:
- Reference sections: "See
meta-library-dev→ Versioning Strategy" - Extend with specifics: "In addition to semver (see meta-skill), Rust uses..."
- Override only when necessary: "Unlike the general pattern, Rust prefers..."
Derivative Naming Convention
<domain>-<pattern>-<focus>
Where:
- domain = language, platform, or service (rust, aws, k8s)
- pattern = what the skill covers (lib, sdk, cli, api)
- focus = audience (dev, ops, eng, nub, xec)
Valid derivative names:
rust-lib-dev(Rust library development)python-sdk-dev(Python SDK development)go-cli-ops(Go CLI operations)
Derivative Template
---
name: <domain>-<pattern>-<focus>
description: Use when developing <domain> <pattern>s. Extends meta-<pattern>-<focus> with <domain>-specific tooling, idioms, and ecosystem patterns.
---
# <Domain> <Pattern> Development
<Domain>-specific patterns for <pattern> development. This skill extends
`meta-<pattern>-<focus>` with <domain> tooling, idioms, and ecosystem practices.
## This Skill Extends
- `meta-<pattern>-<focus>` - Foundational <pattern> patterns
## This Skill Adds
- <Domain> tooling: [list tools]
- <Domain> idioms: [list patterns]
- <Domain> ecosystem: [list libraries, registries]
## <Domain>-Specific Tooling
[Package manager, build tools, testing frameworks]
## <Domain> Idioms
[Language-specific patterns that differ from general guidance]
## Ecosystem
[Common libraries, registries, community practices]
## Quick Reference
| Task | <Domain> Command |
|------|------------------|
| Create project | `<command>` |
| Build | `<command>` |
| Test | `<command>` |
| Publish | `<command>` |
## References
- `meta-<pattern>-<focus>` - Foundational patterns
- [<Domain> documentation](url)
When to Create Meta vs Derivative
| Create Meta-Skill When | Create Derivative When |
|---|---|
| Pattern applies across 3+ domains | Specializing for one domain |
| Core concepts are language-agnostic | Adding language-specific details |
| You're establishing a new pattern family | Extending existing meta-skill |
Additional Resources
Core Documentation
- anthropic-best-practices.md - Official Anthropic guidance
- testing-skills-with-subagents.md - TDD testing methodology
- persuasion-principles.md - Psychology of bulletproofing
Visualization
- graphviz-conventions.dot - Graphviz style rules
- render-graphs.js - Render skill flowcharts to SVG
Examples
- examples/CLAUDE_MD_TESTING.md - Full test campaign example
Common Rationalizations
| Excuse | Reality |
|---|---|
| "Skill is obviously clear" | Clear to you ≠ clear to agents. Test it. |
| "It's just a reference" | References can have gaps. Test retrieval. |
| "Testing is overkill" | Untested skills have issues. Always. |
| "I'll test if problems emerge" | Test BEFORE deploying. |
| "Too tedious to test" | Less tedious than debugging in production. |
Quick Reference
| Task | Command/Reference |
|---|---|
| Initialize skill | python scripts/init_skill.py <name> --path <path> |
| Validate skill | python scripts/quick_validate.py <skill_dir> |
| Package skill | python scripts/package_skill.py <skill_dir> |
| Testing methodology | @testing-skills-with-subagents.md |
| Validation checklist | @references/validation-checklist.md |
| Workflow patterns | @references/workflows.md |
| Output patterns | @references/output-patterns.md |
The Bottom Line
Creating skills IS TDD for process documentation.
Same Iron Law: No skill without failing test first. Same cycle: RED (baseline) → GREEN (write skill) → REFACTOR (close loopholes). Same benefits: Better quality, fewer surprises, bulletproof results.