| name | writing-skills |
| description | Use when creating new skills, editing existing skills, or verifying skills work before deployment - applies TDD to process documentation by testing with subagents before writing, iterating until bulletproof against rationalization |
Writing Skills
Overview
Writing skills IS Test-Driven Development applied to process documentation.
Personal skills are written to ~/.claude/skills
You write test cases (pressure scenarios with subagents), watch them fail (baseline behavior), write the skill (documentation), watch tests pass (agents comply), and refactor (close loopholes).
Core principle: If you didn't watch an agent fail without the skill, you don't know if the skill teaches the right thing.
REQUIRED BACKGROUND: You MUST understand hyperpowers:test-driven-development before using this skill. That skill defines the fundamental RED-GREEN-REFACTOR cycle. This skill adapts TDD to documentation.
Official guidance: For Anthropic's official skill authoring best practices, see anthropic-best-practices.md. This document provides additional patterns and guidelines that complement the TDD-focused approach in this skill.
What is a Skill?
A skill is a reference guide for proven techniques, patterns, or tools. Skills help future Claude instances find and apply effective approaches.
Skills are: Reusable techniques, patterns, tools, reference guides
Skills are NOT: Narratives about how you solved a problem once
TDD Mapping for Skills
| TDD Concept | Skill Creation |
|---|---|
| Test case | Pressure scenario with subagent |
| Production code | Skill document (SKILL.md) |
| Test fails (RED) | Agent violates rule without skill (baseline) |
| Test passes (GREEN) | Agent complies with skill present |
| Refactor | Close loopholes while maintaining compliance |
| Write test first | Run baseline scenario BEFORE writing skill |
| Watch it fail | Document exact rationalizations agent uses |
| Minimal code | Write skill addressing those specific violations |
| Watch it pass | Verify agent now complies |
| Refactor cycle | Find new rationalizations → plug → re-verify |
The entire skill creation process follows RED-GREEN-REFACTOR.
When to Create a Skill
Create when:
- Technique wasn't intuitively obvious to you
- You'd reference this again across projects
- Pattern applies broadly (not project-specific)
- Others would benefit
Don't create for:
- One-off solutions
- Standard practices well-documented elsewhere
- Project-specific conventions (put in CLAUDE.md)
Skill Types
Technique
Concrete method with steps to follow (condition-based-waiting, hyperpowers:root-cause-tracing)
Pattern
Way of thinking about problems (flatten-with-flags, test-invariants)
Reference
API docs, syntax guides, tool documentation (office docs)
Directory Structure
skills/
skill-name/
SKILL.md # Main reference (required)
supporting-file.* # Only if needed
Flat namespace - all skills in one searchable namespace
Separate files for:
- Heavy reference (100+ lines) - API docs, comprehensive syntax
- Reusable tools - Scripts, utilities, templates
Keep inline:
- Principles and concepts
- Code patterns (< 50 lines)
- Everything else
SKILL.md Structure
Frontmatter (YAML):
- Only two fields supported:
nameanddescription - Max 1024 characters total
name: Use letters, numbers, and hyphens only (no parentheses, special chars)description: Third-person, includes BOTH what it does AND when to use it- Start with "Use when..." to focus on triggering conditions
- Include specific symptoms, situations, and contexts
- Keep under 500 characters if possible
---
name: Skill-Name-With-Hyphens
description: Use when [specific triggering conditions and symptoms] - [what the skill does and how it helps, written in third person]
---
# Skill Name
## Overview
What is this? Core principle in 1-2 sentences.
## When to Use
[Small inline flowchart IF decision non-obvious]
Bullet list with SYMPTOMS and use cases
When NOT to use
## Core Pattern (for techniques/patterns)
Before/after code comparison
## Quick Reference
Table or bullets for scanning common operations
## Implementation
Inline code for simple patterns
Link to file for heavy reference or reusable tools
## Common Mistakes
What goes wrong + fixes
## Real-World Impact (optional)
Concrete results
Claude Search Optimization (CSO)
Critical for discovery: Future Claude needs to FIND your skill
1. Rich Description Field
Purpose: Claude reads description to decide which skills to load for a given task. Make it answer: "Should I read this skill right now?"
Format: Start with "Use when..." to focus on triggering conditions, then explain what it does
Content:
- Use concrete triggers, symptoms, and situations that signal this skill applies
- Describe the problem (race conditions, inconsistent behavior) not language-specific symptoms (setTimeout, sleep)
- Keep triggers technology-agnostic unless the skill itself is technology-specific
- If skill is technology-specific, make that explicit in the trigger
- Write in third person (injected into system prompt)
# ❌ BAD: Too abstract, vague, doesn't include when to use
description: For async testing
# ❌ BAD: First person
description: I can help you with async tests when they're flaky
# ❌ BAD: Mentions technology but skill isn't specific to it
description: Use when tests use setTimeout/sleep and are flaky
# ✅ GOOD: Starts with "Use when", describes problem, then what it does
description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently - replaces arbitrary timeouts with condition polling for reliable async tests
# ✅ GOOD: Technology-specific skill with explicit trigger
description: Use when using React Router and handling authentication redirects - provides patterns for protected routes and auth state management
2. Keyword Coverage
Use words Claude would search for:
- Error messages: "Hook timed out", "ENOTEMPTY", "race condition"
- Symptoms: "flaky", "hanging", "zombie", "pollution"
- Synonyms: "timeout/hang/freeze", "cleanup/teardown/afterEach"
- Tools: Actual commands, library names, file types
3. Descriptive Naming
Use active voice, verb-first:
- ✅
creating-skillsnotskill-creation - ✅
testing-skills-with-subagentsnotsubagent-skill-testing
4. Token Efficiency (Critical)
Problem: getting-started and frequently-referenced skills load into EVERY conversation. Every token counts.
Target word counts:
- getting-started workflows: <150 words each
- Frequently-loaded skills: <200 words total
- Other skills: <500 words (still be concise)
Techniques:
Move details to tool help:
# ❌ BAD: Document all flags in SKILL.md
search-conversations supports --text, --both, --after DATE, --before DATE, --limit N
# ✅ GOOD: Reference --help
search-conversations supports multiple modes and filters. Run --help for details.
Use cross-references:
# ❌ BAD: Repeat workflow details
When searching, dispatch subagent with template...
[20 lines of repeated instructions]
# ✅ GOOD: Reference other skill
Always use subagents (50-100x context savings). REQUIRED: Use [other-skill-name] for workflow.
Compress examples:
# ❌ BAD: Verbose example (42 words)
your human partner: "How did we handle authentication errors in React Router before?"
You: I'll search past conversations for React Router authentication patterns.
[Dispatch subagent with search query: "React Router authentication error handling 401"]
# ✅ GOOD: Minimal example (20 words)
Partner: "How did we handle auth errors in React Router?"
You: Searching...
[Dispatch subagent → synthesis]
Eliminate redundancy:
- Don't repeat what's in cross-referenced skills
- Don't explain what's obvious from command
- Don't include multiple examples of same pattern
Verification:
wc -w skills/path/SKILL.md
# getting-started workflows: aim for <150 each
# Other frequently-loaded: aim for <200 total
Name by what you DO or core insight:
- ✅
condition-based-waiting>async-test-helpers - ✅
using-skillsnotskill-usage - ✅
flatten-with-flags>data-structure-refactoring - ✅
hyperpowers:root-cause-tracing>debugging-techniques
Gerunds (-ing) work well for processes:
creating-skills,testing-skills,debugging-with-logs- Active, describes the action you're taking
4. Cross-Referencing Other Skills
When writing documentation that references other skills:
Use skill name only, with explicit requirement markers:
- ✅ Good:
**REQUIRED SUB-SKILL:** Use hyperpowers:test-driven-development - ✅ Good:
**REQUIRED BACKGROUND:** You MUST understand hyperpowers:debugging-with-tools - ❌ Bad:
See skills/testing/test-driven-development(unclear if required) - ❌ Bad:
@skills/testing/test-driven-development/SKILL.md(force-loads, burns context)
Why no @ links: @ syntax force-loads files immediately, consuming 200k+ context before you need them.
Flowchart Usage
digraph when_flowchart {
"Need to show information?" [shape=diamond];
"Decision where I might go wrong?" [shape=diamond];
"Use markdown" [shape=box];
"Small inline flowchart" [shape=box];
"Need to show information?" -> "Decision where I might go wrong?" [label="yes"];
"Decision where I might go wrong?" -> "Small inline flowchart" [label="yes"];
"Decision where I might go wrong?" -> "Use markdown" [label="no"];
}
Use flowcharts ONLY for:
- Non-obvious decision points
- Process loops where you might stop too early
- "When to use A vs B" decisions
Never use flowcharts for:
- Reference material → Tables, lists
- Code examples → Markdown blocks
- Linear instructions → Numbered lists
- Labels without semantic meaning (step1, helper2)
See @graphviz-conventions.dot for graphviz style rules.
Code Examples
One excellent example beats many mediocre ones
Choose most relevant language:
- Testing techniques → TypeScript/JavaScript
- System debugging → Shell/Python
- Data processing → Python
Good example:
- Complete and runnable
- Well-commented explaining WHY
- From real scenario
- Shows pattern clearly
- Ready to adapt (not generic template)
Don't:
- Implement in 5+ languages
- Create fill-in-the-blank templates
- Write contrived examples
You're good at porting - one great example is enough.
File Organization
Self-Contained Skill
defense-in-depth/
SKILL.md # Everything inline
When: All content fits, no heavy reference needed
Skill with Reusable Tool
condition-based-waiting/
SKILL.md # Overview + patterns
example.ts # Working helpers to adapt
When: Tool is reusable code, not just narrative
Skill with Heavy Reference
pptx/
SKILL.md # Overview + workflows
pptxgenjs.md # 600 lines API reference
ooxml.md # 500 lines XML structure
scripts/ # Executable tools
When: Reference material too large for inline
The Iron Law (Same as TDD)
NO SKILL WITHOUT A FAILING TEST FIRST
This applies to NEW skills AND EDITS to existing skills.
Write skill before testing? Delete it. Start over. Edit skill without testing? Same violation.
No exceptions:
- Not for "simple additions"
- Not for "just adding a section"
- Not for "documentation updates"
- Don't keep untested changes as "reference"
- Don't "adapt" while running tests
- Delete means delete
REQUIRED BACKGROUND: The hyperpowers:test-driven-development skill explains why this matters. Same principles apply to documentation.
Testing All Skill Types
For detailed methodology on testing different skill types (disciplines, techniques, patterns, reference), see: resources/testing-methodology.md
Testing Summary
Different skill types need different test approaches:
Discipline-Enforcing Skills (rules/requirements)
- Pressure types (time, sunk cost, authority, exhaustion)
- Plugging holes systematically
- Meta-testing techniques
Anti-Patterns
❌ Narrative Example
"In session 2025-10-03, we found empty projectDir caused..." Why bad: Too specific, not reusable
❌ Multi-Language Dilution
example-js.js, example-py.py, example-go.go Why bad: Mediocre quality, maintenance burden
❌ Code in Flowcharts
step1 [label="import fs"];
step2 [label="read file"];
Why bad: Can't copy-paste, hard to read
❌ Generic Labels
helper1, helper2, step3, pattern4 Why bad: Labels should have semantic meaning
STOP: Before Moving to Next Skill
After writing ANY skill, you MUST STOP and complete the deployment process.
Do NOT:
- Create multiple skills in batch without testing each
- Move to next skill before current one is verified
- Skip testing because "batching is more efficient"
The deployment checklist below is MANDATORY for EACH skill.
Deploying untested skills = deploying untested code. It's a violation of quality standards.
Skill Creation Checklist (TDD Adapted)
IMPORTANT: Use TodoWrite to create todos for EACH checklist item below.
RED Phase - Write Failing Test:
- Create pressure scenarios (3+ combined pressures for discipline skills)
- Run scenarios WITHOUT skill - document baseline behavior verbatim
- Identify patterns in rationalizations/failures
GREEN Phase - Write Minimal Skill:
- Name uses only letters, numbers, hyphens (no parentheses/special chars)
- YAML frontmatter with only name and description (max 1024 chars)
- Description starts with "Use when..." and includes specific triggers/symptoms
- Description written in third person
- Keywords throughout for search (errors, symptoms, tools)
- Clear overview with core principle
- Address specific baseline failures identified in RED
- Code inline OR link to separate file
- One excellent example (not multi-language)
- Run scenarios WITH skill - verify agents now comply
REFACTOR Phase - Close Loopholes:
- Identify NEW rationalizations from testing
- Add explicit counters (if discipline skill)
- Build rationalization table from all test iterations
- Create red flags list
- Re-test until bulletproof
Quality Checks:
- Small flowchart only if decision non-obvious
- Quick reference table
- Common mistakes section
- No narrative storytelling
- Supporting files only for tools or heavy reference
Deployment:
- Commit skill to git and push to your fork (if configured)
- Consider contributing back via PR (if broadly useful)
Discovery Workflow
How future Claude finds your skill:
- Encounters problem ("tests are flaky")
- Finds SKILL (description matches)
- Scans overview (is this relevant?)
- Reads patterns (quick reference table)
- Loads example (only when implementing)
Optimize for this flow - put searchable terms early and often.
The Bottom Line
Creating skills IS TDD for process documentation.
Same Iron Law: No skill without failing test first. Same cycle: RED (baseline) → GREEN (write skill) → REFACTOR (close loopholes). Same benefits: Better quality, fewer surprises, bulletproof results.
If you follow TDD for code, follow it for skills. It's the same discipline applied to documentation.