| name | doc-splitter |
| description | Split large documentation (10K+ pages) into focused sub-skills with intelligent routing. Use for massive doc sites like Godot, AWS, or MSDN. |
| tools | Read, Write, Bash, Glob |
Documentation Splitter Skill
Purpose
Single responsibility: Split large documentation sites into multiple focused sub-skills with an optional router skill for intelligent navigation. (BP-4)
Grounding Checkpoint (Archetype 1 Mitigation)
Before executing, VERIFY:
- Total page count is known (run estimation first)
- Documentation categories are identifiable
- Target skill size determined (default: 5,000 pages per skill)
- Router strategy selected (category, size, or hybrid)
DO NOT split without understanding documentation structure.
Uncertainty Escalation (Archetype 2 Mitigation)
ASK USER instead of guessing when:
- Category boundaries unclear
- Optimal skill size uncertain for target use case
- Cross-references between sections complicate splitting
- Router vs flat structure decision needed
NEVER arbitrarily split - seek user guidance on boundaries.
Context Scope (Archetype 3 Mitigation)
| Context Type | Included | Excluded |
|---|---|---|
| RELEVANT | Doc structure, categories, page counts | Actual page content |
| PERIPHERAL | Similar large doc examples | Other documentation |
| DISTRACTOR | Content quality concerns | Individual page issues |
Size Guidelines
| Documentation Size | Recommendation | Strategy |
|---|---|---|
| < 5,000 pages | One skill | No splitting |
| 5,000 - 10,000 pages | Consider splitting | Category-based |
| 10,000 - 30,000 pages | Recommended | Router + Categories |
| 30,000+ pages | Strongly recommended | Router + Categories |
Workflow Steps
Step 1: Estimate Documentation Size (Grounding)
# Quick estimation with skill-seekers
skill-seekers estimate configs/large-docs.json
# Output:
# 📊 ESTIMATION RESULTS
# ✅ Pages Discovered: 28,450
# 📈 Estimated Total: 32,000
# ⏱️ Time Elapsed: 2.1 minutes
# 💡 Recommended: Split into 6-7 sub-skills
Step 2: Analyze Category Structure
# Identify natural category boundaries
skill-seekers analyze --config configs/large-docs.json --categories
# Output:
# Categories detected:
# - scripting: 8,200 pages
# - 2d: 5,400 pages
# - 3d: 9,100 pages
# - physics: 4,300 pages
# - networking: 2,800 pages
# - editor: 2,200 pages
Step 3: Choose Split Strategy
| Strategy | Best For | Description |
|---|---|---|
category |
Clear topic divisions | Split by documentation sections |
size |
Uniform distribution | Split every N pages |
router |
User navigation | Hub skill + specialized sub-skills |
hybrid |
Complex docs | Categories + size limits per category |
Step 4: Execute Split
Option A: With skill-seekers
# Category-based split
skill-seekers split --config configs/godot.json --strategy category
# Router-based split (recommended for large docs)
skill-seekers split --config configs/godot.json --strategy router
# Size-based split
skill-seekers split --config configs/godot.json --strategy size --pages-per-skill 5000
Option B: Manual split configuration
{
"name": "godot",
"max_pages": 40000,
"split_strategy": "router",
"split_config": {
"target_pages_per_skill": 5000,
"create_router": true,
"categories": {
"scripting": {
"patterns": ["/scripting/", "/gdscript/", "/c_sharp/"],
"max_pages": 8000
},
"2d": {
"patterns": ["/2d/", "/sprite/", "/tilemap/"],
"max_pages": 6000
},
"3d": {
"patterns": ["/3d/", "/mesh/", "/spatial/"],
"max_pages": 10000
},
"physics": {
"patterns": ["/physics/", "/collision/", "/rigidbody/"],
"max_pages": 5000
}
}
}
}
Step 5: Scrape Sub-Skills
# Scrape all sub-skills in parallel
for config in configs/godot-*.json; do
skill-seekers scrape --config $config &
done
wait
# Or sequentially with progress
for config in configs/godot-*.json; do
echo "Processing: $config"
skill-seekers scrape --config $config
done
Step 6: Generate Router Skill
# Auto-generate router from sub-skills
skill-seekers generate-router configs/godot-*.json
# Creates godot-router skill that intelligently routes queries
Step 7: Validate Split Results
# Check sub-skill sizes
for dir in output/godot-*/; do
echo "$dir: $(find $dir -name "*.md" | wc -l) files"
done
# Verify router coverage
cat output/godot-router/SKILL.md | grep -A 50 "## Sub-Skills"
Recovery Protocol (Archetype 4 Mitigation)
On error:
- PAUSE - Note which sub-skill failed
- DIAGNOSE - Check error type:
Category overlap→ Refine URL patternsUneven split→ Adjust page limitsOrphan pages→ Add catch-all categoryRouter incomplete→ Regenerate after all sub-skills done
- ADAPT - Modify split configuration
- RETRY - Re-split affected category (max 3 attempts)
- ESCALATE - Present split preview, ask user for boundary adjustments
Checkpoint Support
State saved to: .aiwg/working/checkpoints/doc-splitter/
checkpoints/doc-splitter/
├── estimation.json # Page count results
├── category_analysis.json # Category breakdown
├── split_plan.json # Planned split configuration
├── progress/
│ ├── godot-scripting.json
│ ├── godot-2d.json
│ └── ...
└── router_draft.md # Router skill draft
Output Structure
After splitting large documentation:
configs/
├── godot.json # Original config
├── godot-scripting.json # Generated sub-config
├── godot-2d.json
├── godot-3d.json
├── godot-physics.json
└── godot-router.json # Router config
output/
├── godot-scripting/ # Sub-skill
│ ├── SKILL.md
│ └── references/
├── godot-2d/ # Sub-skill
├── godot-3d/ # Sub-skill
├── godot-physics/ # Sub-skill
└── godot-router/ # Router skill
├── SKILL.md # Routing logic
└── references/
└── routing-table.md
Router Skill Structure
The generated router skill:
# Godot Documentation Router
## Purpose
Route queries to the appropriate specialized Godot sub-skill.
## Sub-Skills
| Topic | Skill | Coverage |
|-------|-------|----------|
| GDScript, C#, scripting patterns | godot-scripting | 8,200 pages |
| 2D graphics, sprites, tilemaps | godot-2d | 5,400 pages |
| 3D graphics, meshes, materials | godot-3d | 9,100 pages |
| Physics, collisions, rigid bodies | godot-physics | 4,300 pages |
## Routing Rules
1. **Scripting questions** → godot-scripting
- Keywords: script, gdscript, c#, function, variable, class
2. **2D graphics questions** → godot-2d
- Keywords: sprite, 2d, tilemap, animation2d, canvas
3. **3D graphics questions** → godot-3d
- Keywords: mesh, 3d, spatial, material, shader, camera3d
4. **Physics questions** → godot-physics
- Keywords: physics, collision, rigidbody, area, raycast
## Usage
Ask your question naturally. This router will direct you to the appropriate specialized skill.
Example:
- "How do I create a player movement script?" → godot-scripting
- "How do I set up tilemap collisions?" → godot-2d
- "How do I apply materials to a mesh?" → godot-3d
Troubleshooting
| Issue | Diagnosis | Solution |
|---|---|---|
| Uneven splits | Category size varies | Use hybrid strategy with max_pages |
| Orphan pages | URL patterns incomplete | Add catch-all or refine patterns |
| Router confusion | Overlapping keywords | Make routing rules more specific |
| Too many skills | Over-segmented | Merge related categories |
References
- Skill Seekers Large Documentation: https://github.com/jmagly/Skill_Seekers/blob/main/docs/LARGE_DOCUMENTATION.md
- REF-001: Production-Grade Agentic Workflows (BP-4, BP-9 KISS)
- REF-002: LLM Failure Modes (Archetype 3 context filtering, Archetype 4 recovery)