| name | llms-txt-support |
| description | Detect and use llms.txt files for LLM-optimized documentation. Use when checking if a site has LLM-ready docs before scraping. |
| tools | Read, Write, WebFetch |
llms.txt Support Skill
Purpose
Single responsibility: Detect, fetch, and use llms.txt files that provide LLM-optimized documentation, which can make documentation ingestion up to roughly 10x faster than scraping the full site. (BP-4)
Background
The llms.txt standard (https://llmstxt.org/) provides a convention for websites to expose LLM-friendly documentation. Instead of scraping entire sites, check for llms.txt first.
File hierarchy (check in order):
- llms-full.txt - Complete documentation (largest)
- llms.txt - Standard documentation
- llms-small.txt - Condensed documentation (smallest)
Grounding Checkpoint (Archetype 1 Mitigation)
Before executing, VERIFY:
- Base URL is accessible
- All three llms.txt variants have been checked, in order
- File content is actual documentation (not an error page)
- File size is reasonable for the documentation scope
DO NOT assume llms.txt exists. Always probe first.
Uncertainty Escalation (Archetype 2 Mitigation)
ASK USER instead of guessing when:
- Multiple llms.txt variants are found: which size should be used?
- llms.txt content appears partial or outdated
- The request succeeds but the content looks like an error page
- The site has llms.txt but the content doesn't match the expected documentation
NEVER assume llms.txt quality without verification.
Context Scope (Archetype 3 Mitigation)
| Context Type | Included | Excluded |
|---|---|---|
| RELEVANT | Target base URL, llms.txt content | Full site scraping |
| PERIPHERAL | llms.txt spec reference | Other sites' llms.txt |
| DISTRACTOR | Previous scraping attempts | Unrelated documentation |
Workflow Steps
Step 1: Detect llms.txt (Grounding)
# Check for llms.txt variants (in order of preference)
curl -I https://example.com/llms-full.txt
curl -I https://example.com/llms.txt
curl -I https://example.com/llms-small.txt
# Check common alternate locations
curl -I https://example.com/.well-known/llms.txt
curl -I https://docs.example.com/llms.txt
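The individual probes above can be wrapped in a loop. This is a minimal sketch, not part of skill-seekers: `http_status` and `probe_llms_txt` are hypothetical helper names, and it assumes a plain `200` response means the file exists (the content still has to be validated in Step 2).

```shell
# http_status URL: print only the final HTTP status code for URL.
# Hypothetical helper, kept separate so it can be stubbed for offline testing.
# -I sends a HEAD request, -L follows redirects, -w prints the status code.
# Some servers treat HEAD differently from GET; drop -I if results look wrong.
http_status() {
  curl -sIL -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}

# probe_llms_txt BASE_URL: print the URL of every variant that answers 200,
# one per line, in the preference order used by this skill.
probe_llms_txt() {
  local base="${1%/}" path   # strip one trailing slash from the base URL
  for path in llms-full.txt llms.txt llms-small.txt .well-known/llms.txt; do
    if [ "$(http_status "$base/$path")" = "200" ]; then
      echo "$base/$path"
    fi
  done
}
```

Usage: `probe_llms_txt https://example.com` prints one URL per detected variant. Empty output means nothing was found and the caller should fall back to scraping.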
Step 2: Validate Content
# Fetch and inspect first 100 lines
curl -s https://example.com/llms.txt | head -100
# Check file size
curl -sI https://example.com/llms.txt | grep -i content-length
# Verify it's not an error page (check only the first lines; real docs often mention "error")
curl -s https://example.com/llms.txt | head -20 | grep -qiE "<html|not found|404" && echo "WARNING: May be error page"
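The same checks can be run against a downloaded copy so the file is fetched only once. A minimal sketch under stated assumptions: `looks_like_llms_txt` is a hypothetical helper, and it treats a leading `# Title` line as the sign of a valid file, which matches the llms.txt format but may be too strict for some sites.

```shell
# looks_like_llms_txt FILE: return 0 if FILE plausibly holds llms.txt content.
looks_like_llms_txt() {
  local file="$1"
  # Empty or missing file: nothing to ingest.
  if [ ! -s "$file" ]; then
    echo "empty or missing: $file" >&2
    return 1
  fi
  # HTML in the first lines usually means an error or login page.
  if head -n 5 "$file" | grep -qiE '<!doctype html|<html'; then
    echo "looks like HTML, not markdown: $file" >&2
    return 1
  fi
  # Assumption: the first non-blank line is an H1 title ("# Project Name").
  if ! grep -m 1 -vE '^[[:space:]]*$' "$file" | grep -qE '^# '; then
    echo "no leading '# Title' heading: $file" >&2
    return 1
  fi
  return 0
}
```

Usage: `looks_like_llms_txt docs/llms.txt || echo "ask the user before using this file"`.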
Step 3: Choose Variant
| Variant | Size | Use Case |
|---|---|---|
| llms-full.txt | Large (1MB+) | Complete documentation, full API reference |
| llms.txt | Medium | Standard use, balanced coverage |
| llms-small.txt | Small (<100KB) | Quick reference, limited context windows |
Decision tree:
- If context window is limited → llms-small.txt
- If need complete coverage → llms-full.txt
- Default → llms.txt
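The decision tree can be expressed as a small function. This is a sketch only: `choose_variant` and the labels `limited`, `complete`, and `default` are hypothetical, and the fallback order used when the preferred variant is missing is an assumption, not something the decision tree specifies.

```shell
# choose_variant NEED AVAILABLE...
#   NEED:      limited | complete | default (hypothetical labels)
#   AVAILABLE: file names of the variants that were actually detected
# Prints the chosen variant, or returns 1 if none is available.
choose_variant() {
  local need="$1"
  shift
  local preferred candidate
  case "$need" in
    limited)  preferred="llms-small.txt llms.txt llms-full.txt" ;;
    complete) preferred="llms-full.txt llms.txt llms-small.txt" ;;
    *)        preferred="llms.txt llms-full.txt llms-small.txt" ;;
  esac
  for candidate in $preferred; do
    # Match whole names only by padding both sides with spaces.
    case " $* " in
      *" $candidate "*) echo "$candidate"; return 0 ;;
    esac
  done
  return 1   # nothing available: fall back to scraping
}
```

Usage: `choose_variant limited llms-full.txt llms.txt` picks `llms.txt`, because the small variant was not detected and the standard file is the next smallest.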
Step 4: Fetch and Process
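# Download llms.txt (-f fails on HTTP errors instead of saving the error page; --create-dirs creates docs/ if missing)
curl -fsSL --create-dirs -o docs/llms.txt https://example.com/llms.txt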
# Convert to skill format (if using skill-seekers)
skill-seekers scrape --llms-txt docs/llms.txt --name myskill
# Or process manually
# llms.txt is already LLM-optimized markdown
mkdir -p output/myskill/references
cp docs/llms.txt output/myskill/references/complete.md
Step 5: Validate Output
# Check content structure
head -50 output/myskill/references/complete.md
# Verify sections
grep "^#" output/myskill/references/complete.md | head -20
# Check for code examples
grep -c '```' output/myskill/references/complete.md
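These spot checks can be combined into one pass/fail helper. A rough sketch: `check_reference_file` is a hypothetical name, and the counts are approximate because `^#` also matches comment lines inside code blocks.

```shell
# check_reference_file FILE: print heading and fence counts, then fail if
# there are no headings or if code fences are unbalanced (odd count).
check_reference_file() {
  local file="$1" headings fences
  # grep -c exits non-zero when the count is 0, so guard it with || true.
  headings=$(grep -c '^#' "$file" || true)
  fences=$(grep -c '^```' "$file" || true)
  echo "headings=$headings fences=$fences"
  [ "$headings" -gt 0 ] || return 1
  [ $((fences % 2)) -eq 0 ] || return 1
}
```

Usage: `check_reference_file output/myskill/references/complete.md || echo "inspect the file manually"`.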
Recovery Protocol (Archetype 4 Mitigation)
On error:
- PAUSE - Note which variant failed
- DIAGNOSE - Check error type:
  - 404 Not Found → Try next variant or alternate location
  - 403 Forbidden → May need authentication or user-agent
  - Timeout → Retry with longer timeout
  - Invalid content → Fall back to traditional scraping
- ADAPT - Try alternate approach
- RETRY - Next variant (max 3 attempts per variant)
- ESCALATE - Inform user llms.txt unavailable, suggest scraping
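The protocol above might be sketched as follows. The function names and the step labels are hypothetical, and the starting timeout of 10 seconds is an assumption. Content validity is not an HTTP status, so it still needs a separate check after a successful fetch.

```shell
# next_action STATUS: map an HTTP status (as printed by curl -w '%{http_code}')
# to a recovery step. The step names are hypothetical labels for this sketch.
next_action() {
  case "$1" in
    200) echo "use" ;;
    404) echo "next-variant" ;;
    403) echo "retry-with-user-agent" ;;
    000) echo "retry-longer-timeout" ;;   # curl prints 000 when no response arrived
    *)   echo "fall-back-to-scraping" ;;
  esac
}

# fetch_with_retry URL OUTFILE: at most 3 attempts, doubling the timeout
# after each attempt that got no response. Returns 0 only on a 200.
fetch_with_retry() {
  local url="$1" out="$2" timeout=10 attempt status
  for attempt in 1 2 3; do
    status=$(curl -sL -o "$out" -w '%{http_code}' --max-time "$timeout" "$url" || true)
    case "$(next_action "$status")" in
      use) return 0 ;;
      retry-longer-timeout) timeout=$((timeout * 2)) ;;
      *) break ;;   # retrying the same URL will not help
    esac
  done
  echo "giving up on $url (last status: $status)" >&2
  return 1
}
```

Usage: call `fetch_with_retry` once per variant and move to the next variant when it returns non-zero; escalate to the user once every variant has failed.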
Checkpoint Support
State saved to: .aiwg/working/checkpoints/llms-txt-support/
checkpoints/llms-txt-support/
├── detection_results.json # Which variants found
├── selected_variant.txt # Which was chosen
└── content_hash.txt # For cache validation
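One way the checkpoint files could be written and reused for cache validation is sketched below. The helper names are hypothetical, and SHA-256 is an assumption: the layout above does not say which hash `content_hash.txt` holds.

```shell
# hash_file FILE: print the SHA-256 of FILE as hex.
# sha256sum is GNU coreutils; shasum -a 256 is the usual macOS substitute.
hash_file() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d ' ' -f 1
  else
    shasum -a 256 "$1" | cut -d ' ' -f 1
  fi
}

# save_checkpoint DIR VARIANT FILE: record the chosen variant and content hash.
save_checkpoint() {
  local dir="$1" variant="$2" file="$3"
  mkdir -p "$dir"
  printf '%s\n' "$variant" > "$dir/selected_variant.txt"
  hash_file "$file" > "$dir/content_hash.txt"
}

# content_changed DIR FILE: return 0 if FILE differs from the saved hash
# (or if no hash was saved yet), 1 if the cached copy is still valid.
content_changed() {
  local dir="$1" file="$2"
  [ "$(hash_file "$file")" != "$(cat "$dir/content_hash.txt" 2>/dev/null)" ]
}
```

Usage: `save_checkpoint .aiwg/working/checkpoints/llms-txt-support llms.txt docs/llms.txt` after a successful fetch, then `content_changed` on later runs to decide whether reprocessing is needed.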
llms.txt Format Reference
Standard llms.txt structure:
# Project Name
> Brief description of the project
## Overview
[High-level explanation]
## Installation
[Setup instructions]
## Quick Start
[Getting started guide]
## API Reference
[Detailed API documentation]
## Examples
[Code examples]
## FAQ
[Common questions]
Detection Results Output
{
"base_url": "https://example.com",
"detected": {
"llms-full.txt": {
"found": true,
"url": "https://example.com/llms-full.txt",
"size": 1523456,
"last_modified": "2025-01-15T10:30:00Z"
},
"llms.txt": {
"found": true,
"url": "https://example.com/llms.txt",
"size": 245678,
"last_modified": "2025-01-15T10:30:00Z"
},
"llms-small.txt": {
"found": false
}
},
"recommended": "llms.txt",
"reason": "Standard size, good for most use cases"
}
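A results file in this shape can be queried from the shell. The sketch below assumes `jq` is installed and that the file follows the layout shown above; `found_variants` and `recommended_url` are hypothetical helper names.

```shell
# found_variants FILE: print "name size" for every variant marked as found.
found_variants() {
  jq -r '.detected | to_entries[] | select(.value.found) | "\(.key) \(.value.size)"' "$1"
}

# recommended_url FILE: print the URL of the variant named in "recommended".
recommended_url() {
  jq -r '.recommended as $r | .detected[$r].url' "$1"
}
```

Usage: `found_variants detection_results.json` lists the candidates to offer the user when more than one variant exists, and `recommended_url detection_results.json` gives the URL to fetch by default.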
Known Sites with llms.txt
Sites known to support llms.txt (verify before use):
- Anthropic documentation
- Many modern API documentation sites
- Framework documentation following the standard
Always verify - this list may be outdated.
Troubleshooting
| Issue | Diagnosis | Solution |
|---|---|---|
| No llms.txt found | Site doesn't publish llms.txt | Fall back to doc-scraper |
| Content seems wrong | Error page or redirect | Check actual content, verify URL |
| File too large | llms-full.txt is too large for the available context | Use llms.txt or llms-small.txt |
| Outdated content | llms.txt not maintained | Consider scraping + llms.txt merge |
Integration with doc-scraper
If llms.txt is incomplete or outdated, combine approaches:
# 1. Fetch llms.txt as base
curl -o base.md https://example.com/llms.txt
# 2. Scrape for additional/updated content
skill-seekers scrape --config config.json --skip-covered-by base.md
# 3. Merge results
# llms.txt provides structure, scraping fills gaps
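No merge command is given for step 3. One possible approach, shown here purely as a sketch, keeps the llms.txt file intact and appends only those `## ` sections of the scraped file whose headings do not already appear in it. `merge_missing_sections` and the file name `scraped.md` are hypothetical, and matching sections by exact heading text is an assumption.

```shell
# merge_missing_sections BASE EXTRA: print BASE, followed by every "## "
# section of EXTRA whose heading line does not occur in BASE.
# Caveat: the NR == FNR idiom assumes BASE is not empty.
merge_missing_sections() {
  local base="$1" extra="$2"
  cat "$base"
  awk '
    NR == FNR { if ($0 ~ /^## /) seen[$0] = 1; next }  # pass 1: headings in BASE
    /^## /    { keep = !($0 in seen) }                 # pass 2: a section starts
    keep      { print }                                # copy lines of new sections
  ' "$base" "$extra"
}
```

Usage: `merge_missing_sections base.md scraped.md > merged.md`. Sections present in both files are taken from llms.txt, so outdated llms.txt sections still need a manual review.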
References
- llms.txt Standard: https://llmstxt.org/
- Skill Seekers llms.txt Detection: https://github.com/jmagly/Skill_Seekers/blob/main/docs/LLMS_TXT_SUPPORT.md
- REF-001: Production-Grade Agentic Workflows (BP-4, BP-9)
- REF-002: LLM Failure Modes (Archetype 1-4 mitigations)