name	llms-txt-support
description	Detect and use llms.txt files for LLM-optimized documentation. Use when checking if a site has LLM-ready docs before scraping.
tools	Read, Write, WebFetch

llms.txt Support Skill

Purpose

Single responsibility: Detect, fetch, and use llms.txt files that provide LLM-optimized documentation, so documentation can be ingested up to 10x faster than by scraping the whole site (BP-4).

Background

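The llms.txt standard (https://llmstxt.org/) is a convention for websites to expose LLM-friendly documentation as markdown. Instead of scraping an entire site, check for llms.txt first. The standard itself defines /llms.txt at the site root; llms-full.txt and llms-small.txt are widely used variants rather than part of the core spec.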

File variants (probe in this order):

llms-full.txt - Complete documentation (largest)
llms.txt - Standard documentation
llms-small.txt - Condensed documentation (smallest)

Grounding Checkpoint (Archetype 1 Mitigation)

Before executing, VERIFY:

Base URL is accessible
Check all three llms.txt variants in order
Validate file content is actual documentation (not error page)
Confirm file size is reasonable for the documentation scope

DO NOT assume llms.txt exists. Always probe first.

Uncertainty Escalation (Archetype 2 Mitigation)

ASK USER instead of guessing when:

Multiple llms.txt variants are found: which size should be used?
llms.txt content appears partial or outdated
The file is returned, but its content looks like an error page
The site has llms.txt, but its content doesn't match the expected documentation

NEVER assume llms.txt quality without verification.

Context Scope (Archetype 3 Mitigation)

Context Type	Included	Excluded
RELEVANT	Target base URL, llms.txt content	Full site scraping
PERIPHERAL	llms.txt spec reference	Other sites' llms.txt
DISTRACTOR	Previous scraping attempts	Unrelated documentation

Workflow Steps

Step 1: Detect llms.txt (Grounding)

# Check for llms.txt variants (in order of preference)
# -s: silent, -I: HEAD request, -L: follow redirects (the last status line is the one that counts)
curl -sIL https://example.com/llms-full.txt
curl -sIL https://example.com/llms.txt
curl -sIL https://example.com/llms-small.txt

# Check alternate locations some sites use
curl -sIL https://example.com/.well-known/llms.txt
curl -sIL https://docs.example.com/llms.txt
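The probes above can be wrapped in a loop that reports one line per variant. This is a minimal sketch that assumes bash and curl. `probe_llms_txt` is a hypothetical helper name and is not part of skill-seekers or any other tool. It sends HEAD requests like the commands above, follows redirects, and prints the final HTTP status and content type for each variant. A status of 000 means no response arrived.

```shell
#!/usr/bin/env bash
# probe_llms_txt BASE_URL
# Hypothetical helper: probes each llms.txt variant in preference order and
# prints "<variant> <http_code> <content_type>" for every probe.
# Returns 0 if at least one variant answered 200, 1 otherwise.
probe_llms_txt() {
  local base="${1%/}"   # strip a trailing slash so URLs join cleanly
  local variant result code ctype found=1
  for variant in llms-full.txt llms.txt llms-small.txt; do
    # -I: HEAD request, -L: follow redirects, -o: discard the headers,
    # -w: print the final status code and content type after redirects
    result="$(curl -sIL --max-time 10 -o /dev/null \
      -w '%{http_code} %{content_type}' "$base/$variant")"
    code="${result%% *}"    # text before the first space
    ctype="${result#* }"    # text after the first space (may be empty)
    echo "$variant ${code:-000} ${ctype:--}"
    [ "$code" = "200" ] && found=0
  done
  return "$found"
}
```

For example, `probe_llms_txt https://example.com` prints three lines. A 405 status means the server rejects HEAD. In that case, repeat the probe as a GET with `curl -sL -o /dev/null -w '%{http_code}\n' URL`.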

Step 2: Validate Content

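# Fetch and inspect the first 100 lines (-L follows redirects)
curl -sL https://example.com/llms.txt | head -100

# Check file size (the header can be missing for chunked or compressed responses)
curl -sIL https://example.com/llms.txt | grep -i content-length

# Verify it's not an HTML error page (plain words like "error" or "404" also appear in real API docs)
curl -sL https://example.com/llms.txt | head -20 | grep -qiE '<!doctype html|<html|page not found|404 not found' && echo "WARNING: May be error page"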
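Keyword matching is a weak signal, because real API docs routinely contain words like "error" and "404". A slightly stricter local check is sketched below. It assumes the file has already been downloaded and runs three tests:

- It rejects empty files.
- It rejects anything that starts like an HTML page.
- It looks for the H1 title that the llms.txt spec requires.

The H1 check only produces a warning, because llms-full.txt files do not always start with one. `check_llms_txt` is a hypothetical name, and its heuristics are assumptions to tune per site.

```shell
#!/usr/bin/env bash
# check_llms_txt FILE
# Hypothetical helper: heuristic sanity check on a downloaded llms.txt variant.
# Prints one verdict line; returns 0 (ok), 1 (reject) or 2 (no H1 title, inspect).
check_llms_txt() {
  local file="$1" first
  if [ ! -s "$file" ]; then
    echo "REJECT: empty or missing file"; return 1
  fi
  # HTML near the top usually means a soft-404, login, or redirect page
  if head -n 20 "$file" | grep -qiE '<!doctype html|<html'; then
    echo "REJECT: looks like HTML, not markdown"; return 1
  fi
  # The spec requires an H1 title; look at the first non-blank line
  first="$(grep -m 1 -v '^[[:space:]]*$' "$file")"
  case "$first" in
    "# "*) echo "OK: $(wc -c < "$file" | tr -d ' ') bytes, title: ${first#"# "}"; return 0 ;;
    *)     echo "WARN: no H1 title on first line; inspect manually"; return 2 ;;
  esac
}
```

Typical use: `curl -fsSL -o /tmp/llms.txt https://example.com/llms.txt && check_llms_txt /tmp/llms.txt`.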

Step 3: Choose Variant

Variant	Typical size	Use case
`llms-full.txt`	Large (1MB+)	Complete documentation, full API reference
`llms.txt`	Medium	Standard use, balanced coverage
`llms-small.txt`	Small (<100KB)	Quick reference, limited context windows

Decision tree:

If the context window is limited → llms-small.txt
If complete coverage is needed → llms-full.txt
Otherwise → llms.txt
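The decision tree above can be expressed as a preference order per mode, which also covers the case where the preferred variant is missing. This is a sketch only. `choose_variant` and the mode names `limited`, `complete` and `default` are hypothetical. The fallback order within each mode is an assumption.

```shell
#!/usr/bin/env bash
# choose_variant MODE FOUND...
# Hypothetical helper for the decision tree: MODE is limited | complete | default,
# FOUND lists the variants that actually exist on the site.
# Prints the most preferred variant that was found; returns 1 if none were.
choose_variant() {
  local mode="$1"; shift
  local prefs want have
  case "$mode" in
    limited)  prefs="llms-small.txt llms.txt llms-full.txt" ;;
    complete) prefs="llms-full.txt llms.txt llms-small.txt" ;;
    *)        prefs="llms.txt llms-small.txt llms-full.txt" ;;
  esac
  # $prefs is intentionally unquoted so it splits into one name per word
  for want in $prefs; do
    for have in "$@"; do
      if [ "$want" = "$have" ]; then echo "$want"; return 0; fi
    done
  done
  return 1
}
```

For example, `choose_variant limited llms.txt llms-full.txt` picks llms.txt because llms-small.txt is not available.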

Step 4: Fetch and Process

# Download llms.txt (-f: fail on HTTP errors instead of saving the error page; --create-dirs: create docs/)
curl -fsSL --create-dirs -o docs/llms.txt https://example.com/llms.txt

# Convert to skill format (if using skill-seekers)
skill-seekers scrape --llms-txt docs/llms.txt --name myskill

# Or process manually
# llms.txt is already LLM-optimized markdown
mkdir -p output/myskill/references
cp docs/llms.txt output/myskill/references/complete.md
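A slightly more defensive version of the manual path downloads to a temporary file first. A failed or empty download then never overwrites an existing reference; the file is moved into place only after both checks pass. This is a sketch that assumes bash and curl. `install_llms_txt` is a hypothetical helper that mirrors the `output/<name>/references/complete.md` layout used here.

```shell
#!/usr/bin/env bash
# install_llms_txt URL SKILL_NAME [OUT_ROOT]
# Hypothetical helper: downloads URL to a temp file, refuses failed or empty
# downloads, then moves it to OUT_ROOT/SKILL_NAME/references/complete.md.
install_llms_txt() {
  local url="$1" name="$2" root="${3:-output}"
  local dest_dir="$root/$name/references" tmp
  tmp="$(mktemp)" || return 1
  # -f: treat HTTP errors as failures instead of saving the error body
  if ! curl -fsSL --max-time 60 -o "$tmp" "$url"; then
    echo "download failed: $url" >&2; rm -f "$tmp"; return 1
  fi
  if [ ! -s "$tmp" ]; then
    echo "empty file: $url" >&2; rm -f "$tmp"; return 1
  fi
  chmod 644 "$tmp"   # mktemp creates files readable by the owner only
  mkdir -p "$dest_dir" && mv "$tmp" "$dest_dir/complete.md"
}
```

Usage: `install_llms_txt https://example.com/llms.txt myskill`.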

Step 5: Validate Output

# Check content structure
head -50 output/myskill/references/complete.md

# List section headings (comment lines starting with "#" inside code blocks also match)
grep "^#" output/myskill/references/complete.md | head -20

# Count code-fence lines (each code block contributes two)
grep -c '^```' output/myskill/references/complete.md
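A plain `grep "^#"` also counts shell comments inside fenced code, and a fence count has to be halved to give a block count. The awk sketch below tracks whether it is inside a fence, counts only headings outside code, and reports the number of code blocks directly. `summarize_reference` is a hypothetical name. The sketch assumes that backtick fences start at column 0.

```shell
#!/usr/bin/env bash
# summarize_reference FILE
# Hypothetical helper: prints "headings=N code_blocks=M" for a markdown file.
# Lines inside ``` fences are not counted as headings; blocks = fence lines / 2.
summarize_reference() {
  awk '
    /^```/ { fences++; in_code = !in_code; next }
    !in_code && /^#+ / { headings++ }
    END {
      printf "headings=%d code_blocks=%d", headings, int(fences / 2)
      if (fences % 2) printf " (unbalanced fence)"
      printf "\n"
    }
  ' "$1"
}
```

Usage: `summarize_reference output/myskill/references/complete.md`. An "unbalanced fence" note usually points to a truncated download.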

Recovery Protocol (Archetype 4 Mitigation)

On error:

PAUSE - Note which variant failed
DIAGNOSE - Check error type:
- 404 Not Found → Try next variant or alternate location
- 403 Forbidden → May require authentication or a different User-Agent header
- Timeout → Retry with a longer timeout
- Invalid content → Fall back to traditional scraping
ADAPT - Try alternate approach
RETRY - Next variant (max 3 attempts per variant)
ESCALATE - Inform user llms.txt unavailable, suggest scraping
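The DIAGNOSE and RETRY steps can be partly mechanized. The sketch below has two parts:

- It maps an HTTP status to one of the recovery actions in the protocol.
- It retries only transient failures, at most three times, and doubles the curl timeout after each attempt.

`next_action` and `fetch_with_retry` are hypothetical names. Treating 5xx responses as retryable is an assumption not stated in the protocol. curl reports 000 when no response arrived, for example on a timeout, a DNS failure or a refused connection.

```shell
#!/usr/bin/env bash
# next_action HTTP_CODE
# Hypothetical mapping from a probe's final HTTP status to a recovery step.
next_action() {
  case "$1" in
    200)     echo "validate" ;;      # got a file; still check its content
    404|410) echo "next-variant" ;;  # try the next variant or location
    401|403) echo "check-auth" ;;    # may need credentials or a User-Agent
    000)     echo "retry" ;;         # no response: retry with a longer timeout
    5??)     echo "retry" ;;         # server error: assumed transient
    *)       echo "escalate" ;;      # anything else: ask the user
  esac
}

# fetch_with_retry URL OUT
# Prints the final status code. Returns 1 only if all 3 attempts ended in a
# retryable state; any other status is returned to the caller to act on.
fetch_with_retry() {
  local url="$1" out="$2" attempt timeout=10 code=000
  for attempt in 1 2 3; do
    code="$(curl -sL --max-time "$timeout" -o "$out" -w '%{http_code}' "$url")"
    if [ "$(next_action "${code:-000}")" != "retry" ]; then
      echo "$code"; return 0
    fi
    timeout=$((timeout * 2))   # back off: 10s, 20s, 40s
  done
  echo "${code:-000}"; return 1
}
```

A caller would run `code="$(fetch_with_retry https://example.com/llms.txt docs/llms.txt)"` and then branch on `next_action "$code"`.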

Checkpoint Support

State saved to: .aiwg/working/checkpoints/llms-txt-support/

checkpoints/llms-txt-support/
├── detection_results.json    # Which variants found
├── selected_variant.txt      # Which was chosen
└── content_hash.txt          # For cache validation
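The content hash in this layout supports a simple cache check. If a freshly downloaded file hashes to the saved value, the earlier processing can be reused. This is a sketch with hypothetical helper names. It writes only `selected_variant.txt` and `content_hash.txt`, leaving detection_results.json to the detection step. It uses `sha256sum`, or `shasum -a 256` where that is unavailable (as on macOS).

```shell
#!/usr/bin/env bash
# content_hash FILE: prints the SHA-256 hex digest of FILE.
content_hash() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d ' ' -f 1
  else
    shasum -a 256 "$1" | cut -d ' ' -f 1
  fi
}

# save_checkpoint DIR VARIANT FILE
# Hypothetical helper: records the chosen variant and the content hash.
save_checkpoint() {
  local dir="$1" variant="$2" file="$3"
  mkdir -p "$dir" || return 1
  printf '%s\n' "$variant" > "$dir/selected_variant.txt"
  content_hash "$file" > "$dir/content_hash.txt"
}

# checkpoint_is_fresh DIR FILE
# Returns 0 if FILE still matches the saved hash (cached results are valid).
checkpoint_is_fresh() {
  [ -f "$1/content_hash.txt" ] &&
    [ "$(content_hash "$2")" = "$(cat "$1/content_hash.txt")" ]
}
```

Example: `save_checkpoint .aiwg/working/checkpoints/llms-txt-support llms.txt docs/llms.txt`.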

llms.txt Format Reference

Typical llms.txt structure (the spec requires only the H1 title; the other sections shown are common conventions):

# Project Name

> Brief description of the project

## Overview
[High-level explanation]

## Installation
[Setup instructions]

## Quick Start
[Getting started guide]

## API Reference
[Detailed API documentation]

## Examples
[Code examples]

## FAQ
[Common questions]
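The llmstxt.org spec also describes H2 sections made of link lists, such as `- [Title](https://...): notes`, that point at fuller markdown pages. When a file uses that form, the linked pages are what you fetch next. Below is a minimal extractor. It assumes one link per list item and no `](` or `)` inside titles or URLs. `list_llms_links` is a hypothetical name.

```shell
#!/usr/bin/env bash
# list_llms_links FILE
# Hypothetical helper: prints "section<TAB>title<TAB>url" for every list item
# of the form "- [title](url)" or "* [title](url)", with optional ": notes".
list_llms_links() {
  awk '
    /^## / { section = substr($0, 4); next }   # remember the current H2
    /^[-*] \[/ {
      rest = substr($0, 4)                     # text after "- ["
      rb = index(rest, "](")                   # end of the link title
      if (rb == 0) next
      title = substr(rest, 1, rb - 1)
      after = substr(rest, rb + 2)             # text after "]("
      rp = index(after, ")")                   # end of the URL
      if (rp == 0) next
      printf "%s\t%s\t%s\n", section, title, substr(after, 1, rp - 1)
    }
  ' "$1"
}
```

Links listed under a section named `## Optional` are ones the spec marks as skippable when context is short. Filtering on the first column drops them.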

Detection Results Output

{
  "base_url": "https://example.com",
  "detected": {
    "llms-full.txt": {
      "found": true,
      "url": "https://example.com/llms-full.txt",
      "size": 1523456,
      "last_modified": "2025-01-15T10:30:00Z"
    },
    "llms.txt": {
      "found": true,
      "url": "https://example.com/llms.txt",
      "size": 245678,
      "last_modified": "2025-01-15T10:30:00Z"
    },
    "llms-small.txt": {
      "found": false
    }
  },
  "recommended": "llms.txt",
  "reason": "Standard size, good for most use cases"
}

Known Sites with llms.txt

Sites known to support llms.txt (verify before use):

Anthropic documentation
Many modern API documentation sites
Framework documentation following the standard

Always verify - this list may be outdated.

Troubleshooting

Issue	Diagnosis	Solution
No llms.txt found	Site doesn't provide one	Fall back to doc-scraper
Content seems wrong	Error page or redirect	Check actual content, verify URL
File too large	llms-full.txt overwhelms the context	Use llms.txt or llms-small.txt
Outdated content	llms.txt not maintained	Combine llms.txt with scraping (see Integration with doc-scraper)

Integration with doc-scraper

If llms.txt is incomplete or outdated, combine approaches:

# 1. Fetch llms.txt as base
curl -o base.md https://example.com/llms.txt

# 2. Scrape for additional/updated content
skill-seekers scrape --config config.json --skip-covered-by base.md

# 3. Merge results
# llms.txt provides structure, scraping fills gaps
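One way to find the gaps before merging is to list the H2 sections that appear in the scraped output but not in the llms.txt base. The sketch below matches headings by exact text, which is only a rough heuristic, since renamed sections will show up as gaps. `missing_sections` and the `scraped.md` file name are hypothetical. Deciding which copy wins when both sources cover a section is left to the user.

```shell
#!/usr/bin/env bash
# missing_sections BASE SCRAPED
# Hypothetical helper: prints H2 headings present in SCRAPED but absent
# (as an exact line) from BASE, in sorted order.
missing_sections() {
  grep '^## ' "$2" | sort -u | while IFS= read -r heading; do
    # -x: whole-line match, -F: fixed string, so headings are not regexes
    grep -qxF -- "$heading" "$1" || printf '%s\n' "$heading"
  done
}
```

Usage: `missing_sections base.md scraped.md`. The listed sections can then be copied from the scraped output into base.md.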

References

llms.txt Standard: https://llmstxt.org/
Skill Seekers llms.txt Detection: https://github.com/jmagly/Skill_Seekers/blob/main/docs/LLMS_TXT_SUPPORT.md
REF-001: Production-Grade Agentic Workflows (BP-4, BP-9)
REF-002: LLM Failure Modes (Archetype 1-4 mitigations)
