Claude Code Plugins

Community-maintained marketplace

llms-txt-support

@jmagly/ai-writing-guide

Detect and use llms.txt files for LLM-optimized documentation. Use when checking if a site has LLM-ready docs before scraping.

SKILL.md

name: llms-txt-support
description: Detect and use llms.txt files for LLM-optimized documentation. Use when checking if a site has LLM-ready docs before scraping.
tools: Read, Write, WebFetch

llms.txt Support Skill

Purpose

Single responsibility: Detect, fetch, and use llms.txt files that provide LLM-optimized documentation, enabling up to 10x faster documentation ingestion than full-site scraping. (BP-4)

Background

The llms.txt standard (https://llmstxt.org/) provides a convention for websites to expose LLM-friendly documentation. Instead of scraping entire sites, check for llms.txt first.

File hierarchy (check in order):

  1. llms-full.txt - Complete documentation (largest)
  2. llms.txt - Standard documentation
  3. llms-small.txt - Condensed documentation (smallest)

Grounding Checkpoint (Archetype 1 Mitigation)

Before executing, VERIFY:

  • Base URL is accessible
  • All three llms.txt variants have been checked in order
  • File content is actual documentation (not an error page)
  • File size is reasonable for the documentation scope

DO NOT assume llms.txt exists. Always probe first.

Uncertainty Escalation (Archetype 2 Mitigation)

ASK USER instead of guessing when:

  • Multiple llms.txt variants are found: which size should be used?
  • llms.txt content appears partial or outdated
  • A file is returned but its content looks like an error page
  • The site has llms.txt but its content doesn't match the expected documentation

NEVER assume llms.txt quality without verification.

Context Scope (Archetype 3 Mitigation)

| Context Type | Included | Excluded |
|--------------|----------|----------|
| RELEVANT | Target base URL, llms.txt content | Full site scraping |
| PERIPHERAL | llms.txt spec reference | Other sites' llms.txt |
| DISTRACTOR | Previous scraping attempts | Unrelated documentation |

Workflow Steps

Step 1: Detect llms.txt (Grounding)

# Check for llms.txt variants (in order of preference)
curl -I https://example.com/llms-full.txt
curl -I https://example.com/llms.txt
curl -I https://example.com/llms-small.txt

# Check common alternate locations
curl -I https://example.com/.well-known/llms.txt
curl -I https://docs.example.com/llms.txt
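
The probe order above can also be generated programmatically. The following is a minimal sketch, not part of the skill itself: `candidate_urls` and `VARIANTS` are hypothetical names, and the `docs.` subdomain fallback simply mirrors the alternate location shown in the commands above.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

# Variant filenames in the preference order used by this skill.
VARIANTS = ["llms-full.txt", "llms.txt", "llms-small.txt"]


def candidate_urls(base_url: str) -> list[str]:
    """Return URLs to probe: root variants first, then alternate locations."""
    root = base_url.rstrip("/") + "/"
    urls = [urljoin(root, name) for name in VARIANTS]
    urls.append(urljoin(root, ".well-known/llms.txt"))

    # Assumption: a "docs." subdomain of the same host may also serve llms.txt.
    parts = urlsplit(root)
    if parts.hostname and not parts.hostname.startswith("docs."):
        docs_root = urlunsplit((parts.scheme, "docs." + parts.netloc, "/", "", ""))
        urls.append(urljoin(docs_root, "llms.txt"))
    return urls
```

Each returned URL still has to be probed; the list only fixes the order in which to try them.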

Step 2: Validate Content

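# Fetch and inspect the first 100 lines
curl -s https://example.com/llms.txt | head -100

# Check file size (the header may be missing for chunked or compressed responses)
curl -sI https://example.com/llms.txt | grep -i content-length

# Verify it's not an error page: a real llms.txt is markdown, not HTML.
# Only the first lines are checked, because legitimate docs often mention "error" or "404".
curl -s https://example.com/llms.txt | head -20 | grep -Eqi '<html|<!doctype|404 not found' && echo "WARNING: May be error page"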
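
The same validation can be expressed as a small function once the body has been fetched. This is a heuristic sketch under stated assumptions: `looks_like_llms_txt` is a hypothetical name, the 5 MB default limit is an arbitrary illustration, and "contains at least one markdown heading" is only a rough proxy for "is actual documentation".

```python
def looks_like_llms_txt(body: str, max_bytes: int = 5_000_000) -> tuple[bool, str]:
    """Heuristically decide whether a fetched body is real llms.txt content.

    Returns (ok, reason). The size limit is an assumed default, not a standard.
    """
    text = body.lstrip("\ufeff \t\r\n")
    if not text:
        return False, "empty body"
    if len(body.encode("utf-8")) > max_bytes:
        return False, "larger than the assumed size limit"
    head = text[:200].lower()
    if head.startswith("<!doctype html") or head.startswith("<html"):
        return False, "HTML page, probably an error or redirect page"
    if not any(line.startswith("#") for line in text.splitlines()):
        return False, "no markdown headings found"
    return True, "ok"
```

A `False` result is a prompt to inspect the content or ask the user, in line with the escalation rules, rather than proof that the file is invalid.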

Step 3: Choose Variant

| Variant | Size | Use Case |
|---------|------|----------|
| llms-full.txt | Large (1MB+) | Complete documentation, full API reference |
| llms.txt | Medium | Standard use, balanced coverage |
| llms-small.txt | Small (<100KB) | Quick reference, limited context windows |

Decision tree:

  1. If context window is limited → llms-small.txt
  2. If need complete coverage → llms-full.txt
  3. Default → llms.txt
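
The decision tree can be sketched as a function over the variants that were actually detected. `choose_variant` is a hypothetical helper, and the fallback order used when the preferred variant is missing is an assumption; the skill only specifies the first choice in each case.

```python
def choose_variant(found, limited_context=False, need_complete=False):
    """Apply the decision tree to the set of variant names that exist.

    Returns the chosen filename, or None if no variant is available.
    Fallback order beyond the first choice is an assumption.
    """
    if limited_context:
        preference = ["llms-small.txt", "llms.txt", "llms-full.txt"]
    elif need_complete:
        preference = ["llms-full.txt", "llms.txt", "llms-small.txt"]
    else:
        preference = ["llms.txt", "llms-small.txt", "llms-full.txt"]
    for name in preference:
        if name in found:
            return name
    return None
```

For example, `choose_variant({"llms-full.txt", "llms.txt"}, limited_context=True)` falls back to `llms.txt` because no small variant exists.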

Step 4: Fetch and Process

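# Download llms.txt (-f fails on HTTP errors instead of saving the error page;
# --create-dirs creates docs/ if it does not exist)
curl -fsSL --create-dirs -o docs/llms.txt https://example.com/llms.txt

# Convert to skill format (if using skill-seekers)
skill-seekers scrape --llms-txt docs/llms.txt --name myskill

# Or process manually
# llms.txt is already LLM-optimized markdown
mkdir -p output/myskill/references
cp docs/llms.txt output/myskill/references/complete.md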

Step 5: Validate Output

# Check content structure
head -50 output/myskill/references/complete.md

# Verify sections
grep "^#" output/myskill/references/complete.md | head -20

# Check for code examples
grep -c '```' output/myskill/references/complete.md
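
One caveat with the `grep "^#"` check is that it also matches shell comments inside code blocks. The sketch below tracks fence state so that only real headings are counted; `summarize_markdown` is a hypothetical helper and assumes fences are written with three backticks.

```python
FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def summarize_markdown(text):
    """Collect headings outside code fences and count fenced code blocks."""
    headings, fence_lines, in_fence = [], 0, False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            fence_lines += 1
            in_fence = not in_fence
            continue
        if not in_fence and line.startswith("#"):
            headings.append(line.strip())
    return {
        "headings": headings,
        "code_blocks": fence_lines // 2,
        "unclosed_fence": in_fence,  # True suggests a truncated download
    }
```

An `unclosed_fence` value of `True` is a cheap signal that the file may have been cut off mid-block.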

Recovery Protocol (Archetype 4 Mitigation)

On error:

  1. PAUSE - Note which variant failed
  2. DIAGNOSE - Check error type:
    • 404 Not Found → Try next variant or alternate location
    • 403 Forbidden → May need authentication or a different User-Agent header
    • Timeout → Retry with longer timeout
    • Invalid content → Fall back to traditional scraping
  3. ADAPT - Try alternate approach
  4. RETRY - Next variant (max 3 attempts per variant)
  5. ESCALATE - Inform user llms.txt unavailable, suggest scraping
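
The protocol above can be sketched as a small retry loop. Everything here is illustrative: `next_action` and `resolve` are hypothetical names, `probe` is a caller-supplied function returning an HTTP status (or `None` on timeout), and treating 403 as retryable is one reading of the DIAGNOSE step.

```python
MAX_ATTEMPTS = 3  # per variant, as stated in the RETRY step


def next_action(status, attempt):
    """Map a probe result to "use", "retry", or "next_variant"."""
    if status == 200:
        return "use"
    # 403 and timeouts may succeed after adapting the user-agent or timeout.
    if status in (403, None) and attempt < MAX_ATTEMPTS:
        return "retry"
    return "next_variant"  # 404, other errors, or retries exhausted


def resolve(variants, probe):
    """Walk variants in order; return the first usable name, else None.

    A None result corresponds to ESCALATE: tell the user llms.txt is
    unavailable and suggest scraping.
    """
    for name in variants:
        for attempt in range(1, MAX_ATTEMPTS + 1):
            action = next_action(probe(name, attempt), attempt)
            if action == "use":
                return name
            if action == "next_variant":
                break
    return None
```

Passing `probe` in as a parameter keeps the control flow testable without network access; a real probe would also still need the content validation from Step 2.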

Checkpoint Support

State saved to: .aiwg/working/checkpoints/llms-txt-support/

checkpoints/llms-txt-support/
├── detection_results.json    # Which variants found
├── selected_variant.txt      # Which was chosen
└── content_hash.txt          # For cache validation
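
A minimal sketch of writing and checking this checkpoint layout follows. The function names are hypothetical, and using SHA-256 for `content_hash.txt` is an assumption; the skill does not specify a hash algorithm.

```python
import hashlib
import json
from pathlib import Path


def save_checkpoint(root, detection, selected, content: bytes) -> Path:
    """Write the three checkpoint files under <root>/checkpoints/llms-txt-support/."""
    cp = Path(root) / "checkpoints" / "llms-txt-support"
    cp.mkdir(parents=True, exist_ok=True)
    (cp / "detection_results.json").write_text(
        json.dumps(detection, indent=2), encoding="utf-8"
    )
    (cp / "selected_variant.txt").write_text(selected + "\n", encoding="utf-8")
    # Assumption: SHA-256 of the raw bytes serves as the cache key.
    digest = hashlib.sha256(content).hexdigest()
    (cp / "content_hash.txt").write_text(digest + "\n", encoding="utf-8")
    return cp


def is_cache_valid(cp, content: bytes) -> bool:
    """Return True if the stored hash matches the given content."""
    hash_file = Path(cp) / "content_hash.txt"
    if not hash_file.exists():
        return False
    stored = hash_file.read_text(encoding="utf-8").strip()
    return stored == hashlib.sha256(content).hexdigest()
```

With `root` set to `.aiwg/working`, the files land in the directory named above.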

llms.txt Format Reference

Standard llms.txt structure:

# Project Name

> Brief description of the project

## Overview
[High-level explanation]

## Installation
[Setup instructions]

## Quick Start
[Getting started guide]

## API Reference
[Detailed API documentation]

## Examples
[Code examples]

## FAQ
[Common questions]
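
To check a fetched file against this structure, the title, summary, and section names can be pulled out with a few lines of parsing. This is a rough sketch with a hypothetical `parse_llms_txt` helper; it does not skip code fences, so a `## ` line inside a code block would be counted as a section.

```python
def parse_llms_txt(text):
    """Extract the first H1 title, the first blockquote summary, and H2 names."""
    title, summary, sections = None, None, []
    for line in text.splitlines():
        if line.startswith("# ") and title is None:
            title = line[2:].strip()
        elif line.startswith("> ") and summary is None:
            summary = line[2:].strip()
        elif line.startswith("## "):
            sections.append(line[3:].strip())
    return {"title": title, "summary": summary, "sections": sections}
```

A missing title or an empty section list is a reasonable trigger for the "content appears partial" escalation.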

Detection Results Output

{
  "base_url": "https://example.com",
  "detected": {
    "llms-full.txt": {
      "found": true,
      "url": "https://example.com/llms-full.txt",
      "size": 1523456,
      "last_modified": "2025-01-15T10:30:00Z"
    },
    "llms.txt": {
      "found": true,
      "url": "https://example.com/llms.txt",
      "size": 245678,
      "last_modified": "2025-01-15T10:30:00Z"
    },
    "llms-small.txt": {
      "found": false
    }
  },
  "recommended": "llms.txt",
  "reason": "Standard size, good for most use cases"
}
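
A record of this shape could be assembled as follows. This is a sketch only: `build_detection_results` is a hypothetical name, `head` is a caller-supplied function returning `(status, size, last_modified)` for a URL so that no particular HTTP client is assumed, and the `reason` field is omitted for brevity.

```python
def build_detection_results(base_url, head):
    """Probe the three variants via head(url) and build the results record."""
    detected = {}
    for name in ("llms-full.txt", "llms.txt", "llms-small.txt"):
        url = base_url.rstrip("/") + "/" + name
        status, size, last_modified = head(url)
        if status == 200:
            detected[name] = {
                "found": True,
                "url": url,
                "size": size,
                "last_modified": last_modified,
            }
        else:
            detected[name] = {"found": False}

    # Default recommendation: llms.txt if present, otherwise whatever exists.
    order = ("llms.txt", "llms-full.txt", "llms-small.txt")
    recommended = next((n for n in order if detected[n]["found"]), None)
    return {"base_url": base_url, "detected": detected, "recommended": recommended}
```

The result is plain dictionaries and strings, so it can be passed straight to `json.dumps` for `detection_results.json`.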

Known Sites with llms.txt

Sites known to support llms.txt (verify before use):

  • Anthropic documentation
  • Many modern API documentation sites
  • Framework documentation following the standard

Always verify - this list may be outdated.

Troubleshooting

| Issue | Diagnosis | Solution |
|-------|-----------|----------|
| No llms.txt found | Site doesn't support it | Fall back to doc-scraper |
| Content seems wrong | Error page or redirect | Check actual content, verify URL |
| File too large | llms-full.txt overwhelming | Use llms.txt or llms-small.txt |
| Outdated content | llms.txt not maintained | Consider scraping + llms.txt merge |

Integration with doc-scraper

If llms.txt is incomplete or outdated, combine approaches:

# 1. Fetch llms.txt as base
curl -o base.md https://example.com/llms.txt

# 2. Scrape for additional/updated content
skill-seekers scrape --config config.json --skip-covered-by base.md

# 3. Merge results
# llms.txt provides structure, scraping fills gaps
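
The merge step is left open above. One possible approach, sketched here under assumptions, is to keep the llms.txt file as the base and append only those scraped sections whose heading it lacks. `merge_missing_sections` is a hypothetical helper, and matching sections by case-insensitive H2 title is a simplification.

```python
def merge_missing_sections(base_md, scraped_sections):
    """Append scraped sections whose H2 title is absent from the base.

    scraped_sections maps a section title to its markdown body.
    """
    existing = {
        line[3:].strip().lower()
        for line in base_md.splitlines()
        if line.startswith("## ")
    }
    merged = base_md.rstrip("\n") + "\n"
    for title, body in scraped_sections.items():
        if title.strip().lower() not in existing:
            merged += f"\n## {title.strip()}\n{body.strip()}\n"
    return merged
```

This never overwrites content from llms.txt, so outdated sections in the base still need a manual review.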

References