| name | web-fetch-integration |
| description | Integrates WebFetch tool (available in Claude Code) for loading external documentation, API specifications, and reference materials. Use when workflows need external documentation, API specs, reference guides, or external resources to complete tasks. Loads web pages and PDFs from URLs for comprehensive context. |
| allowed-tools | Read, Grep, Glob, Bash, WebFetch |
Web Fetch Integration
Purpose
Integrate Anthropic's Web Fetch Tool to load external documentation, API specifications, and reference materials into workflows. This enables workflows to access current external resources when needed.
When to Use
Use Web Fetch for:
- Loading external API documentation (REST APIs, SDKs, libraries)
- Fetching reference guides (framework docs, language specs)
- Loading API specifications (OpenAPI/Swagger specs)
- Fetching external PDFs (standards, specifications, guides)
- Accessing current documentation (latest versions, updates)
Use Cases by Workflow:
Planning Workflow:
- Load API specifications for integration planning
- Fetch framework documentation for architecture decisions
- Load external service documentation
Build Workflow:
- Load library documentation during implementation
- Fetch SDK examples for integration
- Load framework guides for component building
Review Workflow:
- Fetch security best practices from external sources
- Load coding standards from team docs
- Access external code quality guidelines
Debug Workflow:
- Load error documentation from external sources
- Fetch troubleshooting guides
- Access stack trace analysis resources
WebFetch Tool Usage (Claude Code)
Availability: ✅ WebFetch is available in Claude Code (built-in tool)
CRITICAL: How Claude Code WebFetch Works
Claude Code's WebFetch is different from the API version:
API WebFetch (beta, not in Claude Code):
- Returns full raw content (HTML/PDF)
- Single URL parameter
Claude Code WebFetch (built-in):
- Requires BOTH
urlandpromptparameters - Returns summarized answer, NOT raw content
- Pipeline: Fetch → Convert to Markdown (max 100KB) → Haiku 3.5 summary → Answer
Usage Pattern:
# WRONG (assumes raw content):
WebFetch(url="https://api.example.com/openapi.json")
→ Returns: Full JSON spec (THIS DOESN'T WORK IN CLAUDE CODE)
# CORRECT (prompt-based):
WebFetch(
url="https://api.example.com/openapi.json",
prompt="What are all the API endpoint paths, HTTP methods, and required parameters?"
)
→ Returns: "The API has the following endpoints: POST /auth/login requires username and password, GET /users/{id} returns user details..."
Key Constraints:
- ❌ Cannot fetch raw API specs for parsing
- ❌ Cannot load full documentation pages
- ❌ Cannot cache raw HTML/JSON content
- ✅ Can ask specific questions and get targeted answers
- ✅ Built-in 15-minute caching (server-side, automatic)
- ✅ Domain validation (blocks malicious domains)
Best Practice: Use multiple targeted questions instead of expecting full content:
- Question 1: "What are all the API endpoints?"
- Question 2: "What authentication method is required?"
- Question 3: "What are the data models and their fields?"
Integration Points
Planning Workflow
Phase 1 - Requirements Intake:
- If external APIs mentioned, ask targeted questions:
- "What are all the API endpoints and HTTP methods?"
- "What authentication method is required?"
- "What are the main data models and their fields?"
- If frameworks mentioned, ask:
- "How do I set up and configure this framework?"
- "What are the core concepts and patterns?"
- "What are the integration requirements?"
- If external services mentioned, ask:
- "What are the service capabilities and limitations?"
- "How do I integrate with this service?"
- "What are the pricing and rate limits?"
Phase 2 - Architecture Design:
- Ask about integration patterns: "What are best practices for integrating [service] with [framework]?"
- Ask about architecture: "What architectural patterns are recommended for [use case]?"
- Ask about best practices: "What are the recommended approaches for [scenario]?"
Build Workflow
Before Component Building:
- If new library used, ask:
- "How do I install and initialize this library?"
- "What are the most common usage patterns and examples?"
- "How do I handle errors and edge cases?"
- If SDK integration needed, ask:
- "How do I set up the SDK?"
- "What are the initialization patterns?"
- "How do I make API calls with the SDK?"
- If framework guides needed, ask:
- "How do I implement [feature] in this framework?"
- "What are the recommended patterns for [task]?"
During Integration:
- For external APIs, ask:
- "How do I authenticate with this API?"
- "What's the request/response format?"
- "How do I handle errors from this API?"
- For integration guides, ask:
- "How do I integrate [service] with [component]?"
- "What are common integration pitfalls?"
- For troubleshooting, ask:
- "What are common errors and how do I fix them?"
- "How do I debug integration issues?"
Review Workflow
Security Review:
- Ask: "What are the OWASP Top 10 security risks and how do I check for them?"
- Ask: "What security vulnerabilities should I look for in [code type]?"
- Ask: "What are best practices for [security concern]?"
Performance Review:
- Ask: "What performance optimization techniques apply to [scenario]?"
- Ask: "How do I profile and identify bottlenecks in [technology]?"
- Ask: "What are performance best practices for [component]?"
Debug Workflow
During Investigation:
- Ask: "What does this error [error code/message] mean and how do I fix it?"
- Ask: "How do I troubleshoot [symptom] in [technology]?"
- Ask: "What are common causes of [issue type] and their solutions?"
Web Fetch Patterns (Claude Code)
Pattern 1: API Specification Loading (Question-Based)
# CORRECT: Ask targeted questions about API
# Planning workflow - gather API information
# Question 1: Endpoints
WebFetch(
url="https://api.example.com/openapi.json",
prompt="What are all the API endpoint paths, HTTP methods, and their purposes?"
)
# Question 2: Authentication
WebFetch(
url="https://api.example.com/openapi.json",
prompt="What authentication method is required? How do I get an access token?"
)
# Question 3: Data Models
WebFetch(
url="https://api.example.com/openapi.json",
prompt="What are the main data models, their fields, types, and relationships?"
)
# Question 4: Error Handling
WebFetch(
url="https://api.example.com/docs",
prompt="What error codes does this API return and what do they mean?"
)
Pattern 2: Library Documentation Loading
# CORRECT: Ask specific implementation questions
# Build workflow - get implementation guidance
# Question 1: Getting Started
WebFetch(
url="https://docs.library.com/getting-started",
prompt="How do I install and initialize this library? What's the basic setup?"
)
# Question 2: Common Patterns
WebFetch(
url="https://docs.library.com/patterns",
prompt="What are the most common usage patterns and code examples?"
)
# Question 3: Integration
WebFetch(
url="https://docs.library.com/integration",
prompt="How do I integrate this library with [framework]? What's the recommended approach?"
)
Pattern 3: Standards and Guidelines
# CORRECT: Ask about specific guidelines
# Review workflow - check against standards
WebFetch(
url="https://owasp.org/www-project-top-ten/",
prompt="What are the OWASP Top 10 security risks and how do I check for them in code?"
)
WebFetch(
url="https://example.com/coding-standards",
prompt="What are the coding standards for error handling, naming conventions, and code structure?"
)
Smart Caching Strategy (Q&A Pair Caching)
CRITICAL: Claude Code WebFetch returns answers, not raw content. Cache Q&A pairs!
Cache Structure
.claude/memory/web_cache/
├── qa_pairs/
│ ├── {url_hash}_{prompt_hash}.json
│ └── ...
└── cache_index.json (Maps {url, prompt} → cached answer)
Cache Entry Format
{
"https://api.example.com/openapi.json": {
"What are all the API endpoints?": {
"answer": "The API has POST /auth/login, GET /users/{id}...",
"fetched_at": "2025-10-29T10:00:00Z",
"ttl_hours": 24,
"expires_at": "2025-10-30T10:00:00Z",
"workflow_used": ["planning"],
"cache_file": ".claude/memory/web_cache/qa_pairs/abc123_def456.json"
},
"What authentication is required?": {
"answer": "OAuth2 Bearer token...",
"fetched_at": "2025-10-29T10:05:00Z",
"ttl_hours": 24,
"expires_at": "2025-10-30T10:05:00Z",
"workflow_used": ["planning"]
}
}
}
Cache Index Format (Simplified)
{
"url_prompt_hash": {
"url": "https://api.example.com/openapi.json",
"prompt": "What are all the API endpoints?",
"answer_file": ".claude/memory/web_cache/qa_pairs/abc123_def456.json",
"ttl_hours": 24,
"last_fetched": "2025-10-29T10:00:00Z",
"expires_at": "2025-10-30T10:00:00Z",
"fetch_count": 2,
"workflow_used": ["planning", "build"]
}
}
TTL Strategy (Hours-Based)
- API Specifications: 24 hours (APIs change frequently, answers may become outdated)
- Library Documentation: 48 hours (version updates, usage patterns change)
- Framework Docs: 48 hours (examples and best practices evolve)
- Standards/Guidelines: 72 hours (even stable content has updates)
Rationale: Since we cache answers (not raw content), shorter TTLs ensure answers stay current. Answers summarize content at fetch time, so re-fetching with same prompt may yield updated information.
Smart Fetching Rules (Q&A Based)
Cache-First Strategy:
- Hash Query: Create hash from
{url}_{prompt}combination - Check Cache Index: Lookup hash in
cache_index.json - Validate TTL: If cached and TTL valid → return cached answer (no fetch)
- Expired Cache: If cached but expired → re-fetch with same prompt
- Not Cached: If not cached → fetch with prompt and cache answer
- Update Index: Always update cache_index.json with new Q&A pair
Deduplication:
- Track
{url, prompt}combinations fetched in current workflow - Same query in same workflow = use cached answer (already fetched)
- Different prompts on same URL = different cache entries (ask different questions)
Graceful Fallback:
- If fetch fails → use cached answer (even if expired)
- Log fetch failure for debugging
- Notify user if using stale cache
Prompt Consistency:
- Use same prompt wording for same URL to maximize cache hits
- Standardize common questions (e.g., "What are the API endpoints?")
Cache Corruption Handling:
- Detection: Validate
cache_index.jsonis valid JSON before use - Recovery: If corrupted, attempt to rebuild from cache files
- Fallback: If rebuild fails, clear cache index and re-fetch
- Validation: Verify cached files exist before using (files may be deleted but index not updated)
- Error Handling: Log corruption events to
.claude/memory/web_cache/corruption.logfor debugging
Cache Cleanup
Automatic Cleanup (run weekly):
- Delete cached files if TTL expired >30 days
- Remove unused cache entries (>90 days unused)
- Keep cache_index.json clean
Best Practices
- Cache Aggressively: Cache all external docs with appropriate TTL
- Check Cache First: Always check cache_index.json before fetching
- Respect TTLs: Re-fetch when expired, not before
- Deduplicate: Prevent duplicate fetches in same workflow
- Fail Gracefully: Use stale cache if fetch fails
- Extract Relevant Sections: Only extract needed parts of fetched content
- Cite Sources: Always cite external sources in workflow output
- Track Usage: Log which workflows use which cached docs
Error Handling
If Web Fetch Fails:
try:
docs = web_fetch(url=url)
except FetchError:
# Fallback: Use local documentation
docs = read_local_docs(url)
log("Used local docs, fetch failed")
If URL Invalid:
if not is_valid_url(url):
ask_user("Please provide valid URL for documentation")
return None
Cache Implementation
Cache Check Function (Q&A Based)
#!/bin/bash
# Check cache before fetching URL with prompt
# Returns cached answer if available
check_cache() {
local url="$1"
local prompt="$2"
local cache_index=".claude/memory/web_cache/cache_index.json"
# Create hash from URL + prompt combination
local query_hash=$(echo -n "${url}:${prompt}" | sha256sum | cut -d' ' -f1)
# Validate cache_index.json exists and is valid JSON
if [ ! -f "$cache_index" ]; then
echo "CACHE_MISS"
return 2
fi
# Validate JSON syntax (cache corruption detection)
if ! jq empty "$cache_index" 2>/dev/null; then
# Cache index corrupted - rebuild or clear
echo "CACHE_CORRUPTED: Clearing cache_index.json" >&2
rm -f "$cache_index"
echo "CACHE_MISS"
return 2
fi
# Check if query_hash exists in cache
local cached_entry=$(jq -r ".[\"$query_hash\"] // empty" "$cache_index" 2>/dev/null)
if [ -n "$cached_entry" ] && [ "$cached_entry" != "null" ]; then
local expires=$(echo "$cached_entry" | jq -r '.expires_at // empty' 2>/dev/null)
local answer_file=$(echo "$cached_entry" | jq -r '.answer_file // empty' 2>/dev/null)
# Validate answer file exists
if [ -z "$answer_file" ] || [ ! -f "$answer_file" ]; then
# Answer file missing - remove from index
jq "del(.[\"$query_hash\"])" "$cache_index" > "$cache_index.tmp" && mv "$cache_index.tmp" "$cache_index"
echo "CACHE_MISS"
return 2
fi
# Check if TTL valid
local expires_ts=$(date -d "$expires" +%s 2>/dev/null)
local now_ts=$(date +%s)
if [ -n "$expires_ts" ] && [ "$expires_ts" -gt "$now_ts" ]; then
# Return cached answer
local answer=$(cat "$answer_file")
echo "CACHE_HIT:$answer"
return 0
else
echo "CACHE_EXPIRED:$answer_file"
return 1
fi
fi
echo "CACHE_MISS"
return 2
}
Cache Save Function (Q&A Based)
#!/bin/bash
# Save fetched answer to cache (Q&A pair)
save_to_cache() {
local url="$1"
local prompt="$2"
local answer="$3"
local ttl_hours="${4:-24}" # Default 24 hours
local cache_dir=".claude/memory/web_cache"
local cache_index="$cache_dir/cache_index.json"
mkdir -p "$cache_dir/qa_pairs"
# Create hash from URL + prompt
local query_hash=$(echo -n "${url}:${prompt}" | sha256sum | cut -d' ' -f1)
local answer_file="$cache_dir/qa_pairs/${query_hash}.json"
# Save answer to file
echo "$answer" > "$answer_file"
# Update cache index
local expires_at=$(date -d "+$ttl_hours hours" -Iseconds)
jq --arg hash "$query_hash" \
--arg url "$url" \
--arg prompt "$prompt" \
--arg file "$answer_file" \
--arg ttl "$ttl_hours" \
--arg expires "$expires_at" \
'.[$hash] = {
"url": $url,
"prompt": $prompt,
"answer_file": $file,
"ttl_hours": ($ttl | tonumber),
"last_fetched": (now | strftime("%Y-%m-%dT%H:%M:%SZ")),
"expires_at": $expires,
"fetch_count": ((.[$hash].fetch_count // 0) + 1),
"workflow_used": ((.[$hash].workflow_used // []) + [$ENV.WORKFLOW_NAME])
}' "$cache_index" > "$cache_index.tmp" 2>/dev/null || echo "{}" > "$cache_index.tmp"
mv "$cache_index.tmp" "$cache_index"
}
Verification Checklist (Q&A Based)
Before Fetching:
- Create hash from
{url}_{prompt}combination - Check
cache_index.jsonfor query hash - If cached and TTL valid → use cached answer (skip fetch)
- If cached but expired → mark for re-fetch with same prompt
- If not cached → proceed with fetch using prompt
- Verify URL is valid and accessible
- Verify prompt is specific and clear
After Fetching:
- Save answer (not raw content) to cache with appropriate TTL
- Update cache_index.json with Q&A pair entry
- Validate answer is relevant to prompt
- Use consistent prompt wording for same questions
- Cite source URL in workflow output
- Track which workflow used this cache entry
On Fetch Failure:
- Check for stale cached answer (even if expired)
- Use stale answer if available
- Log failure for debugging
- Notify user if using stale cache
- Consider alternative phrasing of prompt
Examples
Example 1: Planning with External API (Question-Based)
# Planning workflow - Phase 1
user_request = "Integrate with Stripe API"
# Question 1: Get all endpoints
endpoints = WebFetch(
url="https://stripe.com/docs/api",
prompt="What are all the API endpoint paths, HTTP methods, and their purposes?"
)
# Question 2: Get authentication details
auth_info = WebFetch(
url="https://stripe.com/docs/api/authentication",
prompt="How do I authenticate requests? What API keys are needed and where do I get them?"
)
# Question 3: Get payment models
payment_models = WebFetch(
url="https://stripe.com/docs/api/payment_intents",
prompt="What are the PaymentIntent data model fields and their types? What are required vs optional fields?"
)
# Use answers in architecture design
design_api_integration(endpoints, auth_info, payment_models)
Example 2: Build with Library Docs (Question-Based)
# Build workflow - Before component
library = "react-query"
# Question 1: Setup
setup_info = WebFetch(
url="https://tanstack.com/query/latest/docs/react/overview",
prompt="How do I install and set up react-query? What's the basic configuration?"
)
# Question 2: Common patterns
usage_patterns = WebFetch(
url="https://tanstack.com/query/latest/docs/react/guides/queries",
prompt="What are the most common query patterns and code examples for fetching data?"
)
# Question 3: Mutations
mutations_info = WebFetch(
url="https://tanstack.com/query/latest/docs/react/guides/mutations",
prompt="How do I handle mutations? What's the pattern for POST/PUT/DELETE operations?"
)
# Use in implementation
implement_with_guidance(setup_info, usage_patterns, mutations_info)
Example 3: Review with External Standards (Question-Based)
# Review workflow - Security review
security_guidelines = WebFetch(
url="https://owasp.org/www-project-top-ten/",
prompt="What are the OWASP Top 10 security risks for 2021? How do I check for injection attacks, broken authentication, and sensitive data exposure in code?"
)
# Use in security analysis
analyze_with_owasp(security_guidelines)
Integration with cc10x
Orchestrator Integration:
- Check if external resources needed
- Fetch before workflow starts if needed
- Cache fetched resources for workflow duration
Workflow Integration:
- Add web fetch step before phases that need external docs
- Store fetched content for reference
- Include fetched sources in Verification Summary
References
- Anthropic Web Fetch Tool: https://docs.claude.com/en/docs/agents-and-tools/tool-use/web-fetch-tool
- cc10x Workflows:
plugins/cc10x/skills/cc10x-orchestrator/workflows/