---
name: web-research
description: Fetch and analyze web page content from URLs. Use when asked to read, summarize, or analyze web links, check websites, or research online content. Converts web pages to readable Markdown format.
---
# Web Research
This skill enables fetching and analyzing web page content from URLs provided in emails or conversations.
## When to Use
Use this skill when:
- User provides a URL and asks you to read, summarize, or analyze it
- Email contains links and user asks "what does this say?" or "summarize this"
- User asks you to "check this website" or "look at this page"
- User wants information from a specific web page
## How to Fetch Web Content

```bash
python -m src.web_fetcher <url>
```
This will:
- Download the web page content
- Convert HTML to clean Markdown
- Add a summary header describing the content
- Output to stdout (you can read it directly)
- Automatically clean up any temporary files
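For orientation, the steps above correspond roughly to the sketch below. This is a hypothetical reconstruction, not the actual `src.web_fetcher` source; it assumes the tool wraps `requests` plus MarkItDown (the library named in the output header and in Notes), and `fetch_as_markdown` is an illustrative name.

```python
# Hypothetical sketch of the fetch-and-convert flow; the real module's
# internals may differ.
import os
import tempfile

import requests
from markitdown import MarkItDown  # same library used for document processing

def fetch_as_markdown(url: str, timeout: float = 10.0) -> str:
    # Download the static HTML (no JavaScript is executed).
    resp = requests.get(url, timeout=timeout)
    resp.raise_for_status()
    # Convert via a temporary file, then clean it up automatically.
    with tempfile.NamedTemporaryFile(suffix=".html", delete=False) as f:
        f.write(resp.content)
        path = f.name
    try:
        return MarkItDown().convert(path).text_content
    finally:
        os.unlink(path)
```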
## Example Usage

```bash
# Fetch and read a web page
python -m src.web_fetcher https://example.com/article

# Skip the summary header (get raw content only)
python -m src.web_fetcher https://example.com/article --no-summary
```
## Output Format

The fetched content includes:

```markdown
---
source: https://example.com/article
converted: auto-generated markdown from HTML via MarkItDown
note: JavaScript-rendered content not included. Some dynamic content may be missing.
---

> **What's in this page:** Article from example.com: "Article Title", 5 sections, includes tables, 15 links, ~2000 words

---

[Full page content as markdown...]
```
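If you need to separate the header from the body programmatically, a minimal sketch follows. `split_fetched_output` is a hypothetical helper, and the exact `---`-delimited layout is assumed from the example above (output with `--no-summary` would differ).

```python
# Hypothetical helper; assumes the "---"-delimited layout shown above.
def split_fetched_output(text: str) -> tuple[dict, str, str]:
    # Layout: --- metadata --- summary blockquote --- body
    _, meta_block, summary, body = text.split("---\n", 3)
    meta = dict(
        line.split(": ", 1) for line in meta_block.strip().splitlines()
    )
    return meta, summary.strip(), body
```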
## Important Limitations

### JavaScript-Rendered Content
- Only fetches static HTML (no JavaScript execution)
- Modern single-page apps (SPAs) may show minimal content
- Dynamic content loaded via JavaScript will be missing
- If content seems incomplete, inform the user of this limitation
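A crude way to flag this case is a word-count check on the converted body. The heuristic and its threshold are assumptions for illustration, not something the tool provides.

```python
# Assumed heuristic, not part of src.web_fetcher: very short output from a
# page that should have substantial content suggests JavaScript rendering.
def looks_javascript_rendered(markdown_body: str, min_words: int = 50) -> bool:
    return len(markdown_body.split()) < min_words
```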
### Authentication & Paywalls
- Cannot access content behind login walls
- Cannot bypass paywalls or subscription requirements
- If you encounter these, inform the user the content is not accessible
### Rate Limiting & Blocking
- Some websites block automated requests
- If fetch fails, the site may be blocking bots
- Inform user and suggest they copy/paste the content if needed
## Security Features
The web fetcher includes built-in security:
- Only HTTP/HTTPS URLs allowed (no file://, ftp://, etc.)
- Blocks internal/private IP addresses (localhost, 192.168.x.x, 10.x.x.x)
- 10-second timeout to prevent hanging
- 5MB content size limit
- Maximum 5 redirects
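For illustration, the scheme and IP checks above might look like the following sketch. This is an assumption about how such validation is typically written; the actual code in `src.web_fetcher` may differ, and the timeout, size, and redirect limits are omitted here.

```python
# Hypothetical sketch of the URL validation described above.
import ipaddress
import socket
from urllib.parse import urlparse

def validate_url(url: str) -> None:
    parsed = urlparse(url)
    # Only HTTP/HTTPS schemes are allowed (no file://, ftp://, etc.).
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"URL validation failed: scheme {parsed.scheme!r} not allowed")
    host = parsed.hostname or ""
    # Resolve the hostname and reject private, loopback, and link-local ranges
    # (covers localhost, 192.168.x.x, 10.x.x.x, and similar).
    try:
        addr = ipaddress.ip_address(socket.gethostbyname(host))
    except socket.gaierror as exc:
        raise ValueError(f"URL validation failed: cannot resolve {host!r}") from exc
    if addr.is_private or addr.is_loopback or addr.is_link_local:
        raise ValueError("URL validation failed: Blocked hostname")
```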
## Workflow Example

When a user forwards an email with a link:

> **User:** "Can you summarize this article? https://techcrunch.com/2024/article"

**You:**
1. Run: `python -m src.web_fetcher https://techcrunch.com/2024/article` (a programmatic wrapper is sketched below)
2. Read the Markdown output
3. Provide a summary based on the content
4. If the content seems incomplete, note the JavaScript limitation
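If you invoke the tool from code rather than a shell, a hypothetical wrapper might look like this; the `fetch` helper is illustrative, not part of the tool.

```python
# Hypothetical wrapper around the CLI; normally you just run step 1 directly.
import subprocess

def fetch(url: str) -> str:
    result = subprocess.run(
        ["python", "-m", "src.web_fetcher", url],
        capture_output=True, text=True, timeout=30,
    )
    if result.returncode != 0:
        # Surface the tool's error text so it can be relayed to the user.
        raise RuntimeError(result.stderr.strip() or "fetch failed")
    return result.stdout
```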
## Error Handling
Common errors and how to handle them:
| Error | Meaning | What to Tell User |
|---|---|---|
| "URL validation failed: Blocked hostname" | Trying to access localhost/internal IP | Cannot access internal/private addresses for security |
| "Request timed out" | Site too slow or unreachable | The website didn't respond in time, may be down or slow |
| "Failed to fetch URL: 403" | Site blocking automated access | The website is blocking automated requests |
| "Failed to fetch URL: 404" | Page not found | The URL doesn't exist or has been moved |
| "Content too large" | Page exceeds 5MB | The page is too large to fetch |
## Best Practices

- Always inform the user if content seems incomplete or if JavaScript rendering is likely needed
- Provide the source URL in your response so the user can verify it
- Handle errors gracefully: if a fetch fails, ask the user if they can copy/paste the content
- Don't retry failed fetches without user confirmation
- Be transparent about limitations (no authentication, no JavaScript, etc.)
## Notes
- Temporary files are automatically cleaned up (no manual deletion needed)
- The tool uses the same MarkItDown library as document processing
- Content is fetched fresh each time (no caching)
- User-Agent header is set to avoid basic bot detection