Claude Code Plugins

Community-maintained marketplace

Extracts content from blog posts and news articles. Use when user asks to scrape a URL, extract article content, get text from a webpage, discover articles from a blog, parse RSS feeds, or needs LLM-ready content with token counts. Supports single articles, batch processing, and site-wide discovery.

Install Skill

1. Download the skill

2. Enable skills in Claude: open claude.ai/settings/capabilities and find the "Skills" section

3. Upload to Claude: click "Upload skill" and select the downloaded ZIP file

Note: Please verify the skill by reviewing its instructions before using it.

SKILL.md

name: web-scraping
description: Extracts content from blog posts and news articles. Use when user asks to scrape a URL, extract article content, get text from a webpage, discover articles from a blog, parse RSS feeds, or needs LLM-ready content with token counts. Supports single articles, batch processing, and site-wide discovery.
allowed-tools: Bash(npx tsx:*), Bash(node:*), Read, Write, Glob

Web Scraping Skill

Extract blog and news content from any website using the @tyroneross/blog-scraper SDK.
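
If the SDK is not already available in the environment, it can typically be installed from npm (assuming the package is published under the name used in the examples below):

npm install @tyroneross/blog-scraper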

Quick Reference

Single Article (Most Common)

import { extractArticle } from '@tyroneross/blog-scraper';

const article = await extractArticle('https://example.com/blog/post');
// Returns: { title, markdown, text, html, wordCount, readingTime, excerpt, author }

LLM-Ready Output (For AI/RAG)

import { scrapeForLLM } from '@tyroneross/blog-scraper/llm';

const { markdown, tokens, chunks, frontmatter } = await scrapeForLLM(url);
// tokens: estimated count for context window management
// chunks: pre-split for RAG applications
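
A minimal follow-up sketch for saving the LLM-ready output, assuming chunks is an array of pre-split text segments (the exact chunk shape is not documented above):

import { scrapeForLLM } from '@tyroneross/blog-scraper/llm';
import { writeFile } from 'node:fs/promises';

const url = 'https://example.com/blog/post';
const { markdown, tokens, chunks } = await scrapeForLLM(url);

console.log(`~${tokens} tokens across ${chunks.length} chunks`); // assumes chunks is an array
await writeFile('article.md', markdown);                         // persist Markdown for later ingestion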

Discover Articles from Site

import { scrapeWebsite } from '@tyroneross/blog-scraper';

const result = await scrapeWebsite('https://techcrunch.com', {
  maxArticles: 10,
  extractFullContent: true
});
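
To consume the result, one possible sketch is shown below; it assumes the result exposes an articles array (mirroring the listing branch of smartScrape in the next section), which is an assumption rather than a documented contract:

// Assumption: result.articles is an array of extracted articles with a title field
for (const article of result.articles) {
  console.log(article.title);
}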

Smart Mode (Auto-Detect)

import { smartScrape } from '@tyroneross/blog-scraper';

const result = await smartScrape(url);
if (result.mode === 'article') {
  console.log(result.article.title);
} else {
  console.log(result.articles.length, 'articles found');
}

Batch Processing

import { scrapeUrls } from '@tyroneross/blog-scraper/batch';

const result = await scrapeUrls(urls, { concurrency: 3 });

Validate Before Scraping

import { validateUrl } from '@tyroneross/blog-scraper/validation';

const { isReachable, robotsAllowed, suggestedAction } = await validateUrl(url);

Output Properties

| Property | Description |
| --- | --- |
| title | Article title |
| markdown | Formatted Markdown content |
| text | Plain text (no formatting) |
| html | Raw HTML content |
| excerpt | Short summary |
| author | Author name if detected |
| publishedDate | Publication date |
| wordCount | Total words |
| readingTime | Estimated minutes to read |

Running Code

Create a script file and run with:

npx tsx script.ts
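
For example, a minimal script (the file name scrape-article.ts and the output path are illustrative) that extracts a single article and saves its Markdown:

// scrape-article.ts
import { extractArticle } from '@tyroneross/blog-scraper';
import { writeFile } from 'node:fs/promises';

const url = process.argv[2] ?? 'https://example.com/blog/post';
const article = await extractArticle(url);

console.log(`${article.title} (${article.wordCount} words, ~${article.readingTime} min read)`);
await writeFile('article.md', article.markdown); // save the extracted Markdown locally

Run it with:

npx tsx scrape-article.ts https://example.com/blog/post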

When to Use Each Function

| User Request | Function |
| --- | --- |
| "Extract this article" | extractArticle(url) |
| "Get content for LLM" | scrapeForLLM(url) |
| "Find articles on this site" | scrapeWebsite(url) |
| "Not sure if article or blog" | smartScrape(url) |
| "Process these 5 URLs" | scrapeUrls(urls) |
| "Can I scrape this?" | validateUrl(url) |