name	article-extraction
description	Extract clean article content from web pages, removing ads and clutter for reading and archiving

Article Extraction Skill

Name: article-extraction
Author: ljchg12-hue

Extract clean article text from web pages, removing ads, navigation, and clutter.

When to Use

Content archiving
Research collection
Reading list management
Content analysis

Core Capabilities

Main content extraction
Metadata extraction (title, author, date)
Image extraction
Clean HTML/Markdown output
Multi-page article handling
Paywall bypass (where legal)

Tools

# Readability (Node.js)
npm install @mozilla/readability

# newspaper3k (Python)
pip install newspaper3k
python -c "from newspaper import Article; a = Article('URL'); a.download(); a.parse(); print(a.text)"

# Trafilatura (Python)
pip install trafilatura
trafilatura -u "URL"

Best Practices

Respect robots.txt
Cache extracted content
Preserve attribution
Handle different CMS formats

Resources

Readability: https://github.com/mozilla/readability
newspaper3k: https://github.com/codelucas/newspaper

article-extraction

Install Skill

SKILL.md

Article Extraction Skill

When to Use

Core Capabilities

Tools

Best Practices

Resources