| name | firecrawl-scraper |
| description | Complete knowledge domain for Firecrawl v2 API - web scraping and crawling that converts websites into LLM-ready markdown or structured data. Use when: scraping websites, crawling entire sites, extracting web content, converting HTML to markdown, building web scrapers, handling dynamic JavaScript content, bypassing anti-bot protection, extracting structured data from web pages, or when encountering "content not loading", "JavaScript rendering issues", or "blocked by bot detection". Keywords: firecrawl, firecrawl api, web scraping, web crawler, scrape website, crawl website, extract content, html to markdown, site crawler, content extraction, web automation, firecrawl-py, firecrawl-js, llm ready data, structured data extraction, bot bypass, javascript rendering, scraping api, crawling api, map urls, batch scraping |
| license | MIT |
Firecrawl Web Scraper Skill
Status: Production Ready ✅ | Last Updated: 2025-10-24 | Official Docs: https://docs.firecrawl.dev | API Version: v2
What is Firecrawl?
Firecrawl is a Web Data API for AI that turns entire websites into LLM-ready markdown or structured data. It handles:
- JavaScript rendering - Executes client-side JavaScript to capture dynamic content
- Anti-bot bypass - Gets past CAPTCHA and bot detection systems
- Format conversion - Outputs as markdown, JSON, or structured data
- Screenshot capture - Saves visual representations of pages
- Browser automation - Full headless browser capabilities
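For a quick feel of the API, here is a minimal sketch of a direct REST scrape call in Python (the endpoint, Bearer auth, and response shape mirror the Cloudflare Workers example later in this document; `requests` is this skill's choice, not a Firecrawl requirement):

```python
import os

import requests

# Minimal v2 scrape call; expects FIRECRAWL_API_KEY in the environment.
response = requests.post(
    "https://api.firecrawl.dev/v2/scrape",
    headers={"Authorization": f"Bearer {os.environ['FIRECRAWL_API_KEY']}"},
    json={"url": "https://example.com", "formats": ["markdown"]},
    timeout=60,
)
response.raise_for_status()
# Response shape: { "success": true, "data": { "markdown": ..., "metadata": ... } }
print(response.json()["data"]["markdown"])
```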
API Endpoints
1. /v2/scrape - Single Page Scraping
Scrapes a single webpage and returns clean, structured content.
Use Cases:
- Extract article content
- Get product details
- Scrape specific pages
- Convert HTML to markdown
Key Options:
formats: ["markdown", "html", "screenshot"]onlyMainContent: true/false (removes nav, footer, ads)waitFor: milliseconds to wait before scrapingactions: browser automation actions (click, scroll, etc.)
2. /v2/crawl - Full Site Crawling
Crawls all accessible pages from a starting URL.
Use Cases:
- Index entire documentation sites
- Archive website content
- Build knowledge bases
- Scrape multi-page content
Key Options:
- `limit`: max pages to crawl
- `maxDepth`: how many links deep to follow
- `allowedDomains`: restrict the crawl to specific domains
- `excludePaths`: skip certain URL patterns
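A sketch passing these options through the Python SDK, in the same call style as the crawl examples later in this skill (the option names are as listed above; the values are illustrative):

```python
import os

from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key=os.environ.get("FIRECRAWL_API_KEY"))

crawl_result = app.crawl_url(
    url="https://docs.example.com",
    params={
        "limit": 200,                                # cap total pages crawled
        "maxDepth": 3,                               # follow links at most 3 hops deep
        "allowedDomains": ["docs.example.com"],      # stay on the docs domain
        "excludePaths": ["/blog/*"],                 # skip URL patterns you don't need
        "scrapeOptions": {"formats": ["markdown"]},
    },
)
```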
3. /v2/map - URL Discovery
Maps all URLs on a website without scraping content.
Use Cases:
- Find sitemap
- Discover all pages
- Plan crawling strategy
- Audit website structure
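A minimal sketch of a map call against the v2 REST endpoint (the `links` field name in the response is an assumption; check the API reference for the exact shape):

```python
import os

import requests

# Map discovers URLs without scraping page content.
response = requests.post(
    "https://api.firecrawl.dev/v2/map",
    headers={"Authorization": f"Bearer {os.environ['FIRECRAWL_API_KEY']}"},
    json={"url": "https://docs.example.com"},
    timeout=60,
)
response.raise_for_status()
# Assumption: the response carries a "links" array; verify the item shape
# against the API reference before relying on it.
links = response.json().get("links", [])
print(f"Discovered {len(links)} URLs")
```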
4. /v2/extract - Structured Data Extraction
Uses AI to extract specific data fields from pages.
Use Cases:
- Extract product prices and names
- Parse contact information
- Build structured datasets
- Custom data schemas
Key Options:
- `schema`: Zod or JSON Schema defining the desired structure
- `systemPrompt`: guide the AI extraction behavior
Authentication
Firecrawl requires an API key for all requests.
Get API Key
- Sign up at https://www.firecrawl.dev
- Go to dashboard → API Keys
- Copy your API key (starts with `fc-`)
Store Securely
NEVER hardcode API keys in code!
# .env file
FIRECRAWL_API_KEY=fc-your-api-key-here
# .env.local (for local development)
FIRECRAWL_API_KEY=fc-your-api-key-here
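For Python projects, a minimal sketch of loading the key from `.env` with `python-dotenv` (the dotenv dependency is this skill's choice, not a Firecrawl requirement; any secrets mechanism works):

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads FIRECRAWL_API_KEY from .env into the environment
api_key = os.environ["FIRECRAWL_API_KEY"]
assert api_key.startswith("fc-"), "unexpected API key format"
```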
Python SDK Usage
Installation
pip install firecrawl-py
Latest Version: firecrawl-py v4.5.0+
Basic Scrape
import os
from firecrawl import FirecrawlApp
# Initialize client
app = FirecrawlApp(api_key=os.environ.get("FIRECRAWL_API_KEY"))
# Scrape a single page
result = app.scrape_url(
url="https://example.com/article",
params={
"formats": ["markdown", "html"],
"onlyMainContent": True
}
)
# Access markdown content
markdown = result.get("markdown")
print(markdown)
Crawl Multiple Pages
import os
from firecrawl import FirecrawlApp
app = FirecrawlApp(api_key=os.environ.get("FIRECRAWL_API_KEY"))
# Start crawl
crawl_result = app.crawl_url(
url="https://docs.example.com",
params={
"limit": 100,
"scrapeOptions": {
"formats": ["markdown"]
}
},
poll_interval=5 # Check status every 5 seconds
)
# Process results
for page in crawl_result.get("data", []):
url = page.get("url")
markdown = page.get("markdown")
print(f"Scraped: {url}")
Extract Structured Data
import os
from firecrawl import FirecrawlApp
app = FirecrawlApp(api_key=os.environ.get("FIRECRAWL_API_KEY"))
# Define schema
schema = {
"type": "object",
"properties": {
"company_name": {"type": "string"},
"product_price": {"type": "number"},
"availability": {"type": "string"}
},
"required": ["company_name", "product_price"]
}
# Extract data
result = app.extract(
urls=["https://example.com/product"],
params={
"schema": schema,
"systemPrompt": "Extract product information from the page"
}
)
print(result)
TypeScript/Node.js SDK Usage
Installation
npm install @mendable/firecrawl-js
# or
pnpm add @mendable/firecrawl-js
# or use the unscoped package:
npm install firecrawl
Latest Version: @mendable/firecrawl-js v4.4.1+ (or firecrawl v4.4.1+)
Basic Scrape
import FirecrawlApp from '@mendable/firecrawl-js';
// Initialize client
const app = new FirecrawlApp({
apiKey: process.env.FIRECRAWL_API_KEY
});
// Scrape a single page
const result = await app.scrapeUrl('https://example.com/article', {
formats: ['markdown', 'html'],
onlyMainContent: true
});
// Access markdown content
const markdown = result.markdown;
console.log(markdown);
Crawl Multiple Pages
import FirecrawlApp from '@mendable/firecrawl-js';
const app = new FirecrawlApp({
apiKey: process.env.FIRECRAWL_API_KEY
});
// Start crawl
const crawlResult = await app.crawlUrl('https://docs.example.com', {
limit: 100,
scrapeOptions: {
formats: ['markdown']
}
});
// Process results
for (const page of crawlResult.data) {
console.log(`Scraped: ${page.url}`);
console.log(page.markdown);
}
Extract Structured Data with Zod
import FirecrawlApp from '@mendable/firecrawl-js';
import { z } from 'zod';
const app = new FirecrawlApp({
apiKey: process.env.FIRECRAWL_API_KEY
});
// Define schema with Zod
const schema = z.object({
company_name: z.string(),
product_price: z.number(),
availability: z.string()
});
// Extract data
const result = await app.extract({
urls: ['https://example.com/product'],
schema: schema,
systemPrompt: 'Extract product information from the page'
});
console.log(result);
Common Use Cases
1. Documentation Scraping
Scenario: Convert entire documentation site to markdown for RAG/chatbot
import os
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key=os.environ.get("FIRECRAWL_API_KEY"))
docs = app.crawl_url(
url="https://docs.myapi.com",
params={
"limit": 500,
"scrapeOptions": {
"formats": ["markdown"],
"onlyMainContent": True
},
"allowedDomains": ["docs.myapi.com"]
}
)
# Save to files
os.makedirs("docs", exist_ok=True)
for page in docs.get("data", []):
filename = page["url"].replace("https://", "").replace("/", "_") + ".md"
with open(f"docs/{filename}", "w") as f:
f.write(page["markdown"])
2. Product Data Extraction
Scenario: Extract structured product data for e-commerce
import FirecrawlApp from '@mendable/firecrawl-js';
import { z } from 'zod';

const app = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });

const schema = z.object({
title: z.string(),
price: z.number(),
description: z.string(),
images: z.array(z.string()),
in_stock: z.boolean()
});
const products = await app.extract({
urls: productUrls,
schema: schema,
systemPrompt: 'Extract all product details including price and availability'
});
3. News Article Scraping
Scenario: Extract clean article content without ads/navigation
import os
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key=os.environ.get("FIRECRAWL_API_KEY"))

article = app.scrape_url(
url="https://news.com/article",
params={
"formats": ["markdown"],
"onlyMainContent": True,
"removeBase64Images": True
}
)
# Get clean markdown
content = article.get("markdown")
Error Handling
Python
import os

from firecrawl import FirecrawlApp
from firecrawl.exceptions import FirecrawlException
app = FirecrawlApp(api_key=os.environ.get("FIRECRAWL_API_KEY"))
try:
result = app.scrape_url("https://example.com")
except FirecrawlException as e:
print(f"Firecrawl error: {e}")
except Exception as e:
print(f"Unexpected error: {e}")
TypeScript
import FirecrawlApp from '@mendable/firecrawl-js';
const app = new FirecrawlApp({
apiKey: process.env.FIRECRAWL_API_KEY
});
try {
const result = await app.scrapeUrl('https://example.com');
} catch (error: any) {
if (error.response) {
// API error
console.error('API Error:', error.response.data);
} else {
// Network or other error
console.error('Error:', error.message);
}
}
Rate Limits & Best Practices
Rate Limits
- Free tier: 500 credits/month
- Paid tiers: Higher limits based on plan
- Credits consumed vary by endpoint and options
Best Practices
- Use `onlyMainContent: true` to reduce credits and get cleaner data
- Set reasonable limits on crawls to avoid excessive costs
- Handle retries with exponential backoff for transient errors (see the sketch after this list)
- Cache results locally to avoid re-scraping the same content
- Use the `map` endpoint first to plan your crawling strategy
- Batch extract calls when processing multiple URLs
- Monitor credit usage in the dashboard
Cloudflare Workers Integration
⚠️ Important: SDK Compatibility
The Firecrawl SDK cannot run in Cloudflare Workers due to Node.js dependencies (specifically axios, which relies on the Node.js http module). Workers require Web Standard APIs.
✅ Use the direct REST API with fetch instead (see example below).
Alternative: Self-host with workers-firecrawl - a Workers-native implementation (requires Workers Paid Plan, only implements /search endpoint).
Workers Example: Direct REST API
This example calls Firecrawl directly with the fetch API, which works natively in Cloudflare Workers:
interface Env {
FIRECRAWL_API_KEY: string;
SCRAPED_CACHE?: KVNamespace; // Optional: for caching results
}
interface FirecrawlScrapeResponse {
success: boolean;
data: {
markdown?: string;
html?: string;
metadata: {
title?: string;
description?: string;
language?: string;
sourceURL: string;
};
};
}
export default {
async fetch(request: Request, env: Env): Promise<Response> {
if (request.method !== 'POST') {
return Response.json({ error: 'Method not allowed' }, { status: 405 });
}
try {
const { url } = await request.json<{ url: string }>();
if (!url) {
return Response.json({ error: 'URL is required' }, { status: 400 });
}
// Check cache (optional)
if (env.SCRAPED_CACHE) {
const cached = await env.SCRAPED_CACHE.get(url, 'json');
if (cached) {
return Response.json({ cached: true, data: cached });
}
}
// Call Firecrawl API directly using fetch
const response = await fetch('https://api.firecrawl.dev/v2/scrape', {
method: 'POST',
headers: {
'Authorization': `Bearer ${env.FIRECRAWL_API_KEY}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
url: url,
formats: ['markdown'],
onlyMainContent: true,
removeBase64Images: true
})
});
if (!response.ok) {
const errorText = await response.text();
throw new Error(`Firecrawl API error (${response.status}): ${errorText}`);
}
const result = await response.json<FirecrawlScrapeResponse>();
// Cache for 1 hour (optional)
if (env.SCRAPED_CACHE && result.success) {
await env.SCRAPED_CACHE.put(
url,
JSON.stringify(result.data),
{ expirationTtl: 3600 }
);
}
return Response.json({
cached: false,
data: result.data
});
} catch (error) {
console.error('Scraping error:', error);
return Response.json(
{ error: error instanceof Error ? error.message : 'Unknown error' },
{ status: 500 }
);
}
}
};
Environment Setup: Add FIRECRAWL_API_KEY as a Wrangler secret:
npx wrangler secret put FIRECRAWL_API_KEY
Optional KV Binding (for caching - add to wrangler.jsonc):
{
"kv_namespaces": [
{
"binding": "SCRAPED_CACHE",
"id": "your-kv-namespace-id"
}
]
}
See templates/firecrawl-worker-fetch.ts for a complete production-ready example.
When to Use This Skill
✅ Use Firecrawl when:
- Scraping modern websites with JavaScript
- Need clean markdown output for LLMs
- Building RAG systems from web content
- Extracting structured data at scale
- Dealing with bot protection
- Need reliable, production-ready scraping
❌ Don't use Firecrawl when:
- Scraping simple static HTML (use cheerio/beautifulsoup)
- Have existing Puppeteer/Playwright setup working well
- Working with APIs (use direct API calls instead)
- Budget constraints (free tier has limits)
Common Issues & Solutions
Issue: "Invalid API Key"
Cause: API key not set or incorrect
Fix:
# Check env variable is set
echo $FIRECRAWL_API_KEY
# Verify key format (should start with fc-)
Issue: "Rate limit exceeded"
Cause: Exceeded monthly credits
Fix:
- Check usage in dashboard
- Upgrade plan or wait for reset
- Use `onlyMainContent: true` to reduce credit consumption
Issue: "Timeout error"
Cause: Page takes too long to load
Fix:
result = app.scrape_url(url, params={"waitFor": 10000}) # Wait 10s
Issue: "Content is empty"
Cause: Content loaded via JavaScript after initial render
Fix:
result = app.scrape_url(url, params={
"waitFor": 5000,
"actions": [{"type": "wait", "milliseconds": 3000}]
})
Advanced Features
Browser Actions
Perform interactions before scraping:
result = app.scrape_url(
url="https://example.com",
params={
"actions": [
{"type": "click", "selector": "button.load-more"},
{"type": "wait", "milliseconds": 2000},
{"type": "scroll", "direction": "down"}
]
}
)
Custom Headers
result = app.scrape_url(
url="https://example.com",
params={
"headers": {
"User-Agent": "Custom Bot 1.0",
"Accept-Language": "en-US"
}
}
)
Webhooks for Long Crawls
Instead of polling, receive results via webhook:
crawl = app.crawl_url(
url="https://docs.example.com",
params={
"limit": 1000,
"webhook": "https://your-domain.com/webhook"
}
)
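On the receiving side, a sketch of a webhook handler using FastAPI (the framework choice and the payload field names are assumptions here; check the Firecrawl webhook docs for the exact event schema):

```python
from fastapi import FastAPI, Request

api = FastAPI()

@api.post("/webhook")
async def firecrawl_webhook(request: Request):
    payload = await request.json()
    # Assumed field names; log the raw payload first and adjust to the real schema.
    event_type = payload.get("type")
    pages = payload.get("data", [])
    print(f"Received {event_type!r} with {len(pages)} page(s)")
    return {"ok": True}
```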
Package Versions
| Package | Version | Last Checked |
|---|---|---|
| firecrawl-py | 4.5.0+ | 2025-10-20 |
| @mendable/firecrawl-js (or firecrawl) | 4.4.1+ | 2025-10-24 |
| API Version | v2 | Current |
Note: The Node.js SDK requires Node.js >=22.0.0 and cannot run in Cloudflare Workers. Use direct REST API calls in Workers (see Cloudflare Workers Integration section).
Official Documentation
- Docs: https://docs.firecrawl.dev
- Python SDK: https://docs.firecrawl.dev/sdks/python
- Node.js SDK: https://docs.firecrawl.dev/sdks/node
- API Reference: https://docs.firecrawl.dev/api-reference
- GitHub: https://github.com/mendableai/firecrawl
- Dashboard: https://www.firecrawl.dev/app
Next Steps After Using This Skill
- Store scraped data: Use Cloudflare D1, R2, or KV to persist results
- Build RAG system: Combine with Vectorize for semantic search
- Add scheduling: Use Cloudflare Queues for recurring scrapes
- Process content: Use Workers AI to analyze scraped data
Token Savings: ~60% vs manual integration
Error Prevention: API authentication, rate limiting, format handling
Production Ready: ✅