| name | scrapeninja |
| description | High-performance web scraping API with Chrome TLS fingerprint and JS rendering |
| vm0_secrets | SCRAPENINJA_API_KEY |
ScrapeNinja
High-performance web scraping API with Chrome TLS fingerprint, rotating proxies, smart retries, and optional JavaScript rendering.
Official docs: https://scrapeninja.net/docs/
When to Use
Use this skill when you need to:
- Scrape websites with anti-bot protection (Cloudflare, Datadome)
- Extract data without running a full browser (fast /scrape endpoint)
- Render JavaScript-heavy pages (/scrape-js endpoint)
- Use rotating proxies with geo selection (US, EU, Brazil, etc.)
- Extract structured data with Cheerio extractors
- Intercept AJAX requests
- Take screenshots of pages
Prerequisites
- Get an API key from RapidAPI or APIRoad:
- RapidAPI: https://rapidapi.com/restyler/api/scrapeninja
- APIRoad: https://apiroad.net/marketplace/apis/scrapeninja
Set the environment variable:
# For RapidAPI
export SCRAPENINJA_API_KEY="your-rapidapi-key"
# For APIRoad (use X-Apiroad-Key header instead)
export SCRAPENINJA_API_KEY="your-apiroad-key"
Important: When using $VAR in a command that pipes to another command, wrap the command containing $VAR in bash -c '...'. Due to a Claude Code bug, environment variables are silently cleared when pipes are used directly.
bash -c 'curl -s "https://api.example.com" -H "Authorization: Bearer $API_KEY"' | jq .
How to Use
1. Basic Scrape (Non-JS, Fast)
High-performance scraping with Chrome TLS fingerprint, no JavaScript:
Write to /tmp/scrapeninja_request.json:
{
"url": "https://example.com"
}
Then run:
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq '{status: .info.statusCode, url: .info.finalUrl, bodyLength: (.body | length)}'
With custom headers and retries:
Write to /tmp/scrapeninja_request.json:
{
"url": "https://example.com",
"headers": ["Accept-Language: en-US"],
"retryNum": 3,
"timeout": 15
}
Then run:
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq .
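The same curl invocation recurs in every example in this document. A minimal sketch of a wrapper, assuming the RapidAPI host and X-RapidAPI-Key header shown above; scrapeninja_url and scrapeninja_post are hypothetical helper names, not part of ScrapeNinja:

```shell
# Hypothetical helpers wrapping the curl call used throughout this document.

# Build the RapidAPI URL for an endpoint such as "scrape" or "/scrape-js".
scrapeninja_url() {
  printf 'https://scrapeninja.p.rapidapi.com/%s\n' "${1#/}"
}

# Usage: scrapeninja_post <endpoint> <request-json-file>
# Prints the raw JSON response on stdout.
scrapeninja_post() {
  local endpoint="$1" request_file="$2"
  if [ -z "${SCRAPENINJA_API_KEY:-}" ]; then
    echo "SCRAPENINJA_API_KEY is not set" >&2
    return 1
  fi
  if [ ! -f "$request_file" ]; then
    echo "request file not found: $request_file" >&2
    return 1
  fi
  curl -s -X POST "$(scrapeninja_url "$endpoint")" \
    --header "Content-Type: application/json" \
    --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" \
    -d @"$request_file"
}
```

If the functions are saved to a file (scrapeninja.sh is an assumed name), they combine with the bash -c workaround from Prerequisites: bash -c 'source ./scrapeninja.sh; scrapeninja_post scrape /tmp/scrapeninja_request.json' | jq .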
2. Scrape with JavaScript Rendering
For JavaScript-heavy sites (React, Vue, etc.):
Write to /tmp/scrapeninja_request.json:
{
"url": "https://example.com",
"waitForSelector": "h1",
"timeout": 20
}
Then run:
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq '{status: .info.statusCode, bodyLength: (.body | length)}'
With screenshot:
Write to /tmp/scrapeninja_request.json:
{
"url": "https://example.com",
"screenshot": true
}
Then run:
# Get screenshot URL from response
SCREENSHOT_URL=$(bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq -r '.info.screenshot')
# Download the screenshot
curl -s -o screenshot.png "${SCREENSHOT_URL}"
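The example above treats .info.screenshot as a URL, while the Response Format section describes the field as base64-encoded PNG data. A defensive sketch that accepts either form; save_screenshot is a hypothetical helper, and the assumption that the value is always one of these two forms is not confirmed by the source:

```shell
# Hypothetical helper: write the value of .info.screenshot to a file.
# Assumption: the value is either an http(s) URL or base64-encoded PNG data.
# Usage: save_screenshot <value> <output-file>
save_screenshot() {
  local value="$1" out="$2"
  case "$value" in
    http://*|https://*)
      # URL form: download the image.
      curl -s -o "$out" "$value"
      ;;
    ""|null)
      # jq -r prints "null" when the field is missing.
      echo "no screenshot in response" >&2
      return 1
      ;;
    *)
      # Anything else is treated as base64 data.
      printf '%s' "$value" | base64 --decode > "$out"
      ;;
  esac
}
```

With the SCREENSHOT_URL variable from the example above, the call would be: save_screenshot "${SCREENSHOT_URL}" screenshot.png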
3. Geo-Based Proxy Selection
Use proxies from specific regions:
Write to /tmp/scrapeninja_request.json:
{
"url": "https://example.com",
"geo": "eu"
}
Then run:
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq .info
Available geos: us, eu, br (Brazil), fr (France), de (Germany), 4g-eu
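A mistyped geo code costs a request before the error shows up, so it may be worth checking the value locally first. A small sketch using only the codes listed above; is_valid_geo is a hypothetical helper name:

```shell
# Hypothetical helper: accept only the geo codes listed in this document.
# Returns 0 for a known code, 1 otherwise.
is_valid_geo() {
  case "$1" in
    us|eu|br|fr|de|4g-eu) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, is_valid_geo "eu" succeeds, while is_valid_geo "uk" fails because uk is not in the list.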
4. Smart Retries
Retry on specific HTTP status codes or text patterns:
Write to /tmp/scrapeninja_request.json:
{
"url": "https://example.com",
"retryNum": 3,
"statusNotExpected": [403, 429, 503],
"textNotExpected": ["captcha", "Access Denied"]
}
Then run:
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq .
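To make the retry rule concrete, the sketch below mirrors the request above as a local check: a response counts as a failed attempt when its status is in statusNotExpected or its body contains one of the textNotExpected patterns. This only illustrates the rule as described here, not ScrapeNinja's implementation; should_retry is a hypothetical name, and case-sensitive substring matching is an assumption:

```shell
# Illustrative only: the retry rule from the request above, evaluated locally.
# Usage: should_retry <http-status> <body>
# Returns 0 when a retry would be triggered, 1 otherwise.
should_retry() {
  local status="$1" body="$2"
  local code pattern
  # statusNotExpected: [403, 429, 503]
  for code in 403 429 503; do
    [ "$status" = "$code" ] && return 0
  done
  # textNotExpected: ["captcha", "Access Denied"] (assumed case-sensitive substrings)
  for pattern in "captcha" "Access Denied"; do
    case "$body" in
      *"$pattern"*) return 0 ;;
    esac
  done
  return 1
}
```

With retryNum set to 3, up to three attempts are made while this condition keeps holding.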
5. Extract Data with Cheerio
Extract structured JSON using Cheerio extractor functions:
Write to /tmp/scrapeninja_request.json:
{
"url": "https://news.ycombinator.com",
"extractor": "function(input, cheerio) { let $ = cheerio.load(input); return $(\".titleline > a\").slice(0,5).map((i,el) => ({title: $(el).text(), url: $(el).attr(\"href\")})).get(); }"
}
Then run:
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq '.extractor'
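Because the extractor travels as a JSON string, every double quote in the function must be escaped, which is easy to get wrong by hand. One way around this is to keep the function in its own file and let jq do the escaping. A sketch, where build_extractor_request and the file name extractor.js are hypothetical:

```shell
# Hypothetical helper: build a request body whose "extractor" field is read
# from a file, letting jq handle JSON string escaping.
# Usage: build_extractor_request <url> <extractor-js-file>
build_extractor_request() {
  local url="$1" extractor_file="$2"
  jq -n \
    --arg url "$url" \
    --arg extractor "$(cat "$extractor_file")" \
    '{url: $url, extractor: $extractor}'
}
```

Usage: build_extractor_request "https://news.ycombinator.com" extractor.js > /tmp/scrapeninja_request.json, then run the same curl command as above.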
6. Intercept AJAX Requests
Capture XHR/fetch responses:
Write to /tmp/scrapeninja_request.json:
{
"url": "https://example.com",
"catchAjaxHeadersUrlMask": "api/data"
}
Then run:
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq '.info.catchedAjax'
7. Block Resources for Speed
Speed up JS rendering by blocking images (blockImages) and CSS/fonts (blockMedia):
Write to /tmp/scrapeninja_request.json:
{
"url": "https://example.com",
"blockImages": true,
"blockMedia": true
}
Then run:
bash -c 'curl -s -X POST "https://scrapeninja.p.rapidapi.com/scrape-js" --header "Content-Type: application/json" --header "X-RapidAPI-Key: ${SCRAPENINJA_API_KEY}" -d @/tmp/scrapeninja_request.json' | jq .
API Endpoints
| Endpoint | Description |
|---|---|
| /scrape | Fast non-JS scraping with Chrome TLS fingerprint |
| /scrape-js | Full Chrome browser with JS rendering |
| /v2/scrape-js | Enhanced JS rendering for protected sites (APIRoad only) |
Request Parameters
Common Parameters (all endpoints)
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | string | required | URL to scrape |
| headers | string[] | - | Custom HTTP headers |
| retryNum | int | 1 | Number of retry attempts |
| geo | string | us | Proxy geo: us, eu, br, fr, de, 4g-eu |
| proxy | string | - | Custom proxy URL (overrides geo) |
| timeout | int | 10/16 | Timeout per attempt in seconds |
| textNotExpected | string[] | - | Text patterns that trigger retry |
| statusNotExpected | int[] | [403, 502] | HTTP status codes that trigger retry |
| extractor | string | - | Cheerio extractor function |
JS Rendering Parameters (/scrape-js, /v2/scrape-js)
| Parameter | Type | Default | Description |
|---|---|---|---|
| waitForSelector | string | - | CSS selector to wait for |
| postWaitTime | int | - | Extra wait time after load (1-12s) |
| screenshot | bool | true | Take page screenshot |
| blockImages | bool | false | Block image loading |
| blockMedia | bool | false | Block CSS/fonts loading |
| catchAjaxHeadersUrlMask | string | - | URL pattern to intercept AJAX |
| viewport | object | 1920x1080 | Custom viewport size |
Response Format
{
  "info": {
    "statusCode": 200,
    "finalUrl": "https://example.com",
    "headers": ["content-type: text/html"],
    "screenshot": "base64-encoded-png",
    "catchedAjax": {
      "url": "https://example.com/api/data",
      "method": "GET",
      "body": "...",
      "status": 200
    }
  },
  "body": "<html>...</html>",
  "extractor": { "extracted": "data" }
}
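The fields above can be checked before the body is processed further. A sketch that reads a response saved to a file; response_ok and summarize_response are hypothetical helpers, and treating only 2xx values of info.statusCode as success is an assumption:

```shell
# Hypothetical helpers for a response saved to a file,
# e.g. via: ... -d @/tmp/scrapeninja_request.json > /tmp/scrapeninja_response.json

# Print "<statusCode> <finalUrl> <body length>" on one line.
summarize_response() {
  jq -r '"\(.info.statusCode) \(.info.finalUrl) \(.body | length)"' "$1"
}

# Return 0 only when info.statusCode is in the 2xx range (assumed success rule).
response_ok() {
  jq -e '(.info.statusCode // 0) as $s | $s >= 200 and $s < 300' "$1" > /dev/null
}
```

A typical use is: response_ok /tmp/scrapeninja_response.json && jq -r .body /tmp/scrapeninja_response.json > page.html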
Guidelines
- Start with /scrape: Use the fast non-JS endpoint first; only switch to /scrape-js if needed
- Retries: Set retryNum to 2-3 for unreliable sites
- Geo Selection: Use eu for European sites, us for American sites
- Extractors: Test extractors at https://scrapeninja.net/cheerio-sandbox/
- Blocked Sites: For Cloudflare/Datadome protected sites, use /v2/scrape-js via APIRoad
- Screenshots: Set screenshot: false to speed up JS rendering
- Rate Limits: Check your plan limits on RapidAPI/APIRoad dashboard
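The first guideline leaves open how to tell that /scrape-js is needed. One possible heuristic, offered as an assumption rather than a documented rule: if the body returned by /scrape lacks a piece of text the rendered page is known to show, the content is probably injected by JavaScript. needs_js_rendering is a hypothetical helper:

```shell
# Hypothetical heuristic: decide whether a saved /scrape response should be
# retried on /scrape-js.
# Usage: needs_js_rendering <response-json-file> <expected-text>
# Returns 0 (fall back to /scrape-js) when .body does not contain the text.
needs_js_rendering() {
  local response_file="$1" expected_text="$2"
  jq -e --arg t "$expected_text" \
    '(.body // "") | contains($t) | not' "$response_file" > /dev/null
}
```

In practice: save the /scrape response to a file, call needs_js_rendering on it with a string visible on the page in a browser, and repeat the request against /scrape-js only when the function returns 0.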
Tools
- Playground: https://scrapeninja.net/scraper-sandbox
- Cheerio Sandbox: https://scrapeninja.net/cheerio-sandbox
- cURL Converter: https://scrapeninja.net/curl-to-scraper