| name | apify-actor |
| description | Build and deploy Apify actors for web scraping and automation. Use for serverless scraping, data extraction, browser automation, and API integrations with Python. |
Apify Actor Development
Build serverless Apify actors for web scraping, browser automation, and data extraction using Python.
Prerequisites & Setup (MANDATORY)
Before creating or modifying actors, verify that the apify CLI is installed:
Run apify --help.
If it is not installed, you can run:
curl -fsSL https://apify.com/install-cli.sh | bash
# Or (Mac): brew install apify-cli
# Or (Windows): irm https://apify.com/install-cli.ps1 | iex
# Or: npm install -g apify-cli
When the apify CLI is installed, check that it is logged in with:
apify info # Should return your username
If it is not logged in, check whether the APIFY_TOKEN environment variable is defined. If it is not, ask the user to generate a token at https://console.apify.com/settings/integrations and set APIFY_TOKEN to that value.
Then run:
apify login -t $APIFY_TOKEN
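The two checks above can also be scripted. The following is a minimal sketch, assuming only that the CLI binary is named apify and that the token lives in APIFY_TOKEN, as described above. The function name check_prerequisites is hypothetical. It does not replace apify info, because it cannot tell whether the CLI is actually logged in.

```python
import os
import shutil
from collections.abc import Callable, Mapping
from typing import Optional


def check_prerequisites(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return human-readable problems; an empty list means the basics are in place."""
    problems: list[str] = []
    # shutil.which returns None when the executable is not on PATH.
    if which("apify") is None:
        problems.append("apify CLI not found on PATH; install it first")
    # An empty string counts as "not set".
    if not env.get("APIFY_TOKEN"):
        problems.append("APIFY_TOKEN is not set; generate a token in Apify Console")
    return problems
```

Passing env and which as parameters keeps the function testable without touching the real environment.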
Quick Start Workflow
Creating a New Actor
- Copy template - Copy all files, including hidden ones, from the skill's assets/python-template/ directory to your new actor directory. The template is located at {base_dir}/assets/python-template/, where {base_dir} is the skill's base directory.
- Setup pre-commit - Run uv run pre-commit install for automatic quality checks
- Add dependencies - Use uv add package-name for each required dependency
- Implement logic - Write the actor code in src/main.py (the src/__main__.py entry point is already set up)
- Configure schemas - Update input/output schemas in .actor/input_schema.json and .actor/output_schema.json
- Configure platform settings - Update .actor/actor.json with actor metadata
- Write documentation - Create comprehensive .actor/ACTOR.md for the marketplace
- Test locally - Run apify run to verify functionality
- Deploy - Run apify push to deploy the actor on the Apify platform
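The template-copy step is easy to get wrong, because shell globs tend to skip dotfiles such as .actor/. The following is a minimal sketch using only the standard library. The function name copy_template is hypothetical, and base_dir stands for the skill's base directory mentioned above.

```python
import shutil
from pathlib import Path


def copy_template(base_dir: Path, actor_dir: Path) -> Path:
    """Copy the Python template, hidden files included, into a new actor directory."""
    template = base_dir / "assets" / "python-template"
    if not template.is_dir():
        raise FileNotFoundError(f"Template not found: {template}")
    # copytree copies dotfiles and dot-directories by default.
    # It raises FileExistsError if actor_dir already exists.
    shutil.copytree(template, actor_dir)
    return actor_dir
```

Refusing to overwrite an existing directory is deliberate here: it avoids clobbering an actor that is already in progress.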
CRITICAL REMINDERS:
- NEVER create requirements.txt
- NEVER use pip install or uv pip install
- ALWAYS use uv add to add dependencies
- ALWAYS use uv sync to install dependencies
- ALWAYS format with uv run ruff format . after file changes
- ALWAYS lint with uv run ruff check --fix . after file changes
- ALWAYS check the apify push output for build errors before considering deployment complete
- Input/output schemas should be updated when changing actor functionality
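Some of these reminders can be checked mechanically before a push. The sketch below covers only the file-level rules (no requirements.txt, pyproject.toml and uv.lock present). The function name find_dependency_issues is hypothetical and not part of the template or of any Apify tooling.

```python
from pathlib import Path


def find_dependency_issues(project_dir: Path) -> list[str]:
    """Flag files that contradict the uv-only dependency workflow."""
    issues: list[str] = []
    if (project_dir / "requirements.txt").exists():
        issues.append("requirements.txt found; declare dependencies with uv add instead")
    if not (project_dir / "pyproject.toml").is_file():
        issues.append("pyproject.toml is missing")
    if not (project_dir / "uv.lock").is_file():
        issues.append("uv.lock is missing; run uv sync and commit the lock file")
    return issues
```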
Core Concepts
Input/Output Pattern
Every actor follows this pattern:
- Input: JSON from key-value store (defined by input schema)
- Process: Actor logic extracts/transforms data
- Output: Results pushed to dataset or key-value store
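As a rough illustration of this pattern, the sketch below separates the Process step into a pure function and wires it between Actor.get_input() and Actor.push_data() from the Apify Python SDK. The urls input field and the build_items helper are hypothetical placeholders. See references/python-sdk.md for the patterns this skill actually recommends.

```python
from typing import Any


def build_items(actor_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Process step: turn the input into dataset items (placeholder logic)."""
    urls = actor_input.get("urls", [])  # "urls" is a hypothetical input field
    return [{"url": url, "length": len(url)} for url in urls]


async def main() -> None:
    # Imported here so build_items can be unit-tested without the SDK installed.
    from apify import Actor

    async with Actor:
        actor_input = await Actor.get_input() or {}  # Input: JSON from the key-value store
        items = build_items(actor_input)  # Process: extract/transform
        await Actor.push_data(items)  # Output: default dataset
```

Keeping the transformation free of SDK calls means it can be tested with pytest without a running actor.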
Storage Types
- Dataset: Structured data (arrays of objects) - use for scraping results and tabular data
- Key-Value Store: Arbitrary data (files, objects) - use for screenshots, PDFs, state, and binary files
- Request Queue: URLs to crawl - use for deep web crawling and multi-page scraping workflows
Project Structure
my-actor/
├── .actor/
│   ├── actor.json                # Actor metadata
│   ├── input_schema.json         # Input schema
│   ├── output_schema.json        # Output schema
│   ├── ACTOR.md                  # PUBLIC marketplace documentation (CRITICAL)
│   └── datasets/
│       └── dataset_schema.json   # Dataset schema with views
├── src/ or package_name/         # Source code
│   ├── __init__.py
│   ├── __main__.py               # Entry point for CLI (REQUIRED)
│   └── main.py                   # Main actor logic
├── tests/                        # Test files
│   └── test_*.py
├── .dockerignore                 # Docker build exclusions
├── .pre-commit-config.yaml       # Pre-commit hooks
├── Dockerfile                    # Container config
├── pyproject.toml                # Python project config
├── uv.lock                       # Dependency lock file
└── README.md                     # Development docs
Common Patterns
See references/python-sdk.md for complete examples of:
- Simple HTTP scraping with BeautifulSoup
- Browser automation with Playwright and Selenium
- Deep crawling with Request Queue
- Proxy management and error handling
- Storage APIs (Dataset, Key-Value Store, Request Queue)
Input Schema Design
Input schemas use JSON Schema format to define and validate actor inputs. See references/input-schema.md for:
- Field types (string, number, boolean, array, object)
- Special editors (requestListSources, globs, pseudoUrls, proxy, json, textarea)
- Validation patterns (regex, length, range, required fields)
- Complete examples with best practices
Key principles:
- Always include a description and an example for every field
- Set sensible defaults for ease of use
- Use appropriate editors for better UX
- Add units for numeric fields (pages, seconds, MB)
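To make these principles concrete, here is a small sketch that builds a schema as a Python dict and checks it for missing keys. The field names (startUrls, maxPages) are hypothetical. The schema keys used (schemaVersion, editor, prefill, unit) reflect my understanding of the Apify input schema format and should be verified against references/input-schema.md.

```python
import json

# Hypothetical schema for illustration; not taken from any real actor.
INPUT_SCHEMA = {
    "title": "Example scraper input",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "startUrls": {
            "title": "Start URLs",
            "type": "array",
            "description": "Pages where the scraper begins.",
            "editor": "requestListSources",
            "prefill": [{"url": "https://example.com"}],
        },
        "maxPages": {
            "title": "Maximum pages",
            "type": "integer",
            "description": "Stop after this many pages.",
            "default": 10,
            "minimum": 1,
            "unit": "pages",
        },
    },
    "required": ["startUrls"],
}


def fields_missing(schema: dict, key: str) -> list[str]:
    """Return the names of properties that lack the given key (e.g. 'description')."""
    return [name for name, spec in schema["properties"].items() if key not in spec]


# Serialized form, suitable as a starting point for .actor/input_schema.json.
print(json.dumps(INPUT_SCHEMA, indent=2))
```

A check like fields_missing(INPUT_SCHEMA, "description") can run in a test, so that a field without a description fails the build.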
Output Schema Design
Output schemas define where actors store outputs and provide templates for accessing that data. See references/output-schema.md for:
- Schema structure and template variables (links.apiDefaultDatasetUrl, links.apiDefaultKeyValueStoreUrl, etc.)
- Dataset and key-value store output configurations
- Multiple output types in a single actor
- Integration with Python code
- Complete examples with emojis and descriptions
Key principles:
- Define all outputs explicitly (even if empty)
- Use descriptive titles with emojis for visual clarity
- Include helpful descriptions for users and LLM integrations
- Match templates to actual storage locations in code
ACTOR.md Documentation (CRITICAL)
The .actor/ACTOR.md file is the public-facing documentation that users see in the Apify marketplace. This is your actor's main sales page and user guide.
Required sections:
- Title & Description - Clear, compelling one-liner
- What it does - Bullet points of key capabilities
- Input - Example JSON with field explanations
- Output - Example JSON showing expected results
- Use Cases - Who benefits and why (with emojis)
- Standby Mode (if applicable) - API usage examples
- Tips & Best Practices - Performance and configuration guidance
See assets/python-template/.actor/ACTOR.md for a complete template.
Key principles:
- Write for non-technical users - assume no coding knowledge
- Use emojis to make sections scannable (🎯 🔍 ⚡ 🚀)
- Provide copy-paste ready code examples
- Show actual input/output samples, not schemas
- Highlight benefits and use cases clearly
Modifying Existing Actors
When modifying an existing actor:
- Understand current logic - Read src/main.py
- Check input schema - Review .actor/input_schema.json for expected inputs
- Add dependencies with uv - Use uv add package-name (NEVER pip install)
- Make code changes - Implement the requested features
- Format code - Run uv run ruff format . (MANDATORY)
- Lint code - Run uv run ruff check --fix . (MANDATORY)
- Test changes locally - Use apify run before deploying
- Update schema if needed - Add new fields to input schema
- Deploy - Push changes with apify push
Debugging Actors
- Test locally - Use apify run to test actor locally before deployment
- Check storage - Inspect ./storage/ directory for datasets, key-value stores, and request queues
- Add logging - Use Actor.log.info(), Actor.log.debug(), Actor.log.error() (see SDK references)
- View logs on platform - Check actor run logs in Apify Console for production issues
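Inspecting the local storage directory can be scripted after an apify run. The sketch below assumes a layout of storage/datasets/default/ containing one JSON file per item, plus optional metadata files whose names start with two underscores. That layout is an assumption about local runs, so check your own ./storage/ directory before relying on it. The function name read_local_dataset is hypothetical.

```python
import json
from pathlib import Path


def read_local_dataset(storage_dir: Path, dataset: str = "default") -> list[dict]:
    """Load every JSON item that a local run wrote to the given dataset."""
    dataset_dir = storage_dir / "datasets" / dataset
    items = []
    # Sorting by file name keeps items in a stable order.
    for path in sorted(dataset_dir.glob("*.json")):
        if path.name.startswith("__"):
            continue  # skip metadata files, if present (assumed naming)
        items.append(json.loads(path.read_text(encoding="utf-8")))
    return items
```

For example, read_local_dataset(Path("storage")) after a local run gives a list that can be compared against expected results in a test.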
Best Practices
Code Quality
- Validate input - Always check required fields and formats with clear error messages
- Handle errors - Use try/except with proper error logging and graceful degradation
- Structured logging - Use Actor.log with extra fields for better debugging
- Type hints - Add type annotations for better code clarity and IDE support
- Docstrings - Document functions and modules for maintainability
- Format with ruff - ALWAYS run uv run ruff format . before committing
- Lint with ruff - ALWAYS run uv run ruff check --fix . before deploying
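Several of these points (input validation, error handling, structured logging, type hints, docstrings) can be combined in one small module. The sketch below is one possible shape, not the template's actual code. The url and maxPages fields and the ScraperInput class are hypothetical, and a standard-library logger stands in for Actor.log so the example runs without the SDK.

```python
import logging
from dataclasses import dataclass
from typing import Any, Optional

log = logging.getLogger("actor")  # stand-in; inside an actor, use Actor.log


@dataclass(frozen=True)
class ScraperInput:
    """Validated actor input (hypothetical fields)."""

    url: str
    max_pages: int = 10


def parse_input(raw: dict[str, Any]) -> ScraperInput:
    """Validate raw actor input and raise ValueError with a clear message."""
    url = raw.get("url")
    if not isinstance(url, str) or not url.startswith(("http://", "https://")):
        raise ValueError("Input field 'url' must be an http(s) URL")
    max_pages = raw.get("maxPages", 10)
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(max_pages, bool) or not isinstance(max_pages, int) or max_pages < 1:
        raise ValueError("Input field 'maxPages' must be a positive integer")
    return ScraperInput(url=url, max_pages=max_pages)


def safe_parse(raw: dict[str, Any]) -> Optional[ScraperInput]:
    """Return the parsed input, or None after logging why it was rejected."""
    try:
        return parse_input(raw)
    except ValueError as exc:
        # Extra fields travel with the log record for structured debugging.
        log.error("Invalid input: %s", exc, extra={"input_keys": sorted(raw)})
        return None
```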
Performance & Scalability
- Batch processing - Push data in batches (100-1000 items) for large datasets to reduce API calls
- Use proxies - Avoid IP blocking for web scraping with proxy configuration
- Resource limits - Set appropriate memory limits and timeouts in .actor/actor.json
- Optimize Docker - Use multi-stage builds, bytecode compilation, and minimal base images
- Consider Standby mode - For low-latency (<100ms), high-frequency use cases
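The batch-processing advice can be sketched as a small generator plus a loop around Actor.push_data(). The default batch size of 500 is simply a value inside the 100-1000 range suggested above, and the helper names batched and push_in_batches are hypothetical.

```python
from collections.abc import Iterable, Iterator
from itertools import islice
from typing import Any


def batched(items: Iterable[dict[str, Any]], size: int = 500) -> Iterator[list[dict[str, Any]]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    # islice pulls up to `size` items; an empty list ends the loop.
    while batch := list(islice(iterator, size)):
        yield batch


async def push_in_batches(items: Iterable[dict[str, Any]], size: int = 500) -> int:
    """Push items to the default dataset one batch at a time; return the count pushed."""
    # Imported here so batched() can be tested without the SDK installed.
    from apify import Actor

    pushed = 0
    for batch in batched(items, size):
        await Actor.push_data(batch)  # one API call per batch instead of per item
        pushed += len(batch)
    return pushed
```

Because batched accepts any iterable, it also works with a generator that yields scraped items lazily, so the full result set never has to sit in memory.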
Security & Configuration
- Environment variables - Never hardcode secrets; use Actor.config and environment variables
- Input validation - Use JSON Schema patterns, required fields, and runtime validation
- Run as non-root - Use myuser in Dockerfile for container security
- Minimize image size - Use .dockerignore to exclude unnecessary files and reduce build time
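For the environment-variable rule, a minimal sketch is shown below. It reads a secret from the process environment and fails with a message that names the variable but never echoes a value. The helper require_secret and the variable name TARGET_API_KEY in the usage note are hypothetical.

```python
import os
from collections.abc import Mapping


def require_secret(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Fetch a secret from the environment, failing loudly when it is absent."""
    value = env.get(name, "").strip()
    if not value:
        # Report only the variable name, never a secret value.
        raise RuntimeError(f"Environment variable {name} is required but not set")
    return value
```

A call such as require_secret("TARGET_API_KEY") at startup surfaces a missing secret immediately instead of partway through a run.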
Development Workflow
- Testing - Write tests with pytest; use coverage and snapshot testing for reliability
- Pre-commit hooks - Use ruff and pre-commit for consistent code quality (MANDATORY)
- Use uv exclusively - NEVER use pip or requirements.txt; only use uv add and uv sync (MANDATORY)
- Lock dependencies - Always commit uv.lock for reproducible builds (MANDATORY)
- Test locally - Always test with apify run before deploying to catch issues early
- Dataset schemas - Define dataset_schema.json with views for better Apify Console UI
- CLI support - Add CLI entry points via __main__.py for local testing and development
Standby Mode (Real-time API)
Standby mode allows actors to run as persistent HTTP servers, providing instant responses without cold start delays.
Perfect for:
- Real-time APIs requiring <100ms response times
- Webhook endpoints that need immediate processing
- High-frequency requests (multiple requests per second)
- Integration with real-time services (Slack bots, chat applications, webhooks)
- Low-latency scraping APIs and on-demand data extraction
See references/standby-mode.md for complete implementation patterns, authentication, and examples.
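To show the general shape of a persistent HTTP server, here is a sketch built on the standard library only. It is not the pattern from references/standby-mode.md, and it omits the Apify SDK entirely. The environment variable ACTOR_WEB_SERVER_PORT and the fallback port 8080 are assumptions, so confirm in the official documentation which port variable the platform sets.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Optional


class StandbyHandler(BaseHTTPRequestHandler):
    """Answer every GET with a small JSON payload."""

    def do_GET(self) -> None:
        body = json.dumps({"status": "ok", "path": self.path}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args: object) -> None:
        pass  # silence default stderr logging; a real actor might route this to Actor.log


def make_server(port: Optional[int] = None) -> ThreadingHTTPServer:
    """Create (but do not start) the server; port 0 lets the OS pick a free port."""
    if port is None:
        # Assumed variable name; falls back to 8080 for local experiments.
        port = int(os.environ.get("ACTOR_WEB_SERVER_PORT", "8080"))
    return ThreadingHTTPServer(("0.0.0.0", port), StandbyHandler)
```

Calling make_server().serve_forever() starts handling requests. In a real actor, that call would sit inside the actor's main coroutine or be replaced by an async web framework.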
References
Detailed documentation in references/:
- python-sdk.md - SDK patterns and complete code examples
- standby-mode.md - Real-time API implementation
- input-schema.md - Input validation and UI configuration
- output-schema.md - Output configuration and templates
Troubleshooting
If you need information not covered in this skill, use the WebFetch tool with https://docs.apify.com/llms.txt to access the complete official documentation.