name	stagehand
description	Stagehand Python AI-powered browser automation. Use for web scraping, form filling, clicking elements, extracting structured data, and autonomous multi-step browser workflows. Covers act(), extract(), observe() methods and Computer Use Agent patterns.
metadata	[object Object]

Stagehand Python Browser Automation Skill

AI-powered browser automation using Stagehand Python with act, extract, observe, and agent methods.

Overview

Stagehand Python provides AI-powered browser automation built on Playwright with:

act(): Perform actions using natural language
extract(): Extract structured data using Pydantic schemas
observe(): Plan actions and get selectors before executing
agent(): Create autonomous agents for complex multi-step workflows

Key Mental Model: Use natural language instructions for browser interactions. Keep actions atomic. Always use Pydantic schemas for data extraction.

Quick Reference

What You Need	Method
Click/type/interact	`page.act("Click the submit button")`
Extract data	`page.extract("Get all product prices", schema=PriceList)`
Plan before acting	`page.observe("Find the login form")`
Complex workflows	`agent.execute("Fill out the form and submit")`

Configuration

from stagehand import Stagehand, StagehandConfig
import asyncio
import os
from dotenv import load_dotenv

load_dotenv()

async def main():
    config = StagehandConfig(
        env="BROWSERBASE",  # or "LOCAL" for local browser
        api_key=os.getenv("BROWSERBASE_API_KEY"),
        project_id=os.getenv("BROWSERBASE_PROJECT_ID"),
        model_name="google/gemini-2.5-flash-preview-05-20",
        model_api_key=os.getenv("MODEL_API_KEY"),
        verbose=1,  # 0=minimal, 1=medium, 2=detailed
        dom_settle_timeout_ms=30000,
        self_heal=True,
    )
    
    # Recommended: Use as async context manager
    async with Stagehand(config) as stagehand:
        page = stagehand.page
        # Your automation code here
        
if __name__ == "__main__":
    asyncio.run(main())

Configuration Options

Option	Description	Default
`env`	`"BROWSERBASE"` or `"LOCAL"`	`"BROWSERBASE"`
`api_key`	Browserbase API key	Required for BROWSERBASE
`project_id`	Browserbase project ID	Required for BROWSERBASE
`model_name`	AI model for instructions	Required
`model_api_key`	API key for the AI model	Required
`verbose`	Logging level (0-2)	1
`dom_settle_timeout_ms`	DOM settle timeout	30000
`self_heal`	Enable self-healing	True

Core Methods

1. act() - Perform Actions

Execute actions using natural language. Keep actions atomic and specific.

# Simple actions
await page.act("Click the sign in button")
await page.act("Type 'hello world' into the search input")
await page.act("Select 'United States' from the country dropdown")

# Form filling with variables
await page.act("Enter name 'John Doe' and email 'john@example.com'")

CRITICAL: Actions should be atomic (single step).

# GOOD: Atomic actions
await page.act("Click the submit button")
await page.act("Type 'password123' into the password field")

# BAD: Multi-step actions (AVOID)
await page.act("Order me a pizza")
await page.act("Sign in and navigate to settings")

2. observe() - Plan Before Acting

Plan actions and get selectors before executing. Results can be passed directly to act().

# Get action plan
results = await page.observe("Click the sign in button")

# Use result directly in act()
await page.act(results[0])

# With visual overlay for debugging
results = await page.observe(
    instruction="Find all navigation links",
    draw_overlay=True
)

Use observe() when:

Page has multiple similar elements
DOM is dynamic/changing
You need to cache selectors for repeated use

3. extract() - Extract Structured Data

Extract data using Pydantic schemas. Always use schemas for structured data.

Simple String Extraction

button_text = await page.extract("Get the sign in button text")

Structured Extraction (Recommended)

from pydantic import BaseModel, Field
from typing import List

class ProductInfo(BaseModel):
    name: str = Field(..., description="Product name")
    price: float = Field(..., description="Product price in USD")
    in_stock: bool = Field(..., description="Whether product is in stock")

product = await page.extract(
    instruction="Extract the main product details",
    schema=ProductInfo
)
print(f"{product.name}: ${product.price}")

Array Extraction

class ProductList(BaseModel):
    products: List[ProductInfo] = Field(..., description="List of products")

data = await page.extract(
    instruction="Extract all products on the page",
    schema=ProductList
)

for product in data.products:
    print(f"{product.name}: ${product.price}")

Complex Nested Extraction

class Address(BaseModel):
    street: str
    city: str
    country: str

class Company(BaseModel):
    name: str = Field(..., description="Company name")
    description: str = Field(..., description="Brief description")
    address: Address = Field(..., description="Company headquarters")

class CompanyList(BaseModel):
    companies: List[Company]

companies = await page.extract(
    "Extract all company information including addresses",
    schema=CompanyList
)

Agent System (Computer Use Agent)

For autonomous multi-step workflows, use the Agent system.

Creating Agents

# Default agent (uses configured model)
agent = stagehand.agent()

# OpenAI Computer Use Agent
agent = stagehand.agent(
    model="computer-use-preview",
    instructions="You are a helpful web navigation assistant.",
    options={"apiKey": os.getenv("OPENAI_API_KEY")}
)

# Anthropic Claude Agent
agent = stagehand.agent(
    model="claude-sonnet-4-20250514",
    instructions="You are a helpful web navigation assistant.",
    options={"apiKey": os.getenv("ANTHROPIC_API_KEY")}
)

Agent Execution

# Simple task
result = await agent.execute("Navigate to the pricing page")

# Complex multi-step task with options
result = await agent.execute(
    instruction="Fill out the contact form with mock data and submit it",
    max_steps=20,
    auto_screenshot=True,
    wait_between_actions=1000  # milliseconds
)

Agent Best Practices

# GOOD: Specific, clear instructions
await agent.execute("Navigate to products page and filter by 'Electronics'")
await agent.execute("Fill out form with name 'John Doe', email 'john@example.com'")

# BAD: Vague instructions
await agent.execute("Do some stuff on this page")

# Combine agent + traditional methods
# Agent for navigation, extract() for precise data
await agent.execute("Navigate to the search results page")
data = await page.extract("Extract all search results", schema=ResultList)

Complete Example

from stagehand import Stagehand, StagehandConfig
from pydantic import BaseModel, Field
from typing import List
import asyncio
import os
from dotenv import load_dotenv

load_dotenv()

class SearchResult(BaseModel):
    title: str = Field(..., description="Result title")
    url: str = Field(..., description="Result URL")
    snippet: str = Field(..., description="Result description")

class SearchResults(BaseModel):
    results: List[SearchResult] = Field(..., description="Search results")

async def search_and_extract(query: str) -> SearchResults:
    config = StagehandConfig(
        env="BROWSERBASE",
        api_key=os.getenv("BROWSERBASE_API_KEY"),
        project_id=os.getenv("BROWSERBASE_PROJECT_ID"),
        model_name="google/gemini-2.5-flash-preview-05-20",
        model_api_key=os.getenv("MODEL_API_KEY"),
    )
    
    async with Stagehand(config) as stagehand:
        page = stagehand.page
        
        # Navigate
        await page.goto("https://www.google.com")
        
        # Use observe to plan, then act
        search_box = await page.observe("Find the search input")
        await page.act(search_box[0])
        await page.act(f"Type '{query}' and press Enter")
        
        # Wait for results
        await page.wait_for_load_state("networkidle")
        
        # Extract structured data
        results = await page.extract(
            "Extract the top 5 search results",
            schema=SearchResults
        )
        
        return results

if __name__ == "__main__":
    results = asyncio.run(search_and_extract("python web scraping"))
    for r in results.results:
        print(f"- {r.title}: {r.url}")

Anti-Patterns to Avoid

1. Multi-Step Actions in Single act() Call

# BAD: Multiple steps
await page.act("Sign in, go to settings, and change password")

# GOOD: Atomic steps
await page.act("Click the sign in button")
await page.act("Type 'user@email.com' into email field")
await page.act("Type 'password123' into password field")
await page.act("Click submit")

2. Missing Schemas for Structured Data

# BAD: Unstructured extraction (returns string)
data = await page.extract("Get all product info")

# GOOD: Schema-based extraction (returns validated model)
data = await page.extract("Get all product info", schema=ProductList)

3. Not Using observe() for Complex Pages

# BAD: Direct action on dynamic page
await page.act("Click the submit button")  # Which one?

# GOOD: Observe first to ensure correct element
results = await page.observe("Find the main form submit button")
await page.act(results[0])

4. Forgetting Async Context Manager

# BAD: Manual init/close (error-prone)
stagehand = Stagehand(config)
await stagehand.init()
# ... if exception here, close() never called
await stagehand.close()

# GOOD: Context manager (always cleans up)
async with Stagehand(config) as stagehand:
    page = stagehand.page
    # ... exceptions handled, cleanup guaranteed

5. Blocking I/O in Async Code

# BAD: Blocking sleep
import time
time.sleep(5)

# GOOD: Async sleep
import asyncio
await asyncio.sleep(5)

# GOOD: Use Playwright's wait methods
await page.wait_for_load_state("networkidle")
await page.wait_for_selector(".results")

File Structure Best Practices

project/
├── .env                    # API keys (never commit)
├── .env.example            # Template for env vars
├── main.py                 # Entry point
├── extractors/
│   └── schemas.py          # Pydantic schemas for extraction
├── workflows/
│   ├── search.py           # Search workflow
│   └── form_fill.py        # Form filling workflow
└── utils/
    └── config.py           # Stagehand config factory

Checklist Before Writing Stagehand Code

Configuration: Are environment variables set (API keys, project ID)?
Context Manager: Am I using async with Stagehand() for cleanup?
Atomic Actions: Are act() calls single, specific actions?
Schemas: Am I using Pydantic models for extract()?
Observe: Should I observe() first on complex/dynamic pages?
Agent vs act(): Is this a multi-step workflow (agent) or single action (act)?
Error Handling: Am I using try/except for network/page errors?
Async: Is all I/O using async/await (no blocking calls)?

stagehand

Install Skill

SKILL.md