| name | using-browser |
| description | Use when the user asks to browse websites, navigate web pages, extract data from sites, interact with web forms, search for online information, test web applications, or automate any web-based task. Trigger on requests like "go to website X", "search for Y on the web", "find Z online", "fill out this form", "get data from this page", or any task requiring a web browser. |
Browser Automation Skill
⚠️ MANDATORY WORKFLOW:
- Start daemon with initial context:
scripts/browser-daemon --initial-context-name <name> [--initial-context-url <url>]- Delegate to browser-agent with context assignment
- Create additional contexts if needed:
scripts/browser-cli create-browsing-context <name>- Close extra contexts when done (optional)
- Stop daemon ONLY when user explicitly requests:
scripts/browser-cli quit
What are Browsing Contexts?
Browsing contexts are named browser tabs with full action history. Each context:
- Has a unique name (e.g., "shopping", "research", "testing")
- Maintains complete history of all actions with intentions
- Persists across browser-agent invocations
- Enables parallel multi-tab automation
Your Responsibilities (Main Agent)
1. Task Planning (REQUIRED FIRST STEP)
Before ANY browser work, you MUST create a high-level plan using TodoWrite.
This plan decomposes the user's request into actionable steps that browser-agents can execute. Plans evolve as you discover new information about the pages you're navigating.
Initial Planning
When the user makes a request, immediately create a high-level plan:
Example: "Get details of the latest blog post"
TodoWrite:
1. Navigate to main page and understand layout
2. Look for link to blog posts list
3. Find the latest blog post link
4. Navigate to the blog post
5. Extract blog post details (title, date, content, author)
Example: "Find all products in the Electronics category"
TodoWrite:
1. Navigate to site and find categories
2. Click into Electronics category
3. Understand page structure (pagination? infinite scroll?)
4. Extract product list from current page
5. Handle additional pages if needed
Evolving the Plan
As browser-agents return information, UPDATE your plan to reflect new understanding:
Discovery: Pagination exists
Original plan step: "Extract all products"
Updated plan:
3a. Understand pagination structure (how many pages?)
3b. Extract products from page 1
3c. Generate reusable extraction script
3d. Navigate through remaining pages applying script
Discovery: Search functionality available
Original plan: "Navigate through categories to find product X"
Updated plan:
1. ~~Navigate through categories~~ (SKIP - search available)
2. Use search box to find product X directly
3. Extract product details
Discovery: Data requires detail pages
Original plan step: "Extract product specs from list"
Updated plan:
3a. Extract product links from list page
3b. Navigate to first product detail page
3c. Extract specs from detail page
3d. Return to list and repeat for remaining products
Plan Management Rules
- ✅ Create plan BEFORE starting daemon - Know your strategy first
- ✅ Mark tasks as in_progress when delegating to browser-agent
- ✅ Mark tasks as completed immediately after browser-agent returns success
- ✅ Add new tasks when you discover additional steps needed
- ✅ Update task descriptions when approach changes based on page structure
- ✅ Keep plan high-level - Browser-agent handles low-level element finding
2. Daemon Lifecycle
- ✅ Start the daemon with initial browsing context name (required)
- ✅ Optionally provide initial URL (defaults to about:blank)
- ✅ Stop the daemon ONLY when user explicitly requests to close the browser
3. Browsing Context Management
- ✅ Initial context is created automatically when daemon starts
- ✅ Create additional browsing contexts if needed for parallel work
- ✅ Assign contexts to browser-agents in prompts
- ✅ Monitor contexts via status command
- ✅ Close extra contexts when no longer needed (daemon quit closes all)
4. Task Delegation
- ✅ Spawn browser-agents with clear assignments
- ✅ Provide scripts path and browsing context name
- ✅ Delegate operations, not lifecycle management
Daemon Lifecycle
Start Daemon (Step 1)
ALWAYS start the daemon with an initial browsing context:
scripts/browser-daemon --initial-context-name <name> [--initial-context-url <url>]
Parameters:
--initial-context-name(required) - Name for the initial browsing context--initial-context-url(optional) - URL to navigate to (defaults to about:blank)
Examples:
# Start with blank context
scripts/browser-daemon --initial-context-name main
# Start and navigate to URL
scripts/browser-daemon --initial-context-name shopping --initial-context-url https://amazon.com
This:
- Launches Chrome with remote debugging
- Connects via Chrome DevTools Protocol
- Creates the initial browsing context
- Navigates to URL if provided
- Listens on Unix socket for commands
Note: The PreToolUse hook automatically runs this in the background.
Check Status
scripts/browser-cli status
Returns:
- Daemon status (running/stopped)
- All browsing contexts with:
- Name, URL, title
- Age (time since creation)
- Recent history (last 5 actions with intentions)
Stop Daemon
IMPORTANT: Only stop the daemon when the user explicitly requests to close the browser.
DO NOT automatically stop the daemon after completing tasks. The browser should remain open for potential follow-up work unless the user specifically asks to close it.
To stop the daemon (only when explicitly requested):
scripts/browser-cli quit
This:
- Closes all browsing contexts
- Shuts down Chrome
- Cleans up resources
Browsing Context Lifecycle
Initial Context
The initial browsing context is created automatically when you start the daemon with --initial-context-name.
You can use this context immediately for delegating work to browser-agents.
Create Additional Browsing Contexts (Optional)
Only needed for parallel multi-tab automation:
scripts/browser-cli create-browsing-context <name> [--url <initial-url>]
Examples:
# Create blank context
scripts/browser-cli create-browsing-context research
# Create and navigate to URL
scripts/browser-cli create-browsing-context comparison --url https://ebay.com
Naming guidelines:
- Use descriptive names: "shopping", "research", "admin-panel"
- Avoid generic names: "tab1", "context2"
- Names help you and browser-agents understand purpose
Monitor Contexts
Check what's happening across all contexts:
scripts/browser-cli status
Example output:
{
"status": "running",
"connected_to_chrome": true,
"browsing_contexts": [
{
"name": "shopping",
"url": "https://amazon.com/cart",
"title": "Shopping Cart",
"age_minutes": 5.2,
"recent_history": [
{"action": "create", "intention": "Starting shopping session", "result": "OK"},
{"action": "navigate", "intention": "Going to Amazon", "result": "OK"},
{"action": "type", "intention": "Searching for laptop", "result": "OK"},
{"action": "click", "intention": "Opening first result", "result": "OK"}
]
}
]
}
Close Browsing Context
When a context is no longer needed:
scripts/browser-cli close-browsing-context <name>
When to close:
- ✅ Task is complete and context won't be reused
- ✅ Cleaning up after multi-step workflow
- ✅ Before stopping daemon (optional - daemon quit closes all)
Delegating to Browser-Agent
⚠️ CRITICAL: Task Decomposition Before Delegation
YOU must decompose multi-page requests into single-page tasks.
Browser-agent works on ONE PAGE at a time. When a user asks for work spanning multiple pages:
- YOU break it down into individual page tasks
- YOU orchestrate navigation between pages
- Browser-agent executes ONE task on ONE page per call
WRONG: "Go to Amazon, click 3 products, extract specs from each" RIGHT: Separate calls: "Extract product links" → "Navigate to link 1" → "Extract specs" → "Navigate to link 2" → ...
Recognizing Bulk/Repetitive Queries
Trigger phrases that require the Repetitive Tasks pattern:
- "find all...", "get all...", "extract all..."
- "list of...", "every...", "each..."
- "scrape...", "collect..."
When you see these, ALWAYS:
- First: Ask agent to explore page structure
- Then: Check if data is available on current page
- If single-page: Use script generation pattern
- If multi-page needed: Agent will report infeasibility
Handling Infinite Scroll:
If agent reports infinite scroll, YOU decide how much to load:
# Agent reported: "Page uses infinite scroll. Currently 20 items visible."
# Option 1: Extract what's visible
Task(prompt="Extract all currently visible products")
# Option 2: Load more, then extract
for _ in range(3): # Load 3 more batches
Task(prompt="Scroll to bottom and wait for content")
Task(prompt="Take snapshot --diff to see new items")
Task(prompt="Extract all products now visible")
# Option 3: Load until target count
while extracted_count < 100:
Task(prompt="Scroll to bottom, extract new products")
⚠️ CRITICAL: Ask Questions Before Giving Instructions
YOU must explore the page by asking questions before giving specific instructions.
Don't make assumptions about page structure. Use the agent as your eyes first:
WRONG: Assume structure and give blind instructions
Task(prompt="Click the 'Sign In' button in the top-right corner")
# Fails if button is labeled differently or in a different location
RIGHT: Ask questions first, then give specific instructions
# Step 1: Explore
Task(prompt="Is there a sign-in or login button on this page? Where is it located?")
# Returns: "Yes, there's a 'Log In' link in the header navigation"
# Step 2: Act based on what you learned
Task(prompt="Click the 'Log In' link in the header")
Benefits:
- ✅ Agents don't have to explore on their own - you direct their focus
- ✅ You avoid making wrong assumptions about page structure
- ✅ You get information to make better decisions about next steps
- ✅ Reduces wasted agent calls and token usage
Basic Pattern
CRITICAL: Always provide:
- Scripts path - Where browser-cli lives
- Browsing context - Which tab to work in
- Single-page task - ONE operation on the CURRENT page only
Task(
description="Extract product info",
subagent_type="superpowers:browser-agent",
model="haiku",
prompt=f"""Scripts path: ${{CLAUDE_PLUGIN_ROOT}}/skills/using-browser/scripts
Browsing context: shopping
Extract the first 3 product titles and prices from this page."""
)
Why Use Browser-Agent?
- Context efficiency - Returns only requested data in natural language
- Cost savings - Haiku model for browser operations
- Action history - All actions recorded with intentions for debugging
- Natural communication - No CSS selectors or technical jargon in responses
- Resumability - Can resume from previous work using context history
Bounded Exploration Model
Browser-agent is bounded to the CURRENT PAGE ONLY:
- ✅ Explores current page freely (find elements, answer questions, extract data)
- ✅ Performs compound actions on current page ("search for X and extract Y")
- ✅ Executes navigation when YOU explicitly tell it to
- ❌ Does NOT make navigation decisions
- ❌ Does NOT accept multi-page tasks
If you send a multi-page task, the agent SHOULD reject it. This is by design.
Orchestrating Browser Tasks
You are the brain. Browser-agent is your eyes and hands for ONE PAGE at a time.
Pattern: Use Agent as Eyes
When you don't know page structure:
Task(prompt="Is there a search box on this page?")
# Returns: "Yes, there's a search box at the top labeled 'Search Amazon'"
Task(prompt="What navigation links are available?")
# Returns: "I see links for: Products, About, Contact, Login"
Task(prompt="Find the login button")
# Returns: "Found login button in the top-right corner"
Pattern: Give Specific Instructions
When you know what to do, give compound instructions that work on current page:
# Navigation (if needed)
Task(prompt="Navigate to amazon.com")
# Compound actions on current page
Task(prompt="Search for 'laptop' and extract the first 5 product titles and prices")
# Or break into logical steps
Task(prompt="Type 'laptop' in the search box and click search")
Task(prompt="Wait for results and extract first 5 products with prices")
Pattern: Repetitive Tasks (Script Generation + Distribution)
For repetitive extraction (same data from multiple pages):
- Get list of targets:
Task(prompt="Extract all category names and links from this page")
# Returns: ["Electronics: /electronics", "Books: /books", "Clothing: /clothing"]
- Generate reusable script (agent explores examples, YOU navigate between them):
# Navigate to first example
Task(prompt="Navigate to /electronics")
Task(prompt="Explore the page structure and manually extract product names and prices. Document what you find.")
# Returns: Extracted products using selectors .product-card h3 and .product-card .price
# Navigate to second example
Task(prompt="Navigate to /books")
Task(prompt="Extract products using the same approach. Identify what's constant vs variable.")
# Returns: Same selectors work. Constant: selectors. Variable: data values.
# Get validated script
Task(prompt="Create and validate a reusable eval script for extracting products")
# Returns: Script + validation results
- Distribute script to remaining pages (you iterate):
for category in remaining_categories:
Task(prompt=f"""Navigate to {category['link']}
Run this eval script: {script}
Return the extracted data""")
This keeps script generation intelligence in the agent while orchestration stays with you.
Error Handling with Retries
If step fails, retry up to 3 times:
- Retry 1: Same command (transient failure)
- Retry 2: Ask agent to explore: "Find the search input, it might be labeled differently"
- Retry 3: Alternative approach
After 3 failures: Report to user with context and last error.
Complete Workflow Examples
Example 1: Bulk Extraction with Script Generation (with Planning)
User: "Get all product prices from the site"
# STEP 0: Create initial plan (REQUIRED)
TodoWrite([
{"content": "Navigate to site and understand structure", "status": "pending", "activeForm": "Navigating to site and understanding structure"},
{"content": "Check if pagination or infinite scroll exists", "status": "pending", "activeForm": "Checking if pagination or infinite scroll exists"},
{"content": "Extract products from first page", "status": "pending", "activeForm": "Extracting products from first page"},
{"content": "Handle additional pages if needed", "status": "pending", "activeForm": "Handling additional pages if needed"}
])
# STEP 1: Mark first task in_progress and execute
TodoWrite([
{"content": "Navigate to site and understand structure", "status": "in_progress", ...},
# ... rest unchanged
])
scripts/browser-daemon --initial-context-name shopping --initial-context-url https://shop.example.com
Task(prompt="""Scripts path: ${CLAUDE_PLUGIN_ROOT}/skills/using-browser/scripts
Browsing context: shopping
What products are visible on this page? Describe the overall structure.""")
# Returns: "I see 20 products with names and prices displayed in a grid."
# Mark completed, move to next
TodoWrite([
{"content": "Navigate to site and understand structure", "status": "completed", ...},
{"content": "Check if pagination or infinite scroll exists", "status": "in_progress", ...},
# ... rest
])
# STEP 2: Check pagination
Task(prompt="""Scripts path: ${CLAUDE_PLUGIN_ROOT}/skills/using-browser/scripts
Browsing context: shopping
Is there pagination? If so, how many pages?""")
# Returns: "Yes, pagination shows 5 more pages (6 total). Page numbers at bottom."
# UPDATE PLAN based on discovery - pagination exists!
TodoWrite([
{"content": "Navigate to site and understand structure", "status": "completed", ...},
{"content": "Check if pagination or infinite scroll exists", "status": "completed", ...},
{"content": "Extract products from page 1 and generate reusable script", "status": "in_progress", ...},
{"content": "Navigate to page 2 and extract using script", "status": "pending", ...},
{"content": "Navigate to pages 3-6 and extract using script", "status": "pending", ...},
{"content": "Compile all results", "status": "pending", ...}
])
# STEP 3: Extract from page 1 and create script
Task(prompt="""Scripts path: ${CLAUDE_PLUGIN_ROOT}/skills/using-browser/scripts
Browsing context: shopping
Extract all product names and prices from THIS page. Create a reusable eval script I can run on subsequent pages.""")
# Returns:
# - Extracted data: [{"name": "Product 1", "price": "$29.99"}, ...]
# - Script: "() => [...document.querySelectorAll('.product')].map(p => ({name: p.querySelector('h3').textContent, price: p.querySelector('.price').textContent}))"
script = "() => [...document.querySelectorAll('.product')].map(p => ({name: p.querySelector('h3').textContent, price: p.querySelector('.price').textContent}))"
# Mark completed, move to next page
TodoWrite([
# ... previous completed
{"content": "Extract products from page 1 and generate reusable script", "status": "completed", ...},
{"content": "Navigate to page 2 and extract using script", "status": "in_progress", ...},
# ... rest
])
# STEP 4: Navigate and extract from page 2
Task(prompt="""Scripts path: ${CLAUDE_PLUGIN_ROOT}/skills/using-browser/scripts
Browsing context: shopping
Navigate to page 2""")
Task(prompt=f"""Scripts path: ${{CLAUDE_PLUGIN_ROOT}}/skills/using-browser/scripts
Browsing context: shopping
Run this extraction script: {script}""")
# Returns: [{"name": "Product 21", "price": "$19.99"}, ...]
# Mark completed, continue with remaining pages
# ... repeat for pages 3-6
Key points:
- ✅ Created plan BEFORE starting - Knew general strategy upfront
- ✅ Evolved plan after discovering pagination - Added specific page navigation tasks
- ✅ Marked tasks in_progress/completed - Tracked progress throughout
- ✅ High-level planning - Browser-agent handles element finding details
Example 2: Simple Search Task
User: "Find laptop prices on Amazon"
# 1. Start daemon
scripts/browser-daemon --initial-context-name amazon --initial-context-url https://amazon.com
# 2. Search for laptops
Task(prompt="""Scripts path: ${CLAUDE_PLUGIN_ROOT}/skills/using-browser/scripts
Browsing context: amazon
Type 'laptop' in the search box and click search""")
# 3. Extract results
Task(prompt="""Scripts path: ${CLAUDE_PLUGIN_ROOT}/skills/using-browser/scripts
Browsing context: amazon
Wait for results to load, then extract first 5 products with prices""")
# Returns: ["Dell Laptop - $599", "HP Pavilion - $649", ...]
# 4. Present to user
"Found 5 laptops on Amazon: Dell - $599, HP - $649, ..."
# Browser stays open for follow-up work
# Only quit if user explicitly asks: scripts/browser-cli quit
Example 2: Unknown Page Structure
User: "Check if this page has a login form"
# Already on some page in browsing context "research"
# Ask agent to explore current page
Task(prompt="""Scripts path: ${CLAUDE_PLUGIN_ROOT}/skills/using-browser/scripts
Browsing context: research
Is this a login page? Look for username/password fields""")
# Returns: "Yes, this is a login page. I see email and password inputs, plus a 'Sign In' button"
# Now you know what to do - fill and submit in one action
Task(prompt="""Scripts path: ${CLAUDE_PLUGIN_ROOT}/skills/using-browser/scripts
Browsing context: research
Fill in email 'user@example.com' and password 'password123', then click Sign In""")
# Browser stays open for follow-up work
Example 3: Parallel Multi-Tab Automation
User: "Compare laptop prices on Amazon, eBay, and Walmart"
# 1. Start daemon with initial context
scripts/browser-daemon --initial-context-name amazon --initial-context-url https://amazon.com
# 2. Create additional contexts for parallel work
scripts/browser-cli create-browsing-context ebay --url https://ebay.com
scripts/browser-cli create-browsing-context walmart --url https://walmart.com
# 3. Execute search on each site and extract prices
# (Can run in parallel by making multiple Task calls in single message)
Task(prompt="""Scripts path: ${CLAUDE_PLUGIN_ROOT}/skills/using-browser/scripts
Browsing context: amazon
Search for 'laptop' and extract top 3 prices""")
# Returns: ["$599", "$649", "$549"]
Task(prompt="""Scripts path: ${CLAUDE_PLUGIN_ROOT}/skills/using-browser/scripts
Browsing context: ebay
Search for 'laptop' and extract top 3 prices""")
# Returns: ["$550", "$620", "$510"]
Task(prompt="""Scripts path: ${CLAUDE_PLUGIN_ROOT}/skills/using-browser/scripts
Browsing context: walmart
Search for 'laptop' and extract top 3 prices""")
# Returns: ["$579", "$639", "$529"]
# 4. Compare and present
"Price comparison:
Amazon: $599, $649, $549 (avg: $599)
eBay: $550, $620, $510 (avg: $560)
Walmart: $579, $639, $529 (avg: $582)
eBay has the lowest average price."
# Browser stays open for follow-up work
Browsing Context History
Browser-agents can check what happened in the same context:
# You navigate to Wikipedia
Task(prompt="""Scripts path: ${CLAUDE_PLUGIN_ROOT}/skills/using-browser/scripts
Browsing context: research
Navigate to wikipedia.org""")
# You tell agent to search
Task(prompt="""Scripts path: ${CLAUDE_PLUGIN_ROOT}/skills/using-browser/scripts
Browsing context: research
Type 'Artificial Intelligence' in the search box and click search""")
# Later, you ask agent to continue
Task(prompt="""Scripts path: ${CLAUDE_PLUGIN_ROOT}/skills/using-browser/scripts
Browsing context: research
Click on the 'History of AI' section and extract the first paragraph""")
# Agent checks browsing-context-history, sees previous actions, knows it's already on AI article
The agent uses context history to understand:
- What page it's currently on
- What actions have been performed
- What the current state is
When to Create Multiple Contexts
✅ Create Multiple Contexts When:
- Comparing data across different websites (price comparison)
- Parallel data extraction from independent sources
- Multi-account workflows (admin panel + user view)
- Keeping reference pages open while working elsewhere
❌ Use Single Context When:
- Linear workflow on one website
- Simple search-and-extract tasks
- Related pages on the same site
- No need for parallelism
Critical Rules
⚠️ Daemon Lifecycle
- START daemon with initial context name (required parameter)
- PROVIDE initial URL (optional, defaults to about:blank)
- STOP daemon ONLY when user explicitly requests - Do NOT auto-stop after tasks
- AUTO-SHUTDOWN when user closes browser - Daemon detects Chrome exit and stops automatically
- One daemon serves all browsing contexts
⚠️ Browsing Context Management
- INITIAL context created automatically when daemon starts
- CREATE additional contexts only for parallel multi-tab work
- ASSIGN context to each browser-agent in their prompt
- CLOSE extra contexts when no longer needed (optional - quit closes all)
- DON'T reuse context names - each should be unique
⚠️ Browser-Agent Delegation
- ALWAYS provide scripts path
- ALWAYS assign a browsing context
- Browser-agent works WITHIN assigned context
- Browser-agent does NOT create/close contexts
Troubleshooting
"Required argument --initial-context-name" error
- Forgot to provide initial context name when starting daemon
- Fix: Always use
scripts/browser-daemon --initial-context-name <name>
"Browsing context not found" error
- Using a context name that doesn't exist
- Fix: Check
scripts/browser-cli statusto see available contexts - Or create the context with
scripts/browser-cli create-browsing-context <name>
"Browsing context already exists" error
- Trying to create duplicate context
- Fix: Use a different name or close the existing one first
Browser-agent can't connect
- Daemon not running
- Fix: Start daemon with
scripts/browser-daemon --initial-context-name <name>
Contexts accumulating
- Not closing extra contexts after parallel tasks
- Fix: Either close contexts individually or just quit daemon (closes all)
Notes
- Browsing contexts persist until explicitly closed or daemon stops
- Each context has independent browser state (cookies, localStorage, etc.)
- Actions are logged with timestamps and intentions for full traceability
- Status command shows real-time view of all contexts
- Browser-agent can check context history to understand previous work
- Daemon automatically shuts down if user closes the browser window
- Browser stays open after tasks complete to allow follow-up work