| name | dev-browser |
| description | Browser automation with persistent page state. Use when users ask to navigate websites, fill forms, take screenshots, extract web data, test web apps, or automate browser workflows. Trigger phrases include "go to [url]", "click on", "fill out the form", "take a screenshot", "scrape", "automate", "test the website", "log into", or any browser interaction request. |
Dev Browser Skill
Browser automation that maintains page state across script executions. Write small, focused scripts to accomplish tasks incrementally. Once you've proven out part of a workflow and there is repeated work to be done, you can write a script to do the repeated work in a single execution.
Choosing Your Approach
Local/source-available sites: If you have access to the source code (e.g., localhost or project files), read the code first to write selectors directly—no need for multi-script discovery.
Unknown page layouts: If you don't know the structure of the page, use getAISnapshot() to discover elements and selectSnapshotRef() to interact with them. The ARIA snapshot provides semantic roles (button, link, heading) and stable refs that persist across script executions.
Visual feedback: Take screenshots to see what the user sees and iterate on design or debug layout issues.
Setup
First, start the dev-browser server using the startup script:
./skills/dev-browser/server.sh &
The script will automatically install dependencies and start the server. It will also install Chromium on first run if needed.
Flags
The server script accepts the following flags:
--headless- Start the browser in headless mode (no visible browser window). Use if the user asks for it.
Wait for the Ready message before running scripts. On first run, the server will:
- Install dependencies if needed
- Download and install Playwright Chromium browser
- Create the
tmp/directory for scripts - Create the
profiles/directory for browser data persistence
The first run may take longer while dependencies are installed. Subsequent runs will start faster.
Important: Scripts must be run with bun x tsx (not bun run) due to Playwright WebSocket compatibility.
The server starts a Chromium browser with a REST API for page management (default: http://localhost:9222).
How It Works
- Server launches a persistent Chromium browser and manages named pages via REST API
- Client connects to the HTTP server URL and requests pages by name
- Pages persist - the server owns all page contexts, so they survive client disconnections
- State is preserved - cookies, localStorage, DOM state all persist between runs
Writing Scripts
Execute scripts inline using heredocs—no need to write files for one-off automation.
CRITICAL: Always run scripts from
skills/dev-browser/Scripts must be executed from the
skills/dev-browser/directory. The@/import alias (e.g.,@/client.js) is configured in this directory'stsconfig.jsonandpackage.json. Running from any other directory will fail with:ERR_MODULE_NOT_FOUND: Cannot find package '@/client.js'
cd skills/dev-browser && bun x tsx <<'EOF'
import { connect } from "@/client.js";
const client = await connect();
const page = await client.page("main");
// Your automation code here
await client.disconnect();
EOF
Only write to tmp/ files when:
- The script needs to be reused multiple times
- The script is complex and you need to iterate on it
- The user explicitly asks for a saved script
Basic Template
Use the @/client.js import path for all scripts.
cd skills/dev-browser && bun x tsx <<'EOF'
import { connect, waitForPageLoad } from "@/client.js";
const client = await connect();
const page = await client.page("main"); // get or create a named page
await page.setViewportSize({ width: 1280, height: 800 }); // Required for screenshots
// Your automation code here
await page.goto("https://example.com");
await waitForPageLoad(page); // Wait for page to fully load
// Always evaluate state at the end
const title = await page.title();
const url = page.url();
console.log({ title, url });
// Disconnect so the script exits (page stays alive on the server)
await client.disconnect();
EOF
Key Principles
- Small scripts: Each script should do ONE thing (navigate, click, fill, check)
- Evaluate state: Always log/return state at the end to decide next steps
- Use page names: Use descriptive names like
"checkout","login","search-results" - Disconnect to exit: Call
await client.disconnect()at the end of your script so the process exits cleanly. Pages persist on the server. - Plain JS in evaluate: Always use plain JavaScript inside
page.evaluate()callbacks—never TypeScript. The code runs in the browser which doesn't understand TS syntax.
Important Notes
- tsx runs without type-checking: Scripts run with
bun x tsxwhich transpiles TypeScript but does NOT type-check. Type errors won't prevent execution—they're just ignored. - No TypeScript in browser context: Code passed to
page.evaluate(),page.evaluateHandle(), or similar methods runs in the browser. Use plain JavaScript only:
// ✅ Correct: plain JavaScript in evaluate
const text = await page.evaluate(() => {
return document.body.innerText;
});
// ❌ Wrong: TypeScript syntax in evaluate (will fail at runtime)
const text = await page.evaluate(() => {
const el: HTMLElement = document.body; // TS syntax - don't do this!
return el.innerText;
});
Workflow Loop
Follow this pattern for complex tasks:
- Write a script to perform one action
- Run it and observe the output
- Evaluate - did it work? What's the current state?
- Decide - is the task complete or do we need another script?
- Repeat until task is done
Client API
const client = await connect();
const page = await client.page("name"); // Get or create named page
const pages = await client.list(); // List all page names
await client.close("name"); // Close a page
await client.disconnect(); // Disconnect (pages persist)
// ARIA Snapshot methods for element discovery and interaction
const snapshot = await client.getAISnapshot("name"); // Get ARIA accessibility tree
const element = await client.selectSnapshotRef("name", "e5"); // Get element by ref
The page object is a standard Playwright Page—use normal Playwright methods.
Waiting
Use waitForPageLoad(page) after navigation (checks document.readyState and network idle):
import { waitForPageLoad } from "@/client.js";
// Preferred: Wait for page to fully load
await waitForPageLoad(page);
// Wait for specific elements
await page.waitForSelector(".results");
// Wait for specific URL
await page.waitForURL("**/success");
Inspecting Page State
Screenshots
Take screenshots when you need to visually inspect the page:
await page.screenshot({ path: "tmp/screenshot.png" });
await page.screenshot({ path: "tmp/full.png", fullPage: true });
ARIA Snapshot (Element Discovery)
Use getAISnapshot() when you don't know the page layout and need to discover what elements are available. It returns a YAML-formatted accessibility tree with:
- Semantic roles (button, link, textbox, heading, etc.)
- Accessible names (what screen readers would announce)
- Element states (checked, disabled, expanded, etc.)
- Stable refs that persist across script executions
cd skills/dev-browser && bun x tsx <<'EOF'
import { connect, waitForPageLoad } from "@/client.js";
const client = await connect();
const page = await client.page("main");
await page.goto("https://news.ycombinator.com");
await waitForPageLoad(page);
// Get the ARIA accessibility snapshot
const snapshot = await client.getAISnapshot("main");
console.log(snapshot);
await client.disconnect();
EOF
Example Output
The snapshot is YAML-formatted with semantic structure:
- banner:
- link "Hacker News" [ref=e1]
- navigation:
- link "new" [ref=e2]
- link "past" [ref=e3]
- link "comments" [ref=e4]
- link "ask" [ref=e5]
- link "submit" [ref=e6]
- link "login" [ref=e7]
- main:
- list:
- listitem:
- link "Article Title Here" [ref=e8]
- text: "528 points by username 3 hours ago"
- link "328 comments" [ref=e9]
- contentinfo:
- textbox [ref=e10]
- /placeholder: "Search"
Interpreting the Snapshot
- Roles - Semantic element types:
button,link,textbox,heading,listitem, etc. - Names - Accessible text in quotes:
link "Click me",button "Submit" [ref=eN]- Element reference for interaction. Only assigned to visible, clickable elements[checked]- Checkbox/radio is checked[disabled]- Element is disabled[expanded]- Expandable element (details, accordion) is open[level=N]- Heading level (h1=1, h2=2, etc.)/url:- Link URL (shown as a property)/placeholder:- Input placeholder text
Interacting with Refs
Use selectSnapshotRef() to get a Playwright ElementHandle for any ref:
cd skills/dev-browser && bun x tsx <<'EOF'
import { connect, waitForPageLoad } from "@/client.js";
const client = await connect();
const page = await client.page("main");
await page.goto("https://news.ycombinator.com");
await waitForPageLoad(page);
// Get the snapshot to see available refs
const snapshot = await client.getAISnapshot("main");
console.log(snapshot);
// Output shows: - link "new" [ref=e2]
// Get the element by ref and click it
const element = await client.selectSnapshotRef("main", "e2");
await element.click();
await waitForPageLoad(page);
console.log("Navigated to:", page.url());
await client.disconnect();
EOF
Debugging Tips
- Use getAISnapshot to see what elements are available and their refs
- Take screenshots when you need visual context
- Use waitForSelector before interacting with dynamic content
- Check page.url() to confirm navigation worked
Error Recovery
If a script fails, the page state is preserved. You can:
- Take a screenshot to see what happened
- Check the current URL and DOM state
- Write a recovery script to get back on track
cd skills/dev-browser && bun x tsx <<'EOF'
import { connect } from "@/client.js";
const client = await connect();
const page = await client.page("main");
await page.screenshot({ path: "tmp/debug.png" });
console.log({
url: page.url(),
title: await page.title(),
bodyText: await page.textContent("body").then((t) => t?.slice(0, 200)),
});
await client.disconnect();
EOF