| name | Kai Browser Skill |
| pack-id | danielmiessler-kai-browser-skill-v1.1.0 |
| version | 1.1.0 |
| author | danielmiessler |
| description | Code-first browser automation replacing Playwright MCP with 99% token savings. Navigate, screenshot, interact, and verify web pages. |
| type | skill |
| purpose-type | browser-automation, verification, testing |
| platform | claude-code |
| dependencies | |
| keywords | browser, playwright, automation, screenshot, web-testing, verification, file-based-mcp |
Kai Browser Skill (kai-browser-skill)
Code-first browser automation with 99% token savings over Playwright MCP
Installation Prompt
You are receiving a PAI Pack - a modular upgrade for AI agent systems.
What is PAI? See: PAI Project Overview
What is a Pack? See: Pack System
🚨 CRITICAL: AI Agent Installation Instructions
You are an AI agent installing this pack. Follow these steps EXACTLY:
Step 0 - Pre-Installation Analysis (MANDATORY)
- Detect the user's operating system and environment
- Check for existing Browser/Playwright/Browse skills
- If conflicts found, ASK the user before proceeding
- Create backup if replacing existing installation
Copy ALL Files Exactly
- DO NOT simplify, summarize, or "improve" any files
- Copy the EXACT content of each file
- Preserve all directory structure
Verify Installation
- Run ALL 8 verification checks from VERIFY.md
- DO NOT report success until ALL checks pass
- If any check fails, troubleshoot before proceeding
Report Results
- Show user which checks passed/failed
- Confirm final installation status
Anti-Simplification Directive: This pack contains carefully crafted code and documentation. Do not attempt to simplify, condense, or "improve" any files during installation. Copy them EXACTLY as provided.
This Pack adds browser automation capabilities to your AI infrastructure. The Kai Browser Skill provides:
- Navigation: Go to URLs, navigate history, reload pages
- Screenshots: Capture full pages or specific elements
- Interaction: Click, fill forms, select dropdowns, type text
- Verification: Wait for elements, check content, validate pages
- Extraction: Get visible text, HTML, accessibility tree
Core principle: Replace token-heavy MCPs with code-first execution.
The Playwright MCP loads ~13,700 tokens at startup whether you use it or not. This skill loads ~50-200 tokens per operation, only when needed. That's 99%+ token savings.
Please follow the installation instructions below to integrate this Pack into your infrastructure.
Quick Navigation: What's Included | The Problem | The Solution | Installation | Troubleshooting | Configuration | Changelog
What's Included
| Component | File | Purpose |
|---|---|---|
| API Wrapper | src/index.ts |
PlaywrightBrowser class with full API |
| CLI Tool | Tools/Browse.ts |
Command-line browser operations |
| Skill Definition | SKILL.md |
Claude Code skill integration |
| Screenshot Workflow | Workflows/Screenshot.md |
Screenshot capture workflow |
| Verify Workflow | Workflows/VerifyPage.md |
Page verification workflow |
| Interact Workflow | Workflows/Interact.md |
Form interaction workflow |
| Extract Workflow | Workflows/Extract.md |
Content extraction workflow |
| Example Scripts | examples/*.ts |
Ready-to-run examples |
Summary:
- Files created: 10+
- Workflows: 4
- Dependencies: playwright
The Concept and/or Problem
Reference: Anthropic's engineering blog post Code execution with MCP: Building more efficient agents details how file-based MCPs can reduce token usage by 98%+.
AI agents need browser automation for:
- Verifying web deployments actually work
- Taking screenshots for documentation
- Interacting with web forms
- Extracting content from pages
- Testing user interfaces
The MCP Problem:
The Playwright MCP (Model Context Protocol) is the standard approach:
{
"mcpServers": {
"playwright": {
"command": "npx",
"args": ["@playwright/mcp@latest"]
}
}
}
But it has significant overhead:
- ~13,700 tokens loaded at startup - whether you use it or not
- Limited to 21 predefined tools - can't access full Playwright API
- MCP protocol overhead - extra latency on every operation
The Token Cost:
Even if you only take one screenshot per session:
- MCP: 13,700 tokens (loaded at start)
- Actual operation: ~100 tokens
- Total: 13,800 tokens
For a session with 5 screenshots:
- MCP: Still 13,700 tokens
- Operations: 5 x ~100 = 500 tokens
- Total: 14,200 tokens
The Solution
File-based MCP - code-first execution with on-demand loading, as described in Anthropic's MCP code execution approach:
import { PlaywrightBrowser } from '$PAI_DIR/skills/Browser/index.ts'
const browser = new PlaywrightBrowser()
await browser.launch()
await browser.navigate('https://example.com')
await browser.screenshot({ path: 'screenshot.png' })
await browser.close()
Benefits:
| Metric | Playwright MCP | Browser Skill |
|---|---|---|
| Startup cost | ~13,700 tokens | 0 tokens |
| Per-operation | ~0 | ~50-200 tokens |
| API access | 21 tools | Full Playwright API |
| Latency | MCP overhead | Direct execution |
Token Savings Example:
5 screenshots per session:
- MCP: 13,700 tokens
- Browser Skill: 5 x ~100 = 500 tokens
- Savings: 96%
Installation
Prerequisites
- Bun runtime:
curl -fsSL https://bun.sh/install | bash - macOS, Linux, or Windows
Step 1: Create Directory
mkdir -p $PAI_DIR/skills/Browser
cd $PAI_DIR/skills/Browser
Step 2: Copy Files
From the kai-browser-skill/ pack directory:
cp src/index.ts $PAI_DIR/skills/Browser/
cp package.json $PAI_DIR/skills/Browser/
cp tsconfig.json $PAI_DIR/skills/Browser/
cp SKILL.md $PAI_DIR/skills/Browser/
mkdir -p $PAI_DIR/skills/Browser/Tools
mkdir -p $PAI_DIR/skills/Browser/Workflows
mkdir -p $PAI_DIR/skills/Browser/examples
cp Tools/Browse.ts $PAI_DIR/skills/Browser/Tools/
cp Workflows/*.md $PAI_DIR/skills/Browser/Workflows/
cp examples/*.ts $PAI_DIR/skills/Browser/examples/
Step 3: Install Dependencies
cd $PAI_DIR/skills/Browser
bun install
bunx playwright install chromium
Step 4: Verify
bun $PAI_DIR/skills/Browser/examples/verify-page.ts https://example.com
Expected output: "Page loaded: Example Domain"
Usage
From Claude Code
Just ask:
Navigate to danielmiessler.com and take a screenshot
From TypeScript
import { PlaywrightBrowser } from '$PAI_DIR/skills/Browser/index.ts'
const browser = new PlaywrightBrowser()
await browser.launch()
await browser.navigate('https://example.com')
await browser.screenshot({ path: 'screenshot.png' })
await browser.close()
From CLI
# Open in visible browser
bun run $PAI_DIR/skills/Browser/Tools/Browse.ts open https://example.com
# Take screenshot
bun run $PAI_DIR/skills/Browser/Tools/Browse.ts screenshot https://example.com /tmp/shot.png
# Verify element
bun run $PAI_DIR/skills/Browser/Tools/Browse.ts verify https://example.com "h1"
API Reference
Navigation
launch(options?)- Start browsernavigate(url)- Go to URLgoBack()/goForward()- Historyreload()- Refresh pageclose()- Shut down
Capture
screenshot(options?)- Take screenshotgetVisibleText(selector?)- Extract textgetVisibleHtml(options?)- Get HTMLsavePdf(path)- Export PDFgetAccessibilityTree()- A11y snapshot
Interaction
click(selector)- Click elementfill(selector, value)- Fill inputtype(selector, text)- Type with delayselect(selector, value)- Select dropdownpressKey(key)- Keyboard inputhover(selector)- Mouse hoverdrag(source, target)- Drag and dropuploadFile(selector, path)- File upload
Waiting
waitForSelector(selector)- Wait for elementwaitForNavigation()- Wait for page loadwaitForNetworkIdle()- Wait for idlewait(ms)- Fixed delay
JavaScript
evaluate(script)- Run JSgetConsoleLogs()- Get console outputsetUserAgent(ua)- Change user agent
Viewport
resize(width, height)- Set sizesetDevice(name)- Emulate device
Troubleshooting
"Cannot find module 'playwright'"
cd $PAI_DIR/skills/Browser
bun install
"Executable doesn't exist"
bunx playwright install chromium
Screenshot is blank
Wait for page to load:
await browser.waitForNetworkIdle()
await browser.screenshot({ path: 'shot.png' })
Configuration
Launch Options
await browser.launch({
browser: 'chromium', // 'chromium' | 'firefox' | 'webkit'
headless: true, // false to see browser
viewport: { width: 1280, height: 720 },
userAgent: 'Custom UA'
})
Screenshot Options
await browser.screenshot({
path: 'screenshot.png',
fullPage: true, // Capture entire page
selector: '#element', // Capture specific element
type: 'png', // 'png' | 'jpeg'
quality: 80 // JPEG quality (0-100)
})
Changelog
v1.1.0 (2026-01-03)
- CLI-first redesign - Documentation now emphasizes CLI tool over TypeScript API
- Added "STOP - CLI First, Always" anti-pattern section
- Added decision tree for CLI vs TypeScript routing
- VERIFY phase examples now use CLI commands
- TypeScript API renamed to "Advanced" and moved to bottom
- Clearer guidance on when to use each approach
v1.0.0 (2026-01-03)
- Initial release
- Merged from Browse + Playwright skills
- Full Playwright API wrapper
- CLI tool with open/screenshot/verify commands
- 4 workflow definitions
- Example scripts