| name | playwright |
| description | Browser automation with Playwright for modern web scraping. Use this skill for scraping JavaScript-rendered pages, handling complex interactions, managing multiple browser contexts, or testing web applications. NOT needed for static HTML parsing or processing already-fetched content. |
Playwright
Modern browser automation for web scraping and testing.
Installation
pip install playwright
playwright install chromium
Quick Start
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto('https://example.com')
    # Get text content
    title = page.locator('h1').text_content()
    print(title)
    browser.close()
Async Usage
import asyncio
from playwright.async_api import async_playwright
async def scrape():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto('https://example.com')
        content = await page.locator('h1').text_content()
        await browser.close()
        return content
result = asyncio.run(scrape())
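The async API pays off mainly when several pages are fetched at once. The sketch below is one possible way to do that, not a prescribed pattern: `scrape_many`, `fetch_heading`, and the `max_concurrency` parameter are hypothetical names introduced here, and the import is deferred into the function only so the module loads on machines without Playwright installed.

```python
import asyncio


async def scrape_many(urls, max_concurrency=3):
    """Return {url: first <h1> text} for each URL, fetching a few pages at a time."""
    if not urls:
        return {}  # nothing to fetch, so no browser is started

    # Deferred import (an assumption of this sketch): keeps the module importable
    # even where Playwright is not installed.
    from playwright.async_api import async_playwright

    semaphore = asyncio.Semaphore(max_concurrency)

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)

        async def fetch_heading(url):
            # The semaphore caps how many pages are open at the same time.
            async with semaphore:
                page = await browser.new_page()
                try:
                    await page.goto(url)
                    # .first avoids a strict-mode error if a page has several <h1> tags.
                    return url, await page.locator('h1').first.text_content()
                finally:
                    await page.close()

        try:
            results = await asyncio.gather(*(fetch_heading(u) for u in urls))
        finally:
            await browser.close()

    return dict(results)
```

A call such as `asyncio.run(scrape_many(['https://example.com']))` would return a dictionary keyed by URL. Sharing one browser and capping open pages with a semaphore is a design choice of this sketch; launching one browser per URL would also work but costs more memory.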
Locators
# By CSS selector
page.locator('div.content')
# By text
page.get_by_text('Submit')
# By role
page.get_by_role('button', name='Submit')
# By test ID
page.get_by_test_id('login-button')
# By placeholder
page.get_by_placeholder('Enter email')
# Chain locators
page.locator('div.container').locator('button.submit')
Waiting
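# Wait for element (legacy API; the Playwright docs recommend locator.wait_for(), shown below)
page.wait_for_selector('div.loaded')
# Wait for network idle (discouraged by the Playwright docs as a readiness signal; prefer waiting on a specific element)
page.wait_for_load_state('networkidle')
# Wait for a locator to reach a state: 'attached', 'detached', 'visible', or 'hidden'
page.locator('button').wait_for(state='visible')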
Interactions
# Click
page.locator('button').click()
# Fill input
page.locator('input[name="email"]').fill('test@example.com')
# Select dropdown
page.locator('select').select_option('value1')
# Upload file
page.locator('input[type="file"]').set_input_files('file.pdf')
Extract Data
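# Get all matching elements (all() does not wait, so make sure the items have rendered first)
page.locator('div.item').first.wait_for()
items = page.locator('div.item').all()
for item in items:
    name = item.locator('h3').text_content()
    price = item.locator('.price').text_content()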
# Get inner HTML
html = page.locator('div.content').inner_html()
# Get attribute (.first avoids a strict-mode error when several links match)
href = page.locator('a').first.get_attribute('href')
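Extracted text usually needs cleaning before it is useful as data. The following is a minimal sketch that turns the `div.item` loop above into a list of records. `parse_price` and `extract_items` are hypothetical helpers, not part of Playwright, and the price parser assumes US-style formatting (comma as thousands separator, dot as decimal point).

```python
import re


def parse_price(text):
    """Convert a price string such as '$1,299.50' to a float, or None if no number is found."""
    if not text:
        return None
    # Assumption: commas group thousands and a dot marks the decimal part.
    match = re.search(r'\d[\d,]*(?:\.\d+)?', text)
    if match is None:
        return None
    return float(match.group().replace(',', ''))


def extract_items(page):
    """Build one dict per 'div.item' element on a Playwright sync Page."""
    records = []
    for item in page.locator('div.item').all():
        # text_content() can return None, so guard before stripping.
        name = item.locator('h3').text_content()
        price = item.locator('.price').text_content()
        records.append({
            'name': name.strip() if name else None,
            'price': parse_price(price),
        })
    return records
```

With a page already loaded, `extract_items(page)` returns a list of dictionaries that can be passed to `json.dumps` or written out with `csv.DictWriter`.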
When to Use Playwright
- JavaScript-rendered content
- Complex user interactions
- Multi-page workflows
- Screenshot/PDF generation
- Browser testing
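As an illustration of the screenshot and multi-page cases listed above, the sketch below captures a full-page screenshot of each URL, giving every URL its own browser context so that cookies and storage are not shared. The function names, the `shots` directory, and the file-naming scheme are assumptions made for this example.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def screenshot_path(url, out_dir='shots'):
    """Derive a filesystem-safe PNG path from a URL (hypothetical naming scheme)."""
    parsed = urlparse(url)
    slug = re.sub(r'[^A-Za-z0-9]+', '-', parsed.netloc + parsed.path).strip('-')
    return Path(out_dir) / f'{slug or "page"}.png'


def capture(urls, out_dir='shots'):
    """Save one full-page screenshot per URL and return the written paths."""
    # Deferred import (an assumption of this sketch): keeps the module importable
    # even where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    saved = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            for url in urls:
                # A fresh context isolates cookies and local storage per URL.
                context = browser.new_context()
                page = context.new_page()
                page.goto(url)
                target = screenshot_path(url, out_dir)
                page.screenshot(path=str(target), full_page=True)
                saved.append(target)
                context.close()
        finally:
            browser.close()
    return saved
```

Calling `capture(['https://example.com'])` would write `shots/example-com.png`. For PDF output, `page.pdf(path=...)` plays the same role as `page.screenshot`, but it is supported only in Chromium.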
When NOT to Use Playwright
- Static HTML pages (use requests + BeautifulSoup)
- API responses you can request directly (JSON, XML)
- Already downloaded HTML files
- Simple HTTP requests that need no rendering or interaction
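For the static cases above, a plain parser is enough and no browser is needed. The sketch below uses the standard library's `html.parser` in place of requests + BeautifulSoup, purely to keep the example dependency-free. `HeadingParser` and `extract_h1` are hypothetical names, and the HTML string is assumed to be already fetched or read from disk.

```python
from html.parser import HTMLParser


class HeadingParser(HTMLParser):
    """Collect the text of every <h1> element in a document."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._in_h1 = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == 'h1':
            self._in_h1 = True
            self._buffer = []

    def handle_data(self, data):
        # Text inside nested tags such as <em> still arrives here.
        if self._in_h1:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == 'h1' and self._in_h1:
            self.headings.append(''.join(self._buffer).strip())
            self._in_h1 = False


def extract_h1(html):
    """Return the text of all <h1> elements in an HTML string."""
    parser = HeadingParser()
    parser.feed(html)
    parser.close()
    return parser.headings
```

For example, `extract_h1(open('page.html', encoding='utf-8').read())` works on a saved file without launching a browser. If the heading turns out to be empty because the page builds it with JavaScript, that is the signal to switch to Playwright.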