| id | skill-webapp-testing |
| name | WebApp Testing — Playwright Automation |
| description | Test local or preview web applications with Playwright, managed servers, and reconnaissance-first workflows for Cortex-OS projects. |
| version | 1.0.0 |
| author | brAInwav QA & Reliability Guild |
| owner | @jamiescottcraik |
| category | testing |
| difficulty | advanced |
| tags | testing, playwright, automation, webapp, quality |
| estimatedTokens | 4200 |
| license | Complete terms in LICENSE.txt |
| requiredTools | python, playwright, node |
| prerequisites | Playwright installed with browsers (`pnpm exec playwright install --with-deps`), Access to the target web application repository, Defined acceptance criteria for UI workflows |
| relatedSkills | skill-tdd-red-green-refactor, skill-testing-evidence-triplet |
| resources | ./resources/scripts/with_server.py, ./resources/examples/element_discovery.py, ./resources/examples/static_html_automation.py, ./resources/examples/console_logging.py, ./resources/LICENSE.txt |
| deprecated | false |
| replacedBy | null |
| impl | packages/testing-toolkit/src/webapp_playwright.ts#runWebAppChecks |
| inputs | [object Object] |
| outputs | [object Object] |
| preconditions | Application builds locally and starts without errors., Environment variables or secrets for the app are available via secure stores., Accessibility and performance budgets documented in the test plan. |
| sideEffects | Starts and stops local servers using helper scripts., Generates screenshots, console logs, and traces under the artifacts directory. |
| estimatedCost | $0.003 / test cycle (~600 tokens across recon, scripting, evidence capture). |
| calls | skill-tdd-red-green-refactor, skill-mcp-builder |
| requiresContext | memory://skills/skill-webapp-testing/historical-runs |
| providesContext | memory://skills/skill-webapp-testing/latest-report |
| monitoring | true |
| lifecycle | [object Object] |
| estimatedDuration | PT45M |
| i18n | [object Object] |
| persuasiveFraming | [object Object] |
| observability | [object Object] |
| governance | [object Object] |
| schemaStatus | [object Object] |
WebApp Testing — Playwright Automation
When to Use
- Verifying a Cortex-OS web surface (dashboard, MCP inspector, internal tools) before release.
- Reproducing or preventing regressions discovered in manual QA or user bug reports.
- Standing up smoke or regression suites for new features that rely on browser interaction.
- Collecting artefacts (screenshots, console logs) required by CI gates or auditors.
How to Apply
- Review the test plan and map required servers/ports; configure secrets via `op run` if needed.
- Use `with_server.py` (see `--help` for usage) to orchestrate backend/frontend processes and ensure health checks pass.
- Follow the reconnaissance-first pattern (see the sketch after this list): wait for `networkidle`, capture DOM/screenshot, identify stable selectors.
- Implement Playwright scripts (Python or TS) referencing the helper examples; capture evidence and console logs.
- Run suites headless in CI and locally, archive artefacts, and log outcomes to Local Memory with effectiveness scores.
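A minimal reconnaissance sketch using the Playwright sync API; the port, URL, and artefact paths are illustrative assumptions, not values mandated by this skill:

```python
# Reconnaissance-first sketch: settle the page, capture evidence, inspect DOM.
# localhost:3000 and the artifacts/ paths are assumed, not canonical.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("http://localhost:3000")
    page.wait_for_load_state("networkidle")          # wait out dynamic loading
    page.screenshot(path="artifacts/recon.png", full_page=True)
    print(page.content()[:2000])                     # skim the DOM for stable selectors
    browser.close()
```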
Success Criteria
- Servers managed by helper scripts start/stop cleanly with health checks enforced.
- Playwright scripts use stable selectors, wait strategies, and produce zero unhandled rejections.
- Evidence bundle contains screenshots, console logs, and trace/report for each scenario.
- Any regression discovered comes with actionable remediation and is rerun to green before the task closes.
- Local Memory entry stores results (`skillUsed: "skill-webapp-testing"`, effectiveness ≥ 0.8) with artefact pointers.
0) Mission Snapshot — What / Why / Where / How / Result
- What: Automate end-to-end web UI testing for Cortex-OS surfaces using Playwright and managed server helpers.
- Why: Ensures UI quality, prevents regressions, and supplies auditable evidence for release gates.
- Where: Applies to any web client or dashboard shipped within Cortex-OS (local-first, MCP-connected, or cloud preview).
- How: Combine reconnaissance-first Playwright scripting with helper scripts for server lifecycle and logging.
- Result: Repeatable test suites with artefacts meeting governance requirements and feeding into CI pass criteria.
1) Contract — Inputs → Outputs
Inputs include the test plan, server definitions, environment configuration, and acceptance criteria. Outputs are Playwright scripts, helper invocations, artefacts (screenshots/logs/traces), and summarised reports stored alongside Evidence Triplet artefacts.
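For illustration only, the contract might be typed along these lines; every field name below is a hypothetical placeholder, not the skill's canonical `inputs`/`outputs` schema:

```python
# Hypothetical contract shape; field names are assumptions for illustration.
from dataclasses import dataclass, field

@dataclass
class WebAppTestInputs:
    test_plan: str                  # path to the agreed test plan
    server_commands: list[str]      # commands with_server.py should manage
    env_source: str | None = None   # secure env source, e.g. via `op run`
    acceptance_criteria: list[str] = field(default_factory=list)

@dataclass
class WebAppTestOutputs:
    scripts: list[str]              # authored Playwright test files
    artifacts_dir: str              # screenshots, console logs, traces
    report_path: str                # summarised report for the Evidence Triplet
```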
2) Preconditions & Safeguards
- Confirm target app builds and passes lint/typecheck before UI testing.
- Reserve the necessary ports and avoid collisions with running services (see the port-check sketch after this list).
- Document authentication flows; prefer test accounts with least privilege.
- Verify accessibility/performance budgets to guide additional checks (axe, Lighthouse) if needed.
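A small sketch for the port-collision check mentioned above; the port number is an illustrative default:

```python
# Verify a port is free before starting the app server; 3000 is an assumed
# default, not a port this skill reserves.
import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        return sock.connect_ex((host, port)) != 0  # non-zero: nothing is listening

assert port_is_free(3000), "Port 3000 in use; stop the conflict or pick another"
```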
3) Implementation Playbook (RED→GREEN→REFACTOR or analogous phases)
- Reconnaissance (RED): Run the app manually or via helper scripts; capture initial screenshots, note selectors, confirm dynamic loading behaviour.
- Script Authoring (GREEN): Write Playwright tests referencing the helper examples; structure steps as arrange/act/assert with clear waits (see the sketch after this list).
- Hardening & Evidence (REFACTOR): Execute suites headless and in CI, gather artefacts, stabilise flaky selectors, and update documentation with lessons learned.
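A hedged arrange/act/assert sketch for the GREEN phase, assuming the `pytest-playwright` `page` fixture; the URL, locator names, and expected count are assumptions about the app under test:

```python
# Arrange/act/assert structure with deterministic waits; selectors and URL
# are illustrative, not taken from a real Cortex-OS surface.
from playwright.sync_api import Page, expect

def test_dashboard_filter(page: Page) -> None:
    # Arrange: load the page and let dynamic content settle
    page.goto("http://localhost:3000/dashboard")
    page.wait_for_load_state("networkidle")
    # Act: drive the UI through role-based locators
    page.get_by_role("textbox", name="Search").fill("mcp")
    page.get_by_role("button", name="Filter").click()
    # Assert: a deterministic signal rather than a fixed sleep
    expect(page.get_by_role("row")).to_have_count(3)  # assumed expected count
```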
4) Observability & Telemetry Hooks
- Enable Playwright tracing (`context.tracing.start`/`stop`) for flaky investigations (see the sketch after this list).
- Log console output via `log_browser_console.py` and store it under the artefacts directory.
- Feed key metrics (pass rate, duration) into observability tooling; configure alerts for repeated failures.
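A tracing-plus-console-capture sketch; the paths and target URL are illustrative:

```python
# Record a trace and echo console messages; open the trace afterwards with
# `playwright show-trace artifacts/trace.zip`.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    context = browser.new_context()
    context.tracing.start(screenshots=True, snapshots=True)
    page = context.new_page()
    page.on("console", lambda msg: print(f"[console:{msg.type}] {msg.text}"))
    page.goto("http://localhost:3000")
    context.tracing.stop(path="artifacts/trace.zip")
    browser.close()
```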
5) Safety, Compliance & Governance
- Run scripts headless to respect security policies; never expose secrets in logs.
- Ensure captured data excludes PII or redact before storing artefacts.
- Follow the RULES_OF_AI logging format (`{"brand":"brAInwav"}`); archive artefacts per retention policy (see the sketch after this list).
- Document manual overrides or skipped tests and open follow-up tasks when unavoidable.
6) Success Criteria & Acceptance Tests
- Playwright suite returns exit code 0; failing cases include actionable error messages.
- Automated run in CI attaches screenshots/logs to job artefacts and passes coverage gates.
- Accessibility spot-check performed (axe or manual) when the UI is user-facing (see the sketch after this list).
- Evidence Triplet recorded: failing run screenshot/log, passing rerun, mutation/property or coverage proof.
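One way to run the axe spot-check from Playwright, assuming network access to a CDN copy of `axe-core`; pin a vetted version in CI rather than trusting this assumed URL:

```python
# Inject axe-core and fail on serious/critical violations.
from playwright.sync_api import sync_playwright

AXE_CDN = "https://cdnjs.cloudflare.com/ajax/libs/axe-core/4.10.2/axe.min.js"  # assumed URL

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("http://localhost:3000")
    page.add_script_tag(url=AXE_CDN)
    results = page.evaluate("() => axe.run()")  # evaluate awaits the returned promise
    serious = [v for v in results["violations"] if v["impact"] in ("serious", "critical")]
    browser.close()
    assert not serious, f"Accessibility violations: {[v['id'] for v in serious]}"
```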
7) Failure Modes & Recovery
- Flaky selectors: Switch to role/text selectors or `data-testid` attributes; document required DOM changes.
- Server start failures: Validate commands with `with_server.py --help`, ensure ports are free, and add retry/backoff (sketched after this list).
- Timeouts: Tune `expect_timeout` and increase waits after recon; prefer deterministic signals (`networkidle`, state selectors).
- Headless-only bugs: Reproduce with the trace viewer, capture video, and work with the product team on a fix.
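A retry/backoff sketch for the server-start failure mode; the health-check URL, attempt count, and delays are illustrative defaults:

```python
# Poll a health endpoint with exponential backoff before running the suite.
import time
import urllib.request

def wait_for_server(url: str, attempts: int = 5, base_delay: float = 0.5) -> None:
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=2):
                return  # server answered the health check
        except OSError:
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    raise RuntimeError(f"Server at {url} never became healthy")

wait_for_server("http://localhost:3000/health")  # assumed endpoint
```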
8) Worked Examples & Snippets
- `resources/scripts/with_server.py` — orchestrates multiple server processes with automatic teardown.
- `resources/examples/element_discovery.py` — demonstrates locator enumeration and selector strategies.
- `resources/examples/static_html_automation.py` — shows testing static content without servers.
9) Memory & Knowledge Integration
- Log each run in Local Memory with environment, pass rate, and key findings; tag with `webapp-testing` (see the sketch after this list).
- Link to related skills (e.g., performance audits) using `relationship_type_enum: "depends_on"`.
- Reference memory IDs in PR descriptions and task manifests for audit trails.
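A hedged sketch of the Local Memory entry payload; the field names mirror the success criteria above, but the surrounding values and the store API itself are assumptions:

```python
# Illustrative Local Memory entry; hand the dict to whatever store/MCP tool
# your environment provides (that API is not defined by this skill).
import json
from datetime import datetime, timezone

entry = {
    "skillUsed": "skill-webapp-testing",
    "effectiveness": 0.85,                 # success criteria require >= 0.8
    "tags": ["webapp-testing"],
    "environment": {"node": "20.x", "playwright": "1.x"},  # assumed versions
    "passRate": "42/42",                   # assumed result
    "artifacts": ["artifacts/trace.zip", "artifacts/recon.png"],
    "recordedAt": datetime.now(timezone.utc).isoformat(),
}
print(json.dumps(entry, indent=2))
```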
10) Lifecycle & Versioning Notes
- Update scripts when Playwright or Node versions change; record compatibility matrix.
- Mirror helper updates into shared tooling packages to avoid drift.
- Revisit selectors quarterly or when UI frameworks upgrade (React, routing, design systems).
11) References & Evidence
- Playwright documentation (`docs.playwright.dev`) for API reference.
- Helper scripts and examples bundled with this skill.
- Artefacts captured per run: screenshots, logs, Playwright HTML report, trace files.
12) Schema Gap Checklist
- Add automatic trace upload to observability storage via MCP.
- Integrate accessibility lint (axe) into helper scripts.
- Extend validation to ensure the artefact directory includes a summary JSON for downstream ingestion (sketched below).
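A sketch of what that summary JSON writer could look like; the filename and schema are proposals, not a ratified format:

```python
# Write a proposed summary.json next to the run artefacts; every field here
# is a suggested shape for downstream ingestion, not a fixed contract.
import json
from pathlib import Path

def write_summary(artifacts_dir: str, passed: int, failed: int) -> None:
    summary = {
        "brand": "brAInwav",
        "passed": passed,
        "failed": failed,
        "files": sorted(p.name for p in Path(artifacts_dir).iterdir()),
    }
    Path(artifacts_dir, "summary.json").write_text(json.dumps(summary, indent=2))

write_summary("artifacts", passed=42, failed=0)
```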