| id | skill-webapp-testing |
| name | WebApp Testing — Playwright Automation |
| description | Test local or preview web applications with Playwright, managed servers, and reconnaissance-first workflows for Cortex-OS projects. |
| version | 1.0.0 |
| author | brAInwav QA & Reliability Guild |
| owner | @jamiescottcraik |
| category | testing |
| difficulty | advanced |
| tags | testing, playwright, automation, webapp, quality |
| estimatedTokens | 4200 |
| license | Complete terms in LICENSE.txt |
| requiredTools | python, playwright, node |
| prerequisites | Playwright installed with browsers (`pnpm exec playwright install --with-deps`), Access to the target web application repository, Defined acceptance criteria for UI workflows |
| relatedSkills | skill-tdd-red-green-refactor, skill-testing-evidence-triplet |
| resources | ./resources/scripts/with_server.py, ./resources/examples/element_discovery.py, ./resources/examples/static_html_automation.py, ./resources/examples/console_logging.py, ./resources/LICENSE.txt |
| deprecated | false |
| replacedBy | null |
| impl | packages/testing-toolkit/src/webapp_playwright.ts#runWebAppChecks |
| inputs | [object Object] |
| outputs | [object Object] |
| preconditions | Application builds locally and starts without errors., Environment variables or secrets for the app are available via secure stores., Accessibility and performance budgets documented in the test plan. |
| sideEffects | Starts and stops local servers using helper scripts., Generates screenshots, console logs, and traces under the artifacts directory. |
| estimatedCost | $0.003 / test cycle (~600 tokens across recon, scripting, evidence capture). |
| calls | skill-tdd-red-green-refactor, skill-mcp-builder |
| requiresContext | memory://skills/skill-webapp-testing/historical-runs |
| providesContext | memory://skills/skill-webapp-testing/latest-report |
| monitoring | true |
| lifecycle | [object Object] |
| estimatedDuration | PT45M |
| i18n | [object Object] |
| persuasiveFraming | [object Object] |
| observability | [object Object] |
| governance | [object Object] |
| schemaStatus | [object Object] |
WebApp Testing — Playwright Automation
When to Use
- Verifying a Cortex-OS web surface (dashboard, MCP inspector, internal tools) before release.
- Reproducing or preventing regressions discovered in manual QA or user bug reports.
- Standing up smoke or regression suites for new features that rely on browser interaction.
- Collecting artefacts (screenshots, console logs) required by CI gates or auditors.
How to Apply
- Review the test plan and map required servers/ports; configure secrets via `op run` if needed.
- Use `with_server.py` (see `--help` for usage) to orchestrate backend/frontend processes and ensure health checks pass.
- Follow the reconnaissance-first pattern (see the sketch after this list): wait for `networkidle`, capture DOM/screenshot, identify stable selectors.
- Implement Playwright scripts (Python or TS) referencing the helper examples; capture evidence and console logs.
- Run suites headless in CI and locally, archive artefacts, and log outcomes to Local Memory with effectiveness scores.
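A minimal reconnaissance sketch using the Playwright sync API; the port, URL, and artefact paths are illustrative assumptions, not values mandated by this skill:

```python
# Reconnaissance-first sketch: settle the page, capture evidence, inspect DOM.
# localhost:3000 and the artifacts/ paths are assumed, not canonical.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("http://localhost:3000")
    page.wait_for_load_state("networkidle")          # wait out dynamic loading
    page.screenshot(path="artifacts/recon.png", full_page=True)
    print(page.content()[:2000])                     # skim the DOM for stable selectors
    browser.close()
```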
Success Criteria
- Servers managed by helper scripts start/stop cleanly with health checks enforced.
- Playwright scripts use stable selectors, wait strategies, and produce zero unhandled rejections.
- Evidence bundle contains screenshots, console logs, and trace/report for each scenario.
- Any regression discovered comes with actionable remediation and is rerun to green before the task closes.
- Local Memory entry stores results (`skillUsed: "skill-webapp-testing"`, effectiveness ≥ 0.8) with artefact pointers.
0) Mission Snapshot — What / Why / Where / How / Result
- What: Automate end-to-end web UI testing for Cortex-OS surfaces using Playwright and managed server helpers.
- Why: Ensures UI quality, prevents regressions, and supplies auditable evidence for release gates.
- Where: Applies to any web client or dashboard shipped within Cortex-OS (local-first, MCP-connected, or cloud preview).
- How: Combine reconnaissance-first Playwright scripting with helper scripts for server lifecycle and logging.
- Result: Repeatable test suites with artefacts meeting governance requirements and feeding into CI pass criteria.
1) Contract — Inputs → Outputs
Inputs include the test plan, server definitions, environment configuration, and acceptance criteria. Outputs are Playwright scripts, helper invocations, artefacts (screenshots/logs/traces), and summarised reports stored alongside Evidence Triplet artefacts.
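For illustration only, the contract might be typed along these lines; every field name below is a hypothetical placeholder, not the skill's canonical `inputs`/`outputs` schema:

```python
# Hypothetical contract shape; field names are assumptions for illustration.
from dataclasses import dataclass, field

@dataclass
class WebAppTestInputs:
    test_plan: str                  # path to the agreed test plan
    server_commands: list[str]      # commands with_server.py should manage
    env_source: str | None = None   # secure env source, e.g. via `op run`
    acceptance_criteria: list[str] = field(default_factory=list)

@dataclass
class WebAppTestOutputs:
    scripts: list[str]              # authored Playwright test files
    artifacts_dir: str              # screenshots, console logs, traces
    report_path: str                # summarised report for the Evidence Triplet
```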
2) Preconditions & Safeguards
- Confirm target app builds and passes lint/typecheck before UI testing.
- Reserve the necessary ports and avoid collisions with running services (see the port-check sketch after this list).
- Document authentication flows; prefer test accounts with least privilege.
- Verify accessibility/performance budgets to guide additional checks (axe, Lighthouse) if needed.
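A small sketch for the port-collision check mentioned above; the port number is an illustrative default:

```python
# Verify a port is free before starting the app server; 3000 is an assumed
# default, not a port this skill reserves.
import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        return sock.connect_ex((host, port)) != 0  # non-zero: nothing is listening

assert port_is_free(3000), "Port 3000 in use; stop the conflict or pick another"
```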
3) Implementation Playbook (RED→GREEN→REFACTOR or analogous phases)
- Reconnaissance (RED): Run the app manually or via helper scripts; capture initial screenshots, note selectors, confirm dynamic loading behaviour.
- Script Authoring (GREEN): Write Playwright tests referencing the helper examples; structure steps as arrange/act/assert with clear waits (see the sketch after this list).
- Hardening & Evidence (REFACTOR): Execute suites headless and in CI, gather artefacts, stabilise flaky selectors, and update documentation with lessons learned.
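A hedged arrange/act/assert sketch for the GREEN phase, assuming the `pytest-playwright` `page` fixture; the URL, locator names, and expected count are assumptions about the app under test:

```python
# Arrange/act/assert structure with deterministic waits; selectors and URL
# are illustrative, not taken from a real Cortex-OS surface.
from playwright.sync_api import Page, expect

def test_dashboard_filter(page: Page) -> None:
    # Arrange: load the page and let dynamic content settle
    page.goto("http://localhost:3000/dashboard")
    page.wait_for_load_state("networkidle")
    # Act: drive the UI through role-based locators
    page.get_by_role("textbox", name="Search").fill("mcp")
    page.get_by_role("button", name="Filter").click()
    # Assert: a deterministic signal rather than a fixed sleep
    expect(page.get_by_role("row")).to_have_count(3)  # assumed expected count
```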
4) Observability & Telemetry Hooks
- Enable Playwright tracing (`context.tracing.start`/`stop`) for flaky investigations (see the sketch after this list).
- Log console output via `log_browser_console.py` and store it under the artefacts directory.
- Feed key metrics (pass rate, duration) into observability tooling; configure alerts for repeated failures.
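A tracing-plus-console-capture sketch; the paths and target URL are illustrative:

```python
# Record a trace and echo console messages; open the trace afterwards with
# `playwright show-trace artifacts/trace.zip`.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    context = browser.new_context()
    context.tracing.start(screenshots=True, snapshots=True)
    page = context.new_page()
    page.on("console", lambda msg: print(f"[console:{msg.type}] {msg.text}"))
    page.goto("http://localhost:3000")
    context.tracing.stop(path="artifacts/trace.zip")
    browser.close()
```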
5) Safety, Compliance & Governance
- Run scripts headless to respect security policies; never expose secrets in logs.
- Ensure captured data excludes PII or redact before storing artefacts.
- Follow the RULES_OF_AI logging format (`{"brand":"brAInwav"}`); archive artefacts per retention policy (see the sketch after this list).
- Document manual overrides or skipped tests and open follow-up tasks when unavoidable.
6) Success Criteria & Acceptance Tests
- Playwright suite returns exit code 0; failing cases include actionable error messages.
- Automated run in CI attaches screenshots/logs to job artefacts and passes coverage gates.
- Accessibility spot-check performed (axe or manual) when the UI is user-facing (see the sketch after this list).
- Evidence Triplet recorded: failing run screenshot/log, passing rerun, mutation/property or coverage proof.
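One way to run the axe spot-check from Playwright, assuming network access to a CDN copy of `axe-core`; pin a vetted version in CI rather than trusting this assumed URL:

```python
# Inject axe-core and fail on serious/critical violations.
from playwright.sync_api import sync_playwright

AXE_CDN = "https://cdnjs.cloudflare.com/ajax/libs/axe-core/4.10.2/axe.min.js"  # assumed URL

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("http://localhost:3000")
    page.add_script_tag(url=AXE_CDN)
    results = page.evaluate("() => axe.run()")  # evaluate awaits the returned promise
    serious = [v for v in results["violations"] if v["impact"] in ("serious", "critical")]
    browser.close()
    assert not serious, f"Accessibility violations: {[v['id'] for v in serious]}"
```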
7) Failure Modes & Recovery
- Flaky selectors: Switch to role/text selectors or `data-testid` attributes; document required DOM changes.
- Server start failures: Validate commands with `with_server.py --help`, ensure ports are free, and add retry/backoff (sketched after this list).
- Timeouts: Tune `expect_timeout` and increase waits after recon; prefer deterministic signals (`networkidle`, state selectors).
- Headless-only bugs: Reproduce with the trace viewer, capture video, and work with the product team on a fix.
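A retry/backoff sketch for the server-start failure mode; the health-check URL, attempt count, and delays are illustrative defaults:

```python
# Poll a health endpoint with exponential backoff before running the suite.
import time
import urllib.request

def wait_for_server(url: str, attempts: int = 5, base_delay: float = 0.5) -> None:
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=2):
                return  # server answered the health check
        except OSError:
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    raise RuntimeError(f"Server at {url} never became healthy")

wait_for_server("http://localhost:3000/health")  # assumed endpoint
```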
8) Worked Examples & Snippets
- `resources/scripts/with_server.py` — orchestrates multiple server processes with automatic teardown.
- `resources/examples/element_discovery.py` — demonstrates locator enumeration and selector strategies.
- `resources/examples/static_html_automation.py` — shows testing static content without servers.
9) Memory & Knowledge Integration
- Log each run in Local Memory with environment, pass rate, and key findings; tag with `webapp-testing` (see the sketch after this list).
- Link to related skills (e.g., performance audits) using `relationship_type_enum: "depends_on"`.
- Reference memory IDs in PR descriptions and task manifests for audit trails.
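A hedged sketch of the Local Memory entry payload; the field names mirror the success criteria above, but the surrounding values and the store API itself are assumptions:

```python
# Illustrative Local Memory entry; hand the dict to whatever store/MCP tool
# your environment provides (that API is not defined by this skill).
import json
from datetime import datetime, timezone

entry = {
    "skillUsed": "skill-webapp-testing",
    "effectiveness": 0.85,                 # success criteria require >= 0.8
    "tags": ["webapp-testing"],
    "environment": {"node": "20.x", "playwright": "1.x"},  # assumed versions
    "passRate": "42/42",                   # assumed result
    "artifacts": ["artifacts/trace.zip", "artifacts/recon.png"],
    "recordedAt": datetime.now(timezone.utc).isoformat(),
}
print(json.dumps(entry, indent=2))
```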
10) Lifecycle & Versioning Notes
- Update scripts when Playwright or Node versions change; record compatibility matrix.
- Mirror helper updates into shared tooling packages to avoid drift.
- Revisit selectors quarterly or when UI frameworks upgrade (React, routing, design systems).
11) References & Evidence
- Playwright documentation (`docs.playwright.dev`) for API reference.
- Helper scripts and examples bundled with this skill.
- Artefacts captured per run: screenshots, logs, Playwright HTML report, trace files.
12) Schema Gap Checklist
- Add automatic trace upload to observability storage via MCP.
- Integrate accessibility lint (axe) into helper scripts.
- Extend validation to ensure the artefact directory includes a summary JSON for downstream ingestion (sketched below).
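A sketch of what that summary JSON writer could look like; the filename and schema are proposals, not a ratified format:

```python
# Write a proposed summary.json next to the run artefacts; every field here
# is a suggested shape for downstream ingestion, not a fixed contract.
import json
from pathlib import Path

def write_summary(artifacts_dir: str, passed: int, failed: int) -> None:
    summary = {
        "brand": "brAInwav",
        "passed": passed,
        "failed": failed,
        "files": sorted(p.name for p in Path(artifacts_dir).iterdir()),
    }
    Path(artifacts_dir, "summary.json").write_text(json.dumps(summary, indent=2))

write_summary("artifacts", passed=42, failed=0)
```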