| name | e2e-testing |
| description | Comprehensive E2E testing skill using Playwright MCP for systematic web application testing. This skill should be used when users need to test web-based systems end-to-end, set up test regimes, run exploratory tests, or analyze test history. Triggers on requests like "test my webapp", "set up E2E tests", "run the tests", "what's been flaky", or when validating web application functionality. The skill observes and reports only - it never fixes issues. Supports three modes - setup (create test regime), run (execute tests), and report (analyze results). |
E2E Testing Skill
Overview
A comprehensive E2E testing skill using Playwright MCP for systematic testing of any web-based system. The skill:
- Observes and reports - Never fixes issues, only documents them
- Discovers paths - Finds undocumented functionality at runtime
- Tracks history - Identifies flaky areas and suggests variations
- Produces dual reports - Human-readable and machine-readable formats
Prerequisites
Before using this skill, verify Playwright MCP is available:
- Check for `playwright` in the MCP server configuration
- If missing, add to Claude settings:
{
"mcpServers": {
"playwright": {
"command": "npx",
"args": ["@playwright/mcp@latest"]
}
}
}
If Playwright MCP is unavailable, inform the user and provide setup instructions before proceeding.
Mode Selection
This skill operates in three modes. Determine mode from user request:
| User Request | Mode |
|---|---|
| "Set up tests for...", "Create test regime" | Setup |
| "Run the tests", "Test the...", "Execute tests" | Run |
| "Show test results", "What failed?", "What's flaky?" | Report |
If unclear, ask: "Would you like to set up a test regime, run existing tests, or view reports?"
Setup Mode
Purpose: Create or update test regime through interactive discovery.
Entry Points
Determine entry point from user context:
| Context | Entry |
|---|---|
| User provides URL | URL Exploration |
| User describes system purpose | Description-Based |
| User points to documentation | Documentation Extraction |
| Combination of above | Combined Flow (recommended) |
Setup Workflow
Step 1: Gather Initial Context
Ask for any missing information:
- URL: Base URL of the application
- Purpose: What does this system do? (1-2 sentences)
- Key workflows: What are the critical user journeys?
- Existing docs: Any README, user stories, or specs?
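The gathered context might be summarized in a small record like the one below (application, URL, and workflow names are hypothetical, and the field names are illustrative rather than a required schema):

```yaml
# Hypothetical summary of gathered context
base_url: https://app.example.com
purpose: "Online store for ordering office supplies"
key_workflows: [login, search, checkout, order-history]
existing_docs: [README.md, docs/user-stories.md]
```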
Step 2: Explore Application
Use Playwright MCP to explore:
1. Navigate to the base URL
2. Capture an accessibility snapshot
3. Identify:
   - Navigation elements (menus, links)
   - Interactive elements (buttons, forms)
   - Key pages and sections
For each discovered element, note:
- Element type and purpose
- Alternative paths to reach it
- Required preconditions (login, etc.)
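For example, a discovered element could be noted like this (element, paths, and preconditions are hypothetical):

```yaml
# Hypothetical exploration note for one discovered element
element: "Checkout button"
type: button
purpose: "Starts the purchase flow from the cart page"
alternative_paths:
  - "Cart page > Checkout button"
  - "Mini-cart dropdown > Checkout link"
preconditions:
  - "User is logged in"
  - "Cart contains at least one item"
```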
Step 3: Discover Alternative Paths
While exploring, actively look for:
- Multiple ways to accomplish the same goal
- Hidden or non-obvious functionality
- Edge cases in navigation
Document discoveries as: "Found alternative: [description]"
Step 4: Define Test Scenarios
For each key workflow, create a scenario with:
scenario: [Descriptive name]
description: [What this tests]
preconditions:
- [Required state before test]
blocking: [true/false - does failure prevent other tests?]
steps:
- action: [navigate/click/type/verify/wait]
target: [selector or description]
value: [input value if applicable]
flexibility:
type: [exact/contains/ai_judgment]
criteria: [specific rules or judgment prompt]
success_criteria:
- [What must be true for pass]
alternatives:
- [Alternative path if primary fails]
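As a concrete illustration, a filled-in scenario for a hypothetical login flow might look like the sketch below (the URLs, selectors, and credentials are assumptions, not values discovered from a real application):

```yaml
# Hypothetical example of the scenario structure above
scenario: login
description: Verify a registered user can sign in from the home page
preconditions:
  - Test account exists (user@example.com)
blocking: true
steps:
  - action: navigate
    target: /login
  - action: type
    target: "input[name=email]"
    value: user@example.com
  - action: type
    target: "input[name=password]"
    value: "${TEST_PASSWORD}"
  - action: click
    target: "button:Sign in"
  - action: verify
    target: "welcome message"
    flexibility:
      type: ai_judgment
      criteria: "Does the page indicate the user is signed in?"
success_criteria:
  - User lands on the dashboard or sees their account name
alternatives:
  - "Sign in via the modal opened from the header 'Account' link"
```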
Step 5: Create Test Regime File
Write the regime to `tests/e2e/test_regime.yml`:
# Test Regime: [Application Name]
# Created: [YYYY-MM-DD]
# Last Updated: [YYYY-MM-DD]
metadata:
application: [Name]
base_url: [URL]
description: [Purpose]
global_settings:
screenshot_every_step: true
capture_network: true
capture_console: true
discovery_cap: 5 # Max new paths to discover per run
blocking_dependencies:
- scenario: login
blocks: [profile, settings, checkout] # These won't run if login fails
scenarios:
- scenario: [name]
# ... scenario definition
Step 6: Present Regime for Review
Show the user:
- Summary of discovered scenarios
- Blocking dependencies identified
- Alternative paths found
Then ask for confirmation or modifications.
Run Mode
Purpose: Execute tests sequentially with full evidence capture.
CRITICAL: Test Status Integrity
Principle: No Invalid Skips
A test has exactly three valid outcomes:
| Status | Meaning |
|---|---|
| PASSED | The feature works as specified |
| FAILED | The feature doesn't work or doesn't exist |
| SKIPPED | Only for legitimate environmental reasons (see below) |
Valid Reasons to Skip a Test
- Test environment unavailable (database down, service unreachable)
- Explicit `@skip` decorator for documented WIP features, with a ticket reference
- Platform-specific tests running on the wrong platform
- External dependency unavailable (third-party API down)
Invalid Reasons to Skip (Mark as FAILED Instead)
| Situation | Correct Status | Notes Format |
|---|---|---|
| Feature doesn't exist in UI | FAILED | "Expected [feature] not found. Feature not implemented." |
| Test wasn't executed/completed | FAILED | "Test not executed. [What wasn't verified]." |
| Test would fail | FAILED | That's the point of testing |
| "Didn't get around to it" | FAILED | Incomplete test coverage is a failure |
| Feature works differently than spec | FAILED | "Implementation doesn't match specification: [details]" |
Rationale
The purpose of a test is to fail when something doesn't work. Marking missing features or unexecuted tests as "skipped" produces artificially inflated pass rates and hides real issues. A test report that only shows green isn't useful if it achieved that by ignoring problems.
Implementation Examples
When a test cannot find the expected UI element or feature:
Status: FAILED
Notes: "FAILED: Expected 'Add to Cart' button not found. Feature not implemented or selector changed."
When a test is not fully executed:
Status: FAILED
Notes: "FAILED: Test not executed. Checkout flow verification was not completed - stopped at payment step."
When environment is genuinely unavailable (valid skip):
Status: SKIPPED
Notes: "SKIPPED: Payment gateway sandbox unavailable. Ticket: PAY-123"
Pre-Run Checks
Verify regime exists: Check for `tests/e2e/test_regime.yml`
- If missing: "No test regime found. Would you like to run Setup mode first?"
Load history: Check for `tests/e2e/test_history.json`
- If it exists: Note previously flaky scenarios for extra attention
Verify Playwright MCP: Confirm browser automation is available
Execution Protocol
Rule 1: Always Start from Beginning
Every test run starts fresh from Step 1 of Scenario 1. Never skip steps or use cached state.
Rule 2: Sequential Execution
Execute scenarios in order. For each scenario:
1. Check preconditions
2. Execute each step:
a. Perform action via Playwright MCP
b. Capture screenshot
c. Capture DOM state
d. Capture network activity
e. Capture console logs
f. Evaluate success using flexibility criteria
3. Record result (pass/fail/blocked/skipped)
- PASS: Step completed successfully
- FAIL: Step failed OR element not found OR feature missing
- BLOCKED: Dependent on a failed blocking scenario
- SKIPPED: Only for valid environmental reasons (see Test Status Integrity)
4. If failed: Try alternatives if defined
5. If blocking failure: Stop dependent scenarios
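For illustration, a single recorded step under this protocol might look like the following sketch (the scenario name, selector, evidence paths, and field names are assumptions, not a required schema):

```yaml
# Hypothetical record for one executed step
scenario: checkout
step: 3
action: click
target: "button:Complete Purchase"
result: fail
error: "Element not found within timeout"
evidence: evidence/checkout/step-03/
alternatives_tried:
  - path: "Press Enter on the focused form"
    result: fail
blocking_impact:
  blocks: [order-confirmation]
  marked_as: blocked
```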
Rule 3: Failure Handling
When a step fails:
- Mark as failed - Record failure with evidence
- Try alternatives - If alternatives defined, attempt them
- Assess blocking impact:
- Check if this scenario blocks others
- If blocking: Mark dependent scenarios as "blocked"
- If non-blocking: Continue to next scenario
- Never fix - Document the issue, do not attempt repairs
Rule 4: Runtime Discovery
While executing, watch for undocumented paths:
- New navigation options not in regime
- Alternative ways to complete actions
- Unexpected UI states
For discoveries:
- Queue them for testing (up to the `discovery_cap` limit)
- Execute queued discoveries after all defined scenarios complete
- Document findings in report
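A queued discovery could be noted like this (a hypothetical example; the field names are illustrative):

```yaml
# Hypothetical discovery queued during a run
discovery:
  type: alternative_path
  description: "Guest checkout reachable from the footer"
  location: "footer > a.guest-checkout"
  found_during_scenario: checkout
  tested: false   # executed only after all defined scenarios complete
```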
Flexibility Criteria Evaluation
For each success check, apply the configured flexibility type:
| Type | Evaluation Method |
|---|---|
| `exact` | String/value must match exactly |
| `contains` | Target must contain the specified text |
| `ai_judgment` | Use AI reasoning: "Does this accomplish [goal]?" |
For `ai_judgment`, provide a confidence level:
- High: Clear success/failure
- Medium: Likely success/failure but some ambiguity
- Low: Uncertain, recommend manual review
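For example, the same verification step could be configured with any of the three flexibility types (targets and expected values below are hypothetical):

```yaml
# Three ways to express the same success check (hypothetical values)
- action: verify
  target: "h1.order-status"
  flexibility:
    type: exact
    criteria: "Order confirmed"        # value must match exactly
- action: verify
  target: "h1.order-status"
  flexibility:
    type: contains
    criteria: "confirmed"              # target text must contain this
- action: verify
  target: "order status area"
  flexibility:
    type: ai_judgment
    criteria: "Does the page clearly tell the user the order succeeded?"
```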
Evidence Bundle
For each step, capture and store:
evidence/
scenario-name/
step-01/
screenshot.png
dom-snapshot.html
network-log.json
console-log.txt
accessibility-snapshot.yaml
History Integration
After run completes:
Compare to previous runs:
- Same scenario passed before but failed now? Flag regression
- Same scenario failed before? Note persistent issue
- Intermittent pass/fail? Mark as flaky
Update history file:
{
"runs": [
{
"timestamp": "ISO-8601",
"scenarios": {
"scenario-name": {
"result": "pass|fail|blocked|skipped",
"result_notes": "Details about the result",
"duration_ms": 1234,
"steps_completed": 5,
"confidence": "high|medium|low",
"discoveries": []
}
}
}
],
"flaky_scenarios": ["scenario-1", "scenario-2"],
"suggested_variations": [
{
"scenario": "login",
"variation": "Test with special characters in password",
"reason": "Failed 3/10 runs with complex passwords"
}
]
}
Result status rules (see Test Status Integrity):
- `pass`: Feature works as specified
- `fail`: Feature doesn't work, doesn't exist, or the test was incomplete
- `blocked`: Depends on a failed blocking scenario
- `skipped`: ONLY for valid environmental reasons (with ticket reference)
- Generate variations for flaky areas:
  - If a scenario failed 3+ times in the last 10 runs, auto-suggest new test variations
  - Add them to `suggested_variations` in history
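A minimal sketch of that rule applied to one scenario (the run data and suggested variation below are hypothetical):

```yaml
# Hypothetical flakiness check over the last 10 recorded runs
scenario: search-results
last_10_results: [pass, fail, pass, pass, fail, pass, fail, pass, pass, pass]
failures_in_last_10: 3        # 3+ failures in 10 runs -> treat as flaky
flagged_as_flaky: true
suggested_variation:
  scenario: search-results
  variation: "Wait for the results list to render before asserting the count"
  reason: "Failed 3/10 runs, apparently on slow result loads"
```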
Report Mode
Purpose: Generate actionable reports from test results.
Report Types
Generate both reports after every run:
Human-Readable Report
Output to `tests/e2e/reports/YYYY-MM-DD-HHmmss-report.md`:
# E2E Test Report: [Application Name]
**Run Date**: YYYY-MM-DD HH:mm:ss
**Duration**: X minutes
**Result**: X passed, Y failed, Z blocked, W skipped
## Summary
| Scenario | Result | Duration | Confidence |
|----------|--------|----------|------------|
| Login | PASS | 2.3s | High |
| Checkout | FAIL | 5.1s | High |
## Failures
### Checkout Flow
**Step Failed**: Step 3 - Click "Complete Purchase"
**Error**: Button not found within timeout
**Evidence**:
- Screenshot: `evidence/checkout/step-03/screenshot.png`
- Expected: Button with text "Complete Purchase"
- Actual: Page showed error message "Session expired"
**Reproduction Steps**:
1. Navigate to https://app.example.com
2. Login with test credentials
3. Add item to cart
4. Click checkout
5. [FAILS HERE] Click "Complete Purchase"
**Suggested Investigation**:
- Session timeout may be too aggressive
- Check if login state persists through checkout flow
## Discoveries
Found 2 undocumented paths:
1. **Alternative checkout**: Guest checkout available via footer link
2. **Quick reorder**: "Buy again" button on order history
## Flaky Areas
Based on history (last 10 runs):
- `search-results`: 7/10 pass rate - timing issue suspected
- `image-upload`: 8/10 pass rate - file size variations
## Suggested New Tests
Based on failures and history:
1. Test session persistence during long checkout
2. Test guest checkout flow (discovered)
3. Add timeout resilience to search tests
Machine-Readable Report
Output to `tests/e2e/reports/YYYY-MM-DD-HHmmss-report.json`:
{
"metadata": {
"application": "App Name",
"base_url": "https://...",
"run_timestamp": "ISO-8601",
"duration_ms": 123456,
"regime_version": "hash-of-regime-file"
},
"summary": {
"total": 10,
"passed": 7,
"failed": 2,
"blocked": 1,
"skipped": 0
},
"scenarios": [
{
"name": "checkout",
"result": "fail",
"duration_ms": 5100,
"confidence": "high",
"failed_step": {
"index": 3,
"action": "click",
"target": "button:Complete Purchase",
"error": "Element not found",
"evidence_path": "evidence/checkout/step-03/"
},
"reproduction": {
"playwright_commands": [
"await page.goto('https://app.example.com')",
"await page.fill('#username', 'test')",
"await page.click('button:Login')",
"await page.click('.add-to-cart')",
"await page.click('button:Checkout')",
"// FAILED: await page.click('button:Complete Purchase')"
]
},
"alternatives_tried": [
{
"path": "Use keyboard Enter instead of click",
"result": "fail"
}
]
}
],
"discoveries": [
{
"type": "alternative_path",
"description": "Guest checkout via footer",
"location": "footer > a.guest-checkout",
"tested": true,
"result": "pass"
}
],
"history_analysis": {
"regressions": ["checkout"],
"persistent_failures": [],
"flaky": ["search-results", "image-upload"]
},
"suggested_actions": [
{
"type": "investigate",
"scenario": "checkout",
"reason": "New regression - passed in previous 5 runs"
},
{
"type": "add_test",
"scenario": "guest-checkout",
"reason": "Discovered undocumented path"
}
]
}
Report Presentation
After generating reports:
Display summary to user:
- Overall pass/fail counts
- Critical failures (blocking scenarios)
- Regressions (newly failing)
Highlight actionable items:
- What needs investigation
- Discovered paths to add to regime
- Suggested test variations
Offer next steps:
- "Would you like to add the discovered paths to the test regime?"
- "Should I update the regime with suggested variations?"
- "Ready to share the machine report with the bug-fix skill?"
Quality Checklist
Before completing any mode, verify:
Setup Mode
- All entry points explored (URL, description, docs)
- Alternative paths documented
- Blocking dependencies identified
- Flexibility criteria defined for dynamic content
- Test regime file created and valid YAML
Run Mode
- Started from beginning (no skipped steps)
- Every step has evidence captured
- Failures have alternatives attempted
- Blocking impacts assessed
- Discoveries queued and tested
- History updated
Report Mode
- Both human and machine reports generated
- Reproduction steps included for failures
- Evidence paths valid and accessible
- History analysis included
- Actionable suggestions provided
Resources
references/
- `test-regime-schema.md` - Complete YAML schema for test regime files
- `flexibility-criteria-guide.md` - How to define and evaluate flexible success criteria
- `history-schema.md` - JSON schema for test history tracking
Report Templates
Report templates are embedded in this skill. The machine-readable format is designed for consumption by a future bug-fix skill.