| name | e2e-testing |
| description | Comprehensive E2E testing skill using Playwright MCP for systematic web application testing. This skill should be used when users need to test web-based systems end-to-end, set up test regimes, run exploratory tests, or analyze test history. Triggers on requests like "test my webapp", "set up E2E tests", "run the tests", "what's been flaky", or when validating web application functionality. The skill observes and reports only - it never fixes issues. Supports three modes - setup (create test regime), run (execute tests), and report (analyze results). |
E2E Testing Skill
Overview
A comprehensive E2E testing skill using Playwright MCP for systematic testing of any web-based system. The skill:
- Observes and reports - Never fixes issues, only documents them
- Discovers paths - Finds undocumented functionality at runtime
- Tracks history - Identifies flaky areas and suggests variations
- Produces dual reports - Human-readable and machine-readable formats
Prerequisites
Before using this skill, verify Playwright MCP is available:
- Check for `playwright` in the MCP server configuration
- If missing, add to Claude settings:
{
"mcpServers": {
"playwright": {
"command": "npx",
"args": ["@playwright/mcp@latest"]
}
}
}
If Playwright MCP is unavailable, inform the user and provide setup instructions before proceeding.
Mode Selection
This skill operates in three modes. Determine mode from user request:
| User Request | Mode |
|---|---|
| "Set up tests for...", "Create test regime" | Setup |
| "Run the tests", "Test the...", "Execute tests" | Run |
| "Show test results", "What failed?", "What's flaky?" | Report |
If unclear, ask: "Would you like to set up a test regime, run existing tests, or view reports?"
Setup Mode
Purpose: Create or update test regime through interactive discovery.
Entry Points
Determine entry point from user context:
| Context | Entry |
|---|---|
| User provides URL | URL Exploration |
| User describes system purpose | Description-Based |
| User points to documentation | Documentation Extraction |
| Combination of above | Combined Flow (recommended) |
Setup Workflow
Step 1: Gather Initial Context
Ask for any missing information:
- URL: Base URL of the application
- Purpose: What does this system do? (1-2 sentences)
- Key workflows: What are the critical user journeys?
- Existing docs: Any README, user stories, or specs?
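The gathered context might be summarized in a small record like the one below (application, URL, and workflow names are hypothetical, and the field names are illustrative rather than a required schema):

```yaml
# Hypothetical summary of gathered context
base_url: https://app.example.com
purpose: "Online store for ordering office supplies"
key_workflows: [login, search, checkout, order-history]
existing_docs: [README.md, docs/user-stories.md]
```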
Step 2: Explore Application
Use Playwright MCP to explore:
1. Navigate to the base URL
2. Capture an accessibility snapshot
3. Identify:
   - Navigation elements (menus, links)
   - Interactive elements (buttons, forms)
   - Key pages and sections
For each discovered element, note:
- Element type and purpose
- Alternative paths to reach it
- Required preconditions (login, etc.)
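For example, a discovered element could be noted like this (element, paths, and preconditions are hypothetical):

```yaml
# Hypothetical exploration note for one discovered element
element: "Checkout button"
type: button
purpose: "Starts the purchase flow from the cart page"
alternative_paths:
  - "Cart page > Checkout button"
  - "Mini-cart dropdown > Checkout link"
preconditions:
  - "User is logged in"
  - "Cart contains at least one item"
```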
Step 3: Discover Alternative Paths
While exploring, actively look for:
- Multiple ways to accomplish the same goal
- Hidden or non-obvious functionality
- Edge cases in navigation
Document discoveries as: "Found alternative: [description]"
Step 4: Define Test Scenarios
For each key workflow, create a scenario with:
scenario: [Descriptive name]
description: [What this tests]
preconditions:
- [Required state before test]
blocking: [true/false - does failure prevent other tests?]
steps:
- action: [navigate/click/type/verify/wait]
target: [selector or description]
value: [input value if applicable]
flexibility:
type: [exact/contains/ai_judgment]
criteria: [specific rules or judgment prompt]
success_criteria:
- [What must be true for pass]
alternatives:
- [Alternative path if primary fails]
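As a concrete illustration, a filled-in scenario for a hypothetical login flow might look like the sketch below (the URLs, selectors, and credentials are assumptions, not values discovered from a real application):

```yaml
# Hypothetical example of the scenario structure above
scenario: login
description: Verify a registered user can sign in from the home page
preconditions:
  - Test account exists (user@example.com)
blocking: true
steps:
  - action: navigate
    target: /login
  - action: type
    target: "input[name=email]"
    value: user@example.com
  - action: type
    target: "input[name=password]"
    value: "${TEST_PASSWORD}"
  - action: click
    target: "button:Sign in"
  - action: verify
    target: "welcome message"
    flexibility:
      type: ai_judgment
      criteria: "Does the page indicate the user is signed in?"
success_criteria:
  - User lands on the dashboard or sees their account name
alternatives:
  - "Sign in via the modal opened from the header 'Account' link"
```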
Step 5: Create Test Regime File
Write the regime to `tests/e2e/test_regime.yml`:
# Test Regime: [Application Name]
# Created: [YYYY-MM-DD]
# Last Updated: [YYYY-MM-DD]
metadata:
application: [Name]
base_url: [URL]
description: [Purpose]
global_settings:
screenshot_every_step: true
capture_network: true
capture_console: true
discovery_cap: 5 # Max new paths to discover per run
blocking_dependencies:
- scenario: login
blocks: [profile, settings, checkout] # These won't run if login fails
scenarios:
- scenario: [name]
# ... scenario definition
Step 6: Present Regime for Review
Show the user:
- Summary of discovered scenarios
- Blocking dependencies identified
- Alternative paths found
Then ask for confirmation or modifications.
Run Mode
Purpose: Execute tests sequentially with full evidence capture.
CRITICAL: Test Status Integrity
Principle: No Invalid Skips
A test has exactly three valid outcomes:
| Status | Meaning |
|---|---|
| PASSED | The feature works as specified |
| FAILED | The feature doesn't work or doesn't exist |
| SKIPPED | Only for legitimate environmental reasons (see below) |
Valid Reasons to Skip a Test
- Test environment unavailable (database down, service unreachable)
- Explicit `@skip` decorator for documented WIP features, with a ticket reference
- Platform-specific tests running on the wrong platform
- External dependency unavailable (third-party API down)
Invalid Reasons to Skip (Mark as FAILED Instead)
| Situation | Correct Status | Notes Format |
|---|---|---|
| Feature doesn't exist in UI | FAILED | "Expected [feature] not found. Feature not implemented." |
| Test wasn't executed/completed | FAILED | "Test not executed. [What wasn't verified]." |
| Test would fail | FAILED | That's the point of testing |
| "Didn't get around to it" | FAILED | Incomplete test coverage is a failure |
| Feature works differently than spec | FAILED | "Implementation doesn't match specification: [details]" |
Rationale
The purpose of a test is to fail when something doesn't work. Marking missing features or unexecuted tests as "skipped" produces artificially inflated pass rates and hides real issues. A test report that only shows green isn't useful if it achieved that by ignoring problems.
Implementation Examples
When a test cannot find the expected UI element or feature:
Status: FAILED
Notes: "FAILED: Expected 'Add to Cart' button not found. Feature not implemented or selector changed."
When a test is not fully executed:
Status: FAILED
Notes: "FAILED: Test not executed. Checkout flow verification was not completed - stopped at payment step."
When environment is genuinely unavailable (valid skip):
Status: SKIPPED
Notes: "SKIPPED: Payment gateway sandbox unavailable. Ticket: PAY-123"
Pre-Run Checks
Verify regime exists: Check for `tests/e2e/test_regime.yml`
- If missing: "No test regime found. Would you like to run Setup mode first?"
Load history: Check for `tests/e2e/test_history.json`
- If it exists: Note previously flaky scenarios for extra attention
Verify Playwright MCP: Confirm browser automation is available
Execution Protocol
Rule 1: Always Start from Beginning
Every test run starts fresh from Step 1 of Scenario 1. Never skip steps or use cached state.
Rule 2: Sequential Execution
Execute scenarios in order. For each scenario:
1. Check preconditions
2. Execute each step:
a. Perform action via Playwright MCP
b. Capture screenshot
c. Capture DOM state
d. Capture network activity
e. Capture console logs
f. Evaluate success using flexibility criteria
3. Record result (pass/fail/blocked/skipped)
- PASS: Step completed successfully
- FAIL: Step failed OR element not found OR feature missing
- BLOCKED: Dependent on a failed blocking scenario
- SKIPPED: Only for valid environmental reasons (see Test Status Integrity)
4. If failed: Try alternatives if defined
5. If blocking failure: Stop dependent scenarios
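For illustration, a single recorded step under this protocol might look like the following sketch (the scenario name, selector, evidence paths, and field names are assumptions, not a required schema):

```yaml
# Hypothetical record for one executed step
scenario: checkout
step: 3
action: click
target: "button:Complete Purchase"
result: fail
error: "Element not found within timeout"
evidence: evidence/checkout/step-03/
alternatives_tried:
  - path: "Press Enter on the focused form"
    result: fail
blocking_impact:
  blocks: [order-confirmation]
  marked_as: blocked
```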
Rule 3: Failure Handling
When a step fails:
- Mark as failed - Record failure with evidence
- Try alternatives - If alternatives defined, attempt them
- Assess blocking impact:
- Check if this scenario blocks others
- If blocking: Mark dependent scenarios as "blocked"
- If non-blocking: Continue to next scenario
- Never fix - Document the issue, do not attempt repairs
Rule 4: Runtime Discovery
While executing, watch for undocumented paths:
- New navigation options not in regime
- Alternative ways to complete actions
- Unexpected UI states
For discoveries:
- Queue them for testing (up to the `discovery_cap` limit)
- Execute queued discoveries after all defined scenarios complete
- Document findings in report
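A queued discovery could be noted like this (a hypothetical example; the field names are illustrative):

```yaml
# Hypothetical discovery queued during a run
discovery:
  type: alternative_path
  description: "Guest checkout reachable from the footer"
  location: "footer > a.guest-checkout"
  found_during_scenario: checkout
  tested: false   # executed only after all defined scenarios complete
```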
Flexibility Criteria Evaluation
For each success check, apply the configured flexibility type:
| Type | Evaluation Method |
|---|---|
| `exact` | String/value must match exactly |
| `contains` | Target must contain the specified text |
| `ai_judgment` | Use AI reasoning: "Does this accomplish [goal]?" |
For `ai_judgment`, provide a confidence level:
- High: Clear success/failure
- Medium: Likely success/failure but some ambiguity
- Low: Uncertain, recommend manual review
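For example, the same verification step could be configured with any of the three flexibility types (targets and expected values below are hypothetical):

```yaml
# Three ways to express the same success check (hypothetical values)
- action: verify
  target: "h1.order-status"
  flexibility:
    type: exact
    criteria: "Order confirmed"        # value must match exactly
- action: verify
  target: "h1.order-status"
  flexibility:
    type: contains
    criteria: "confirmed"              # target text must contain this
- action: verify
  target: "order status area"
  flexibility:
    type: ai_judgment
    criteria: "Does the page clearly tell the user the order succeeded?"
```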
Evidence Bundle
For each step, capture and store:
evidence/
scenario-name/
step-01/
screenshot.png
dom-snapshot.html
network-log.json
console-log.txt
accessibility-snapshot.yaml
History Integration
After run completes:
Compare to previous runs:
- Same scenario passed before but failed now? Flag regression
- Same scenario failed before? Note persistent issue
- Intermittent pass/fail? Mark as flaky
Update history file:
{
"runs": [
{
"timestamp": "ISO-8601",
"scenarios": {
"scenario-name": {
"result": "pass|fail|blocked|skipped",
"result_notes": "Details about the result",
"duration_ms": 1234,
"steps_completed": 5,
"confidence": "high|medium|low",
"discoveries": []
}
}
}
],
"flaky_scenarios": ["scenario-1", "scenario-2"],
"suggested_variations": [
{
"scenario": "login",
"variation": "Test with special characters in password",
"reason": "Failed 3/10 runs with complex passwords"
}
]
}
Result status rules (see Test Status Integrity):
- `pass`: Feature works as specified
- `fail`: Feature doesn't work, doesn't exist, or the test was incomplete
- `blocked`: Depends on a failed blocking scenario
- `skipped`: ONLY for valid environmental reasons (with ticket reference)
- Generate variations for flaky areas:
  - If a scenario failed 3+ times in the last 10 runs, auto-suggest new test variations
  - Add them to `suggested_variations` in history
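A minimal sketch of that rule applied to one scenario (the run data and suggested variation below are hypothetical):

```yaml
# Hypothetical flakiness check over the last 10 recorded runs
scenario: search-results
last_10_results: [pass, fail, pass, pass, fail, pass, fail, pass, pass, pass]
failures_in_last_10: 3        # 3+ failures in 10 runs -> treat as flaky
flagged_as_flaky: true
suggested_variation:
  scenario: search-results
  variation: "Wait for the results list to render before asserting the count"
  reason: "Failed 3/10 runs, apparently on slow result loads"
```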
Report Mode
Purpose: Generate actionable reports from test results.
Report Types
Generate both reports after every run:
Human-Readable Report
Output to `tests/e2e/reports/YYYY-MM-DD-HHmmss-report.md`:
# E2E Test Report: [Application Name]
**Run Date**: YYYY-MM-DD HH:mm:ss
**Duration**: X minutes
**Result**: X passed, Y failed, Z blocked, W skipped
## Summary
| Scenario | Result | Duration | Confidence |
|----------|--------|----------|------------|
| Login | PASS | 2.3s | High |
| Checkout | FAIL | 5.1s | High |
## Failures
### Checkout Flow
**Step Failed**: Step 3 - Click "Complete Purchase"
**Error**: Button not found within timeout
**Evidence**:
- Screenshot: `evidence/checkout/step-03/screenshot.png`
- Expected: Button with text "Complete Purchase"
- Actual: Page showed error message "Session expired"
**Reproduction Steps**:
1. Navigate to https://app.example.com
2. Login with test credentials
3. Add item to cart
4. Click checkout
5. [FAILS HERE] Click "Complete Purchase"
**Suggested Investigation**:
- Session timeout may be too aggressive
- Check if login state persists through checkout flow
## Discoveries
Found 2 undocumented paths:
1. **Alternative checkout**: Guest checkout available via footer link
2. **Quick reorder**: "Buy again" button on order history
## Flaky Areas
Based on history (last 10 runs):
- `search-results`: 7/10 pass rate - timing issue suspected
- `image-upload`: 8/10 pass rate - file size variations
## Suggested New Tests
Based on failures and history:
1. Test session persistence during long checkout
2. Test guest checkout flow (discovered)
3. Add timeout resilience to search tests
Machine-Readable Report
Output to `tests/e2e/reports/YYYY-MM-DD-HHmmss-report.json`:
{
"metadata": {
"application": "App Name",
"base_url": "https://...",
"run_timestamp": "ISO-8601",
"duration_ms": 123456,
"regime_version": "hash-of-regime-file"
},
"summary": {
"total": 10,
"passed": 7,
"failed": 2,
"blocked": 1,
"skipped": 0
},
"scenarios": [
{
"name": "checkout",
"result": "fail",
"duration_ms": 5100,
"confidence": "high",
"failed_step": {
"index": 3,
"action": "click",
"target": "button:Complete Purchase",
"error": "Element not found",
"evidence_path": "evidence/checkout/step-03/"
},
"reproduction": {
"playwright_commands": [
"await page.goto('https://app.example.com')",
"await page.fill('#username', 'test')",
"await page.click('button:Login')",
"await page.click('.add-to-cart')",
"await page.click('button:Checkout')",
"// FAILED: await page.click('button:Complete Purchase')"
]
},
"alternatives_tried": [
{
"path": "Use keyboard Enter instead of click",
"result": "fail"
}
]
}
],
"discoveries": [
{
"type": "alternative_path",
"description": "Guest checkout via footer",
"location": "footer > a.guest-checkout",
"tested": true,
"result": "pass"
}
],
"history_analysis": {
"regressions": ["checkout"],
"persistent_failures": [],
"flaky": ["search-results", "image-upload"]
},
"suggested_actions": [
{
"type": "investigate",
"scenario": "checkout",
"reason": "New regression - passed in previous 5 runs"
},
{
"type": "add_test",
"scenario": "guest-checkout",
"reason": "Discovered undocumented path"
}
]
}
Report Presentation
After generating reports:
Display summary to user:
- Overall pass/fail counts
- Critical failures (blocking scenarios)
- Regressions (newly failing)
Highlight actionable items:
- What needs investigation
- Discovered paths to add to regime
- Suggested test variations
Offer next steps:
- "Would you like to add the discovered paths to the test regime?"
- "Should I update the regime with suggested variations?"
- "Ready to share the machine report with the bug-fix skill?"
Quality Checklist
Before completing any mode, verify:
Setup Mode
- All entry points explored (URL, description, docs)
- Alternative paths documented
- Blocking dependencies identified
- Flexibility criteria defined for dynamic content
- Test regime file created and valid YAML
Run Mode
- Started from beginning (no skipped steps)
- Every step has evidence captured
- Failures have alternatives attempted
- Blocking impacts assessed
- Discoveries queued and tested
- History updated
Report Mode
- Both human and machine reports generated
- Reproduction steps included for failures
- Evidence paths valid and accessible
- History analysis included
- Actionable suggestions provided
Resources
references/
- `test-regime-schema.md` - Complete YAML schema for test regime files
- `flexibility-criteria-guide.md` - How to define and evaluate flexible success criteria
- `history-schema.md` - JSON schema for test history tracking
Report Templates
Report templates are embedded in this skill. The machine-readable format is designed for consumption by a future bug-fix skill.