---
name: harshjudge
description: AI-native E2E testing orchestration for Claude Code. Use when creating, running, or managing end-to-end test scenarios with visual evidence capture. Activates for tasks involving E2E tests, browser automation testing, test scenario creation, test execution with screenshots, or checking test status.
---
# HarshJudge E2E Testing

AI-native E2E testing with MCP tools and visual evidence capture.
## Core Principles

- **Evidence First**: Screenshot before and after every action
- **Fail Fast**: Stop on error, report with context
- **Complete Runs**: Always call `completeRun`, even on failure
- **Step Isolation**: Each step executes in its own spawned agent for token efficiency
- **Knowledge Accumulation**: Learnings go to `prd.md`, not scenarios
## Step-Based Execution

HarshJudge uses a step-based agent pattern for token-efficient test execution:

```
Main Agent                                      Step Agents (spawned per step)
│
├─ startRun(scenarioSlug)
│    ↓
│    Returns: runId, steps[]
│
├─► Spawn Agent: Step 01 ──────────────────────► Execute actions
│        │                                             │
│        │ ◄─────────────────────────────────── Return: { status, evidencePaths }
│        │
│    completeStep(runId, "01", status)
│
├─► Spawn Agent: Step 02 ──────────────────────► Execute actions
│        │                                             │
│        │ ◄─────────────────────────────────── Return: { status, evidencePaths }
│        │
│    completeStep(runId, "02", status)
│
│    ... (repeat for each step)
│
└─ completeRun(runId, finalStatus)
```
**Benefits:**

- Each step agent has an isolated context (no token accumulation)
- Large outputs (screenshots, logs) are saved to files, not returned
- The main agent receives only concise summaries
- Token usage is optimized automatically, without manual management
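The orchestration loop above can be sketched as follows. This is an illustrative sketch, not the real MCP client API: `start_run`, `complete_step`, and `complete_run` are hypothetical Python stand-ins for the `startRun`, `completeStep`, and `completeRun` tools, and `spawn_step_agent` stands in for spawning a step agent.

```python
def run_scenario(scenario_slug, spawn_step_agent, tools):
    """Run every step in its own agent; always finalize the run.

    `tools` is a hypothetical stand-in for the HarshJudge MCP tools;
    `spawn_step_agent(run_id, step_id)` returns the step agent's
    concise summary: {"status": ..., "evidencePaths": [...]}.
    """
    run = tools.start_run(scenario_slug)  # returns runId and steps[]
    final_status = "pass"
    try:
        for step_id in run["steps"]:
            # Each step runs in an isolated agent; only a small
            # summary comes back, never full evidence content.
            summary = spawn_step_agent(run["runId"], step_id)
            tools.complete_step(run["runId"], step_id, summary["status"])
            if summary["status"] == "fail":
                final_status = "fail"
                break  # Fail Fast: stop on the first error
    finally:
        # Complete Runs principle: always call completeRun, even on failure.
        tools.complete_run(run["runId"], final_status)
    return final_status
```

The `try/finally` mirrors the "Complete Runs" principle: the run is finalized even when a step fails or the loop raises.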
## Workflows

| Intent | Reference | Key Tools |
|---|---|---|
| Initialize project | `references/setup.md` | `initProject` |
| Create scenario | `references/create.md` | `createScenario` |
| Run scenario | `references/run.md` | `startRun`, `completeStep`, `completeRun` |
| Fix failed test | `references/iterate.md` | `getStatus`, `createScenario` |
| Check status | `references/status.md` | `getStatus` |
## Project Structure

```
.harshJudge/
  config.yaml              # Project configuration
  prd.md                   # Product requirements (from assets/prd.md template)
  scenarios/{slug}/
    meta.yaml              # Scenario definition + run statistics
    steps/                 # Individual step files
      01-step-slug.md      # Step 01 details
      02-step-slug.md      # Step 02 details
      ...
    runs/{runId}/          # Run history
      result.json          # Run result with per-step data
      step-01/evidence/    # Step 01 evidence
      step-02/evidence/    # Step 02 evidence
      ...
  snapshots/               # Inspection tool outputs (token-saving pattern)
```
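As a small illustration of the layout above, a path helper for a step's evidence directory might look like this. The function is hypothetical (not part of HarshJudge), and it assumes run directories are nested under each scenario:

```python
def evidence_dir(scenario_slug: str, run_id: str, step_id: str) -> str:
    """Build the evidence directory path for one step of a run,
    following the .harshJudge/ layout sketched above (hypothetical
    helper; assumes runs/ nests under scenarios/{slug}/)."""
    return (
        f".harshJudge/scenarios/{scenario_slug}"
        f"/runs/{run_id}/step-{step_id}/evidence"
    )
```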
## Quick Reference

### HarshJudge MCP Tools

| Tool | Purpose |
|---|---|
| `initProject` | Initialize project (spawns dashboard) |
| `createScenario` | Create or update a scenario with step files |
| `toggleStar` | Toggle or set a scenario's starred status |
| `startRun` | Start a test run; returns the step list |
| `recordEvidence` | Capture evidence for a step |
| `completeStep` | Complete a step; get the next step ID |
| `completeRun` | Finalize the run with a status |
| `getStatus` | Check project or scenario status |
| `openDashboard` / `closeDashboard` | Manage the dashboard server |
### Playwright MCP Tools

| Tool | Purpose |
|---|---|
| `browser_navigate` | Navigate to a URL |
| `browser_snapshot` | Get the accessibility tree (use before click/type) |
| `browser_click` | Click an element using its ref |
| `browser_type` | Type into an input using its ref |
| `browser_take_screenshot` | Capture a screenshot for evidence |
| `browser_console_messages` | Get console logs |
| `browser_network_requests` | Get network activity |
| `browser_wait_for` | Wait for text or a condition |
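The snapshot-before-click pattern from the table can be sketched like this. Everything here is hypothetical: the real tools are MCP calls, not Python methods, and the snapshot shape (`{"nodes": [{"role", "name", "ref"}]}`) is an assumed simplification of the accessibility tree.

```python
def click_by_name(mcp, role: str, name: str) -> str:
    """Snapshot-before-click pattern: resolve an element ref from the
    accessibility tree, then click by that ref (never a guessed selector).

    `mcp` is a hypothetical wrapper exposing browser_snapshot/browser_click.
    """
    snapshot = mcp.browser_snapshot()  # accessibility tree of current page
    ref = next(
        node["ref"]
        for node in snapshot["nodes"]
        if node["role"] == role and node["name"] == name
    )
    mcp.browser_click(ref=ref)  # click via the resolved ref
    return ref
```

Taking a fresh snapshot before each interaction matters because refs are only valid for the page state they were captured from.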
## Step Agent Prompt Template

When spawning an agent for each step:

```
Execute step {stepId} of scenario {scenarioSlug}:

## Step Content
{content from steps/{stepId}-{slug}.md}

## Project Context
Base URL: {from config.yaml}
Auth: {from prd.md if needed}

## Previous Step
Status: {pass|fail|first step}

## Your Task
1. Execute the actions using Playwright MCP tools
2. Use browser_snapshot before clicking to get element refs
3. Capture before/after screenshots using browser_take_screenshot
4. Record evidence using recordEvidence with step={stepNumber}

Return ONLY a JSON object:
{
  "status": "pass" | "fail",
  "evidencePaths": ["path1.png", "path2.png"],
  "error": null | "error message"
}

DO NOT return full evidence content. DO NOT explain your work.
```
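The main agent can sanity-check a step agent's reply against this contract before calling `completeStep`. A minimal sketch (the validator itself is hypothetical; the field names and allowed values come from the template above):

```python
import json

def parse_step_result(raw: str) -> dict:
    """Validate the minimal contract a step agent must return:
    status is "pass"/"fail", evidencePaths is a list, error is
    null or a string."""
    result = json.loads(raw)
    if result.get("status") not in ("pass", "fail"):
        raise ValueError("status must be 'pass' or 'fail'")
    if not isinstance(result.get("evidencePaths"), list):
        raise ValueError("evidencePaths must be a list of file paths")
    if result.get("error") is not None and not isinstance(result["error"], str):
        raise ValueError("error must be null or a string")
    return result
```

Rejecting malformed replies early keeps a single misbehaving step agent from corrupting the run's recorded status.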
## Error Handling

On ANY error:

- **STOP** - Do not proceed to the next step
- **Report** - Tool, params, error, and suggested resolution
- **Check `prd.md`** - Is this a known pattern?
- **Do NOT retry** - Unless the user instructs otherwise