Claude Code Plugins

Community-maintained marketplace


physical-verification

@Earths-Reckoning/Cheeseburg124

Physically verify UI changes using computer vision and screenshots. Use when implementing UI features, verifying visual elements, or testing interface functionality. Prevents hallucination by requiring visual evidence before claiming features work.


SKILL.md

---
name: physical-verification
description: Physically verify UI changes using computer vision and screenshots. Use when implementing UI features, verifying visual elements, or testing interface functionality. Prevents hallucination by requiring visual evidence before claiming features work.
allowed-tools: Bash, Read, Write
---

Physical Verification Skill

Purpose

This skill provides the gating mechanism to prevent hallucination about UI features. It requires physical verification through screenshots and computer vision before claiming any UI feature works.

Core Principle

Code existence ≠ functionality. Never claim a feature works without visual evidence.

Verification Workflow

Copy this checklist and track progress:

Verification Progress:
- [ ] Step 1: Take screenshot of initial state
- [ ] Step 2: Identify target element visually
- [ ] Step 3: Perform interaction (click, type, etc.)
- [ ] Step 4: Take screenshot of resulting state
- [ ] Step 5: Compare before/after screenshots
- [ ] Step 6: Document findings with visual evidence

Step 1: Capture Initial State

Take a screenshot showing the current UI state:

python -c "from services.chatkit_backend.app.tools.computer_use import get_computer_tool; import json; result = get_computer_tool().screenshot(); print(json.dumps(result))"

Save the screenshot path for comparison.
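If the project's `computer_use` helper is unavailable, a minimal stdlib fallback can capture the virtual display with `scrot` (listed under the environment requirements below); the helper names here are illustrative, not a project API:

```python
import os
import subprocess
import time

def screenshot_path(out_dir="/tmp/verification"):
    """Build a timestamped output path for a screenshot (hypothetical helper)."""
    os.makedirs(out_dir, exist_ok=True)
    return os.path.join(out_dir, f"shot_{int(time.time())}.png")

def take_screenshot(display=":1", out_dir="/tmp/verification"):
    """Capture the virtual display with scrot and return the saved file path."""
    path = screenshot_path(out_dir)
    subprocess.run(["scrot", path], env={**os.environ, "DISPLAY": display}, check=True)
    return path

# Usage (requires a running Xvfb display and scrot on PATH):
# before = take_screenshot()
```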

Step 2: Locate Target Element

Use Claude 4 Computer Use API to identify the element:

from services.chatkit_backend.app.vision import AuditorAgent
import asyncio
import os

async def main():
    auditor = AuditorAgent(api_key=os.getenv("ANTHROPIC_API_KEY"))
    return await auditor.verify_ui_element(
        element_description="[describe element: 'CostDashboard button in bottom-right']",
        expected_state="[describe expected state: 'visible, pressable, minimized by default']",
        max_iterations=5
    )

result = asyncio.run(main())

Step 3: Perform Interaction

If interactive testing is needed, use xdotool:

export DISPLAY=:1
xdotool mousemove [x] [y]
xdotool click 1
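The same interaction can be driven from Python via `subprocess`; a minimal sketch assuming `xdotool` is installed (the `dry_run` flag, an illustrative addition, returns the commands without executing them):

```python
import os
import subprocess

def click_at(x, y, display=":1", dry_run=False):
    """Move the pointer to (x, y) and left-click using xdotool."""
    commands = [
        ["xdotool", "mousemove", str(x), str(y)],
        ["xdotool", "click", "1"],  # button 1 = left click
    ]
    if not dry_run:
        env = {**os.environ, "DISPLAY": display}
        for cmd in commands:
            subprocess.run(cmd, env=env, check=True)
    return commands

# Inspect the exact commands without needing a display:
print(click_at(640, 360, dry_run=True))
```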

Step 4: Capture Result State

Take another screenshot after interaction:

python -c "from services.chatkit_backend.app.tools.computer_use import get_computer_tool; import json; result = get_computer_tool().screenshot(); print(json.dumps(result))"

Step 5: Visual Comparison

Ask Claude 4 to compare the screenshots:

# Compare before/after (run inside the same async context as Step 2)
result = await auditor.comprehensive_ui_audit(
    feature_name="[Feature being verified]",
    test_cases=[
        {
            "element": "[Element name]",
            "expected_state": "[Expected behavior]",
            "action": "[What was clicked/typed]"
        }
    ],
    max_iterations=10
)

Step 6: Documentation

Create a verification report with:

  • Before screenshot path
  • After screenshot path
  • Description of expected vs actual behavior
  • VERIFIED or FAILED status
  • Evidence (screenshot comparisons)
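The report can be assembled as structured data so it is easy to store or upsert to memory later; a minimal sketch (the field names are illustrative, not a project schema):

```python
import json
from datetime import datetime, timezone

def build_report(feature, status, before_path, after_path, findings):
    """Assemble a verification report; status must be VERIFIED or FAILED."""
    if status not in ("VERIFIED", "FAILED"):
        raise ValueError(f"unknown status: {status}")
    return {
        "feature": feature,
        "status": status,
        "evidence": {"before": before_path, "after": after_path},
        "findings": findings,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

report = build_report(
    "CostDashboard button",
    "VERIFIED",
    "/tmp/verification/before.png",
    "/tmp/verification/after.png",
    "Button visible bottom-right; panel opens on click.",
)
print(json.dumps(report, indent=2))
```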

When to Use This Skill

ALWAYS Use For:

  • New UI components
  • Button interactions
  • Layout changes
  • Color/theme modifications
  • Visibility toggles
  • Any claim that something "works"

NEVER Claim Without:

  • At least one screenshot showing the feature
  • Visual confirmation element exists
  • Evidence of expected behavior

Integration with Constructor/Auditor Pattern

When using the two-agent pattern:

Constructor Agent: Builds UI, registers changes

from services.chatkit_backend.app.vision import register_and_verify_ui_change

await register_and_verify_ui_change(
    change_id="cost-dashboard-button",
    description="Added minimizable cost dashboard button",
    files_modified=["src/components/CostDashboard.tsx", "src/components/buttons.css"],
    verification_criteria=[
        {
            "element": "Cost dashboard button in bottom-right corner",
            "expected_state": "visible, clickable, shows $ icon"
        },
        {
            "element": "Cost dashboard panel",
            "expected_state": "hidden by default, appears on button click"
        }
    ],
    priority="high",
    wait_for_verification=True  # Block until auditor verifies
)

Auditor Agent: Automatically verifies using computer vision

The orchestrator handles the verification queue automatically.

Anti-Patterns to Avoid

❌ Code Inspection Only

# BAD: Only checking if code exists
if os.path.exists("src/components/CostDashboard.tsx"):
    print("✓ CostDashboard verified")  # HALLUCINATION

✓ Physical Verification

# GOOD: Actually seeing it
screenshot = computer_tool.screenshot()
result = await auditor.verify_ui_element("CostDashboard button")
if result["success"]:
    print("✓ CostDashboard verified with visual evidence")

❌ Test Passing = Works

# BAD: Test passed, claim it works
assert dashboard.is_visible() == True
print("✓ Dashboard works")  # Tests can lie

✓ Visual Evidence = Works

# GOOD: Can see it, can interact with it
screenshot_before = take_screenshot()
click_button(x, y)
screenshot_after = take_screenshot()
if screenshots_differ(screenshot_before, screenshot_after):
    print("✓ Dashboard verified: visual evidence of interaction")
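The `screenshots_differ` helper used above is not defined in this skill; a minimal sketch using Pillow (already listed under the required Python packages):

```python
from PIL import Image, ImageChops

def screenshots_differ(before_path, after_path):
    """Return True unless the two screenshots are pixel-identical."""
    before = Image.open(before_path).convert("RGB")
    after = Image.open(after_path).convert("RGB")
    if before.size != after.size:
        return True
    # difference() is all black (getbbox() returns None) only for identical images
    return ImageChops.difference(before, after).getbbox() is not None
```

For real UI screenshots you may want a small tolerance (e.g. ignore sub-pixel antialiasing noise) rather than exact equality; this sketch treats any pixel change as a difference.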

Environment Requirements

Required packages (already in replit.nix):

  • xorg.xorgserver (Xvfb for virtual display)
  • xdotool (mouse/keyboard automation)
  • scrot (screenshot capture)
  • chromium (browser for UI rendering)

Required Python packages:

  • anthropic (Claude 4 Computer Use API)
  • Pillow (image processing)

Virtual display should be running:

# Check if display is active
ps aux | grep Xvfb

# If not running, start it
Xvfb :1 -screen 0 1024x768x24 &
export DISPLAY=:1

Error Handling

If verification fails:

  1. Check virtual display is running
  2. Verify browser can render UI
  3. Take diagnostic screenshot
  4. Check element actually exists in DOM
  5. Re-read implementation files
  6. Fix the code, don't fake the verification

Success Criteria

A feature is VERIFIED when:

  1. Screenshot shows element exists
  2. Element is in expected location
  3. Element has expected visual appearance
  4. Interaction produces expected result
  5. Before/after screenshots document the behavior

A feature is FAILED when:

  • Cannot find element in screenshot
  • Element exists but wrong appearance
  • Interaction doesn't produce expected result
  • Screenshots don't show claimed functionality

Memory Integration

After verification, upsert findings to memory:

from services.chatkit_backend.app.llm.memory_tool import upsert_memory

await upsert_memory(
    content=f"""
    Verification Report: {feature_name}
    Status: {status}
    Evidence: {screenshot_paths}
    Findings: {description}
    """,
    context="physical_verification"
)

This creates a permanent record of what was actually verified.

Next Steps

After using this skill to verify a feature:

  1. Mark the feature as VERIFIED in documentation
  2. Include screenshot paths as evidence
  3. Update task tracking with verification status
  4. Move to next feature requiring verification

For comprehensive audits of complete features, use the comprehensive_ui_audit method with multiple test cases covering all critical paths.