Claude Code Plugins

Community-maintained marketplace


physical-verification

@Earths-Reckoning/Cheeseburg124

Physically verify UI changes using computer vision and screenshots. Use when implementing UI features, verifying visual elements, or testing interface functionality. Prevents hallucination by requiring visual evidence before claiming features work.


SKILL.md

---
name: physical-verification
description: Physically verify UI changes using computer vision and screenshots. Use when implementing UI features, verifying visual elements, or testing interface functionality. Prevents hallucination by requiring visual evidence before claiming features work.
allowed-tools: Bash, Read, Write
---

Physical Verification Skill

Purpose

This skill provides the gating mechanism to prevent hallucination about UI features. It requires physical verification through screenshots and computer vision before claiming any UI feature works.

Core Principle

Code existence ≠ functionality. Never claim a feature works without visual evidence.

Verification Workflow

Copy this checklist and track progress:

Verification Progress:
- [ ] Step 1: Take screenshot of initial state
- [ ] Step 2: Identify target element visually
- [ ] Step 3: Perform interaction (click, type, etc.)
- [ ] Step 4: Take screenshot of resulting state
- [ ] Step 5: Compare before/after screenshots
- [ ] Step 6: Document findings with visual evidence

Step 1: Capture Initial State

Take a screenshot showing the current UI state:

python -c "from services.chatkit_backend.app.tools.computer_use import get_computer_tool; import json; result = get_computer_tool().screenshot(); print(json.dumps(result))"

Save the screenshot path for comparison.
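If the project's `computer_use` helper is unavailable, a minimal stdlib fallback can capture the virtual display with `scrot` (listed under the environment requirements below); the helper names here are illustrative, not a project API:

```python
import os
import subprocess
import time

def screenshot_path(out_dir="/tmp/verification"):
    """Build a timestamped output path for a screenshot (hypothetical helper)."""
    os.makedirs(out_dir, exist_ok=True)
    return os.path.join(out_dir, f"shot_{int(time.time())}.png")

def take_screenshot(display=":1", out_dir="/tmp/verification"):
    """Capture the virtual display with scrot and return the saved file path."""
    path = screenshot_path(out_dir)
    subprocess.run(["scrot", path], env={**os.environ, "DISPLAY": display}, check=True)
    return path

# Usage (requires a running Xvfb display and scrot on PATH):
# before = take_screenshot()
```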

Step 2: Locate Target Element

Use Claude 4 Computer Use API to identify the element:

from services.chatkit_backend.app.vision import AuditorAgent
import asyncio
import os

async def main():
    auditor = AuditorAgent(api_key=os.getenv("ANTHROPIC_API_KEY"))
    return await auditor.verify_ui_element(
        element_description="[describe element: 'CostDashboard button in bottom-right']",
        expected_state="[describe expected state: 'visible, pressable, minimized by default']",
        max_iterations=5
    )

result = asyncio.run(main())

Step 3: Perform Interaction

If interactive testing is needed, use xdotool:

export DISPLAY=:1
xdotool mousemove [x] [y]
xdotool click 1
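The same interaction can be driven from Python via `subprocess`; a minimal sketch assuming `xdotool` is installed (the `dry_run` flag, an illustrative addition, returns the commands without executing them):

```python
import os
import subprocess

def click_at(x, y, display=":1", dry_run=False):
    """Move the pointer to (x, y) and left-click using xdotool."""
    commands = [
        ["xdotool", "mousemove", str(x), str(y)],
        ["xdotool", "click", "1"],  # button 1 = left click
    ]
    if not dry_run:
        env = {**os.environ, "DISPLAY": display}
        for cmd in commands:
            subprocess.run(cmd, env=env, check=True)
    return commands

# Inspect the exact commands without needing a display:
print(click_at(640, 360, dry_run=True))
```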

Step 4: Capture Result State

Take another screenshot after interaction:

python -c "from services.chatkit_backend.app.tools.computer_use import get_computer_tool; import json; result = get_computer_tool().screenshot(); print(json.dumps(result))"

Step 5: Visual Comparison

Ask Claude 4 to compare the screenshots:

# Compare before/after (run inside the same async context as Step 2)
result = await auditor.comprehensive_ui_audit(
    feature_name="[Feature being verified]",
    test_cases=[
        {
            "element": "[Element name]",
            "expected_state": "[Expected behavior]",
            "action": "[What was clicked/typed]"
        }
    ],
    max_iterations=10
)

Step 6: Documentation

Create a verification report with:

  • Before screenshot path
  • After screenshot path
  • Description of expected vs actual behavior
  • VERIFIED or FAILED status
  • Evidence (screenshot comparisons)
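The report can be assembled as structured data so it is easy to store or upsert to memory later; a minimal sketch (the field names are illustrative, not a project schema):

```python
import json
from datetime import datetime, timezone

def build_report(feature, status, before_path, after_path, findings):
    """Assemble a verification report; status must be VERIFIED or FAILED."""
    if status not in ("VERIFIED", "FAILED"):
        raise ValueError(f"unknown status: {status}")
    return {
        "feature": feature,
        "status": status,
        "evidence": {"before": before_path, "after": after_path},
        "findings": findings,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

report = build_report(
    "CostDashboard button",
    "VERIFIED",
    "/tmp/verification/before.png",
    "/tmp/verification/after.png",
    "Button visible bottom-right; panel opens on click.",
)
print(json.dumps(report, indent=2))
```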

When to Use This Skill

ALWAYS Use For:

  • New UI components
  • Button interactions
  • Layout changes
  • Color/theme modifications
  • Visibility toggles
  • Any claim that something "works"

NEVER Claim Without:

  • At least one screenshot showing the feature
  • Visual confirmation element exists
  • Evidence of expected behavior

Integration with Constructor/Auditor Pattern

When using the two-agent pattern:

Constructor Agent: Builds UI, registers changes

from services.chatkit_backend.app.vision import register_and_verify_ui_change

await register_and_verify_ui_change(
    change_id="cost-dashboard-button",
    description="Added minimizable cost dashboard button",
    files_modified=["src/components/CostDashboard.tsx", "src/components/buttons.css"],
    verification_criteria=[
        {
            "element": "Cost dashboard button in bottom-right corner",
            "expected_state": "visible, clickable, shows $ icon"
        },
        {
            "element": "Cost dashboard panel",
            "expected_state": "hidden by default, appears on button click"
        }
    ],
    priority="high",
    wait_for_verification=True  # Block until auditor verifies
)

Auditor Agent: Automatically verifies using computer vision

The orchestrator handles the verification queue automatically.

Anti-Patterns to Avoid

❌ Code Inspection Only

# BAD: Only checking if code exists
if os.path.exists("src/components/CostDashboard.tsx"):
    print("✓ CostDashboard verified")  # HALLUCINATION

✓ Physical Verification

# GOOD: Actually seeing it
screenshot = computer_tool.screenshot()
result = await auditor.verify_ui_element("CostDashboard button")
if result["success"]:
    print("✓ CostDashboard verified with visual evidence")

❌ Test Passing = Works

# BAD: Test passed, claim it works
assert dashboard.is_visible() == True
print("✓ Dashboard works")  # Tests can lie

✓ Visual Evidence = Works

# GOOD: Can see it, can interact with it
screenshot_before = take_screenshot()
click_button(x, y)
screenshot_after = take_screenshot()
if screenshots_differ(screenshot_before, screenshot_after):
    print("✓ Dashboard verified: visual evidence of interaction")
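The `screenshots_differ` helper used above is not defined in this skill; a minimal sketch using Pillow (already listed under the required Python packages):

```python
from PIL import Image, ImageChops

def screenshots_differ(before_path, after_path):
    """Return True unless the two screenshots are pixel-identical."""
    before = Image.open(before_path).convert("RGB")
    after = Image.open(after_path).convert("RGB")
    if before.size != after.size:
        return True
    # difference() is all black (getbbox() returns None) only for identical images
    return ImageChops.difference(before, after).getbbox() is not None
```

For real UI screenshots you may want a small tolerance (e.g. ignore sub-pixel antialiasing noise) rather than exact equality; this sketch treats any pixel change as a difference.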

Environment Requirements

Required packages (already in replit.nix):

  • xorg.xorgserver (Xvfb for virtual display)
  • xdotool (mouse/keyboard automation)
  • scrot (screenshot capture)
  • chromium (browser for UI rendering)

Required Python packages:

  • anthropic (Claude 4 Computer Use API)
  • Pillow (image processing)

Virtual display should be running:

# Check if display is active
ps aux | grep Xvfb

# If not running, start it
Xvfb :1 -screen 0 1024x768x24 &
export DISPLAY=:1

Error Handling

If verification fails:

  1. Check virtual display is running
  2. Verify browser can render UI
  3. Take diagnostic screenshot
  4. Check element actually exists in DOM
  5. Re-read implementation files
  6. Fix the code, don't fake the verification

Success Criteria

A feature is VERIFIED when:

  1. Screenshot shows element exists
  2. Element is in expected location
  3. Element has expected visual appearance
  4. Interaction produces expected result
  5. Before/after screenshots document the behavior

A feature is FAILED when:

  • Cannot find element in screenshot
  • Element exists but wrong appearance
  • Interaction doesn't produce expected result
  • Screenshots don't show claimed functionality

Memory Integration

After verification, upsert findings to memory:

from services.chatkit_backend.app.llm.memory_tool import upsert_memory

await upsert_memory(
    content=f"""
    Verification Report: {feature_name}
    Status: {status}
    Evidence: {screenshot_paths}
    Findings: {description}
    """,
    context="physical_verification"
)

This creates a permanent record of what was actually verified.

Next Steps

After using this skill to verify a feature:

  1. Mark the feature as VERIFIED in documentation
  2. Include screenshot paths as evidence
  3. Update task tracking with verification status
  4. Move to next feature requiring verification

For comprehensive audits of complete features, use the comprehensive_ui_audit method with multiple test cases covering all critical paths.