| name | dev-verify |
| description | REQUIRED Phase 7 of /dev workflow (final). Requires fresh runtime evidence before claiming completion. |
Announce: "I'm using dev-verify (Phase 7) to confirm completion with fresh evidence."
Contents
- Verification Gate
- Red Flags - STOP Immediately If You Think
- The Gate Function
- Claims Requiring Evidence
- Fake E2E Patterns
- Insufficient Evidence
- Honesty Requirement
- Verification Patterns
- User Acceptance (Final Step)
- Bottom Line
- Workflow Complete
Verification Gate
The automated test IS your deliverable. The implementation just makes the test pass.
Reframe your task:
- ❌ "Implement feature X, and test it"
- ✅ "Write an automated test that proves feature X works. Then make it pass."
The test proves value. The implementation is a means to an end.
Without a REAL automated test (executes code, verifies behavior), you have delivered NOTHING.
NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE. This is not negotiable.
Before claiming ANYTHING is complete, you MUST:
- IDENTIFY - Which command proves your assertion?
- RUN - Execute the command fresh (not from cache/memory)
- READ - Review full output and exit codes
- VERIFY - Confirm output supports your claim
- Only THEN make the claim
This applies even when:
- "I just ran it a moment ago"
- "The agent said it passed"
- "It should work"
- "I'm confident it's fine"
If you catch yourself about to claim completion without fresh evidence, STOP.
Red Flags - STOP Immediately If You Think:
| Thought | Why It's Wrong | Do Instead |
|---|---|---|
| "It should work" | "Should" isn't evidence | Run the command |
| "I'm pretty sure it passes" | Confidence isn't verification | Run the command |
| "The agent reported success" | Agent reports need confirmation | Run it yourself |
| "I ran it earlier" | Earlier isn't fresh | Run it again |
| "The code exists" | Existing ≠ working | Run and check output |
| "Grep shows the function" | Pattern match ≠ runtime test | Run the function |
The Gate Function
Before making ANY status claim:
1. IDENTIFY → Which command proves your assertion?
2. RUN → Execute the command fresh
3. READ → Review full output and exit codes
4. VERIFY → Confirm output supports your claim
5. CLAIM → Only after steps 1-4
Skipping any step is dishonest, not verification.
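A minimal sketch of the gate in shell, assuming an npm test suite (substitute pytest, cargo test, etc. for your stack):

```bash
# IDENTIFY: `npm test` is the command that proves the "tests pass" claim.
# RUN: execute it fresh, capturing the full output (not a remembered result).
npm test 2>&1 | tee /tmp/test-output.txt
status=${PIPESTATUS[0]}
# READ: review the whole log, not just the last line.
cat /tmp/test-output.txt
# VERIFY: exit code 0 AND a "0 failures" summary must both support the claim.
echo "Exit code: $status"
# CLAIM: only after all of the above.
```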
Claims Requiring Evidence
| Claim | Required Evidence |
|---|---|
| "Tests pass" | Test output showing 0 failures |
| "Build succeeds" | Exit code 0 from build command |
| "Linter clean" | Linter output showing 0 errors |
| "Bug fixed" | Test that failed now passes |
| "Feature complete" | All acceptance criteria verified |
| "User-facing feature works" | E2E test output showing PASS |
USER-FACING CLAIMS REQUIRE E2E EVIDENCE. Unit tests are insufficient.
| Claim | Unit Test Evidence | E2E Evidence Required |
|---|---|---|
| "API works" | ❌ Insufficient | ✅ Full request/response test |
| "UI renders" | ❌ Insufficient | ✅ Playwright snapshot/interaction |
| "Feature complete" | ❌ Insufficient | ✅ User flow simulation |
| "No regressions" | ❌ Insufficient | ✅ E2E suite passes |
Fake E2E Patterns - STOP
These are NOT E2E tests. They are observability, not verification.
| ❌ Fake E2E | ✅ Real E2E |
|---|---|
| "Log shows function was called" | "Screenshot shows correct UI rendered" |
| "grep papirus in logs" | "grim screenshot + visual diff confirms icon changed" |
| "Console output contains 'success'" | "Playwright assertion: element.textContent === 'Success'" |
| "File was created" | "E2E test opens file and verifies contents" |
| "Process exited 0" | "Functional test verifies actual output matches spec" |
| "Mock returned expected value" | "Real integration returns expected value" |
Red Flag: If you catch yourself thinking "logs prove it works" - STOP. Logs prove code executed, not that it produced correct results. E2E means verifying the actual output users see.
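A sketch of the screenshot row above, assuming a Wayland session with grim and ImageMagick installed and an approved baseline.png checked in:

```bash
# Capture what is actually rendered on screen, not what the logs say happened.
grim /tmp/after.png
# ImageMagick's `compare` with the AE metric counts differing pixels and
# exits non-zero when the images differ, so the diff itself is the evidence.
compare -metric AE baseline.png /tmp/after.png /tmp/diff.png \
  && echo "UI matches baseline" || echo "UI changed - inspect /tmp/diff.png"
```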
Rationalization Prevention (E2E)
| Thought | Reality |
|---|---|
| "Unit tests cover it" | Unit tests don't simulate users. Where's E2E? |
| "E2E would be redundant" | Redundancy catches bugs. Write E2E. |
| "No time for E2E" | No time to fix production bugs? Write E2E. |
| "Feature is internal" | Does it affect user output? Then E2E. |
| "I manually tested" | Manual = no evidence. Automate it. |
| "Log checking verifies it works" | Log checking only verifies code executed, not results. Not E2E. |
| "E2E with screenshots is too complex" | If too complex to verify, feature isn't done. Complexity = bugs hiding. |
| "Implementation is done, testing is just verification" | Testing IS implementation. Untested code is unfinished code. |
The E2E Gate Function
For user-facing changes, add to verification:
1. IDENTIFY → Which E2E test proves user-facing behavior?
2. RUN → Execute E2E test fresh
3. READ → Review full output (screenshots if visual)
4. VERIFY → User flow works as specified
5. CLAIM → Only after E2E evidence exists
"Unit tests pass" is not "feature complete" for user-facing changes.
Insufficient Evidence
These do NOT count as verification:
- Previous runs (must be fresh)
- Assumptions ("it should work")
- Partial checks (ran some tests, not all)
- Agent reports without independent confirmation
- "I think..." / "It seems..." / "Probably..."
Honesty Requirement
When you say "Feature complete", you are asserting:
- You ran the verification commands yourself (fresh)
- You saw the output with your own tokens
- The output confirms the claim
Saying "complete" based on stale data or agent reports is not "summarizing" - it is LYING about project state.
"Still verifying" is honest. "Complete" without evidence is fraud.
Rationalization Prevention
These thoughts mean STOP - you're about to claim falsely:
| Thought | Reality |
|---|---|
| "I just ran it" | "Just" = stale. Run it AGAIN. |
| "The agent said it passed" | Agent reports need YOUR confirmation. Run it. |
| "It should work" | "Should" is hope. Run and see output. |
| "I'm confident" | Confidence ≠ verification. Run the command. |
| "We already verified earlier" | Earlier ≠ now. Fresh evidence only. |
| "User will verify it" | NO. Verify before claiming. User trusts your claim. |
| "Close enough" | Close ≠ complete. Verify fully. |
| "Time to move on" | Only move on after FRESH verification. |
STRUCTURAL VERIFICATION IS NOT RUNTIME VERIFICATION:
| ❌ NOT Verification | ✅ IS Verification |
|---|---|
| "Code exists in file" | "Code ran and produced output X" |
| "Function is defined" | "Function was called and returned Y" |
| "Grep found the pattern" | "Program output shows expected behavior" |
| "ast-grep found the code" | "Test executed and passed with output" |
| "Diff shows the change" | "Change tested with actual input/output" |
| "Implementation looks correct" | "Ran test, saw PASS in logs" |
The key difference:
- Structural: "The code IS THERE" (useless)
- Runtime: "The code WORKS" (valid)
If you find yourself saying "the code exists" or "I verified the implementation" without running it, STOP - that's not verification.
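To make the contrast concrete, here is a hypothetical retryWithBackoff helper checked both ways (the file names are placeholders):

```bash
# Structural (NOT verification): only proves the text exists in the repo.
grep -rn "retryWithBackoff" src/
# Runtime (verification): execute the code path and read the actual result.
# Assumes a Jest-style runner that accepts a test-file filter after `--`.
npm test -- retry.test.js
echo "Exit code: $?"
```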
Verification Patterns
Tests
```bash
# Run the command
npm test   # or pytest, cargo test, etc.
# See numerical results
# "34/34 pass" → Can claim "tests pass"
# "33/34 pass" → CANNOT claim success
```
Regression Test
```bash
# 1. Write test → run (should pass)
npm test
# 2. Revert fix → run (MUST fail; `git stash` works if the fix is uncommitted)
git stash && npm test
# 3. Restore fix → run (should pass)
git stash pop && npm test
# Only then claim "bug is fixed"
```
Build
```bash
npm run build; echo "Exit code: $?"
# Must see "Exit code: 0" to claim success; any other value means the build failed
```
User Acceptance (Final Step)
After technical verification passes, confirm with user:
```
AskUserQuestion:
  question: "Does this implementation meet your requirements?"
  options:
    - label: "Yes, requirements met"
      description: "Feature works as designed, ready to merge"
    - label: "Partially"
      description: "Core works but missing some requirements"
    - label: "No"
      description: "Does not meet requirements, needs more work"
```
Reference .claude/SPEC.md when asking - remind user of the success criteria they defined.
If user says "Partially" or "No":
- Ask which specific requirement is not met
- Return to /dev-implement to address gaps
- Re-run verification
Only claim COMPLETE when:
- All technical tests pass (automated)
- User confirms requirements met (manual)
Bottom Line
Two types of verification required:
- Technical - Run commands, see output, confirm no errors
- Requirements - Ask user if it does what they wanted
Both must pass. No shortcuts exist.
Workflow Complete
When user confirms "Yes, requirements met":
Announce: "Dev workflow complete. All 7 phases passed."
The /dev workflow is now finished. Offer to:
- Commit the changes
- Clean up .claude/ files
- Start a new feature with /dev