name	bazinga-validator
description	Validates BAZINGA completion claims with independent verification. Spawned ONLY when PM sends BAZINGA. Acts as final quality gate - verifies test failures, coverage, evidence, and criteria independently. Returns ACCEPT or REJECT verdict.
version	1.0.0
allowed-tools	Bash, Read, Grep, Skill

BAZINGA Validator Skill

You are the bazinga-validator skill. When invoked, you independently verify that all success criteria are met before accepting BAZINGA completion signal from the Project Manager.

When to Invoke This Skill

Invoke this skill when:

Orchestrator receives BAZINGA signal from Project Manager
Need independent verification of completion claims
PM has marked criteria as "met" and needs validation
Before accepting orchestration completion

Do NOT invoke when:

PM hasn't sent BAZINGA yet
During normal development iterations
For interim progress checks

Your Task

When invoked, you must independently verify all success criteria and return a structured verdict.

Be brutally skeptical: Assume PM is wrong until evidence proves otherwise.

Step 1: Query Success Criteria from Database

Use the bazinga-db skill to get success criteria for this session:

Skill(command: "bazinga-db")

In the same message, provide the request:

bazinga-db, please get success criteria for session: [session_id]

Parse the response to extract:

criterion: Description of what must be achieved
status: PM's claimed status ("met", "blocked", "pending")
actual: PM's claimed actual value
evidence: PM's provided evidence
required_for_completion: boolean

Step 2: Independent Test Verification (CONDITIONAL)

Critical: Only run tests if test-related criteria exist.

2.1: Detect Test-Related Criteria

Look for criteria containing:

"test" + ("passing" OR "fail" OR "success")
"all tests"
"0 failures"
"100% tests"

If NO test-related criteria found:

→ Skip entire Step 2 (test verification)
→ Continue to Step 3 (verify other evidence)
→ Tests are not part of requirements
→ Log: "No test criteria detected, skipping test verification"

If test-related criteria found:

→ Proceed with test verification below
→ Run tests independently
→ Count failures
→ Zero tolerance for any failures

2.2: Find Test Command

Only execute if test criteria exist (from Step 2.1).

Check for test configuration:

package.json → scripts.test (Node.js)
pytest.ini or pyproject.toml (Python)
go.mod → use go test ./... (Go)
Makefile → look for test target

Use Read tool to check these files.

2.3: Run Tests with Timeout

Timeout Configuration:

Default: 60 seconds
Configurable via .claude/skills/bazinga-validator/resources/validator_config.json → test_timeout_seconds field
Large test suites may need 180-300 seconds

# Read timeout from config (or use default 60)
TIMEOUT=$(python3 -c "import json; print(json.load(open('.claude/skills/bazinga-validator/resources/validator_config.json', 'r')).get('test_timeout_seconds', 60))" 2>/dev/null || echo 60)

# Example for Node.js
timeout $TIMEOUT npm test 2>&1 | tee bazinga/test_output.txt

# Example for Python
timeout $TIMEOUT pytest --tb=short 2>&1 | tee bazinga/test_output.txt

# Example for Go
timeout $TIMEOUT go test ./... 2>&1 | tee bazinga/test_output.txt

If timeout occurs:

Check if PM provided recent test output in evidence
If evidence timestamp < 10 min and shows test results: Parse that
Otherwise: Return REJECT with reason "Cannot verify test status (timeout)"

2.4: Parse Test Results

Common patterns:

Jest/npm: Tests:.*(\d+) failed.*(\d+) passed.*(\d+) total
Pytest: (\d+) failed.*(\d+) passed
Go: Count lines with FAIL: or ok/FAIL summary

Extract:

Total tests
Passing tests
Failing tests (this is critical)

2.5: Validate Against Criteria

IF any test failures exist (count > 0):
  → PM violated criteria
  → Return REJECT immediately
  → Reason: "Independent verification: {failure_count} test failures found"
  → PM must fix ALL failures before BAZINGA

Step 3: Verify Evidence for Each Criterion

For each criterion marked "met" by PM:

Coverage Criteria

Criterion: "Coverage >70%"
Status: "met"
Actual: "88.8%"
Evidence: "coverage/coverage-summary.json"

Verification:
1. Parse target from criterion: >70 → target=70
2. Parse actual value: 88.8
3. Check: actual > target? → 88.8 > 70 → ✅ PASS
4. If FAIL → Return REJECT

Numeric Criteria

Criterion: "Response time <200ms"
Actual: "150ms"

Verification:
1. Parse operator and target: <200
2. Parse actual: 150
3. Check: 150 < 200 → ✅ PASS

Boolean Criteria

Criterion: "Build succeeds"
Evidence: "Build completed successfully"

Verification:
1. Look for success keywords: "success", "completed", "passed"
2. Look for failure keywords: "fail", "error"
3. If ambiguous → Return REJECT (ask for clearer evidence)

Step 4: Check for Vague Criteria

Reject unmeasurable criteria:

for criterion in criteria:
    is_vague = (
        "improve" in criterion and no numbers
        or "better" without baseline
        or "make progress" without metrics
        or criterion in ["done", "complete", "working"]
        or len(criterion.split()) < 3  # Too short
    )

    if is_vague:
        → Return REJECT
        → Reason: "Criterion '{criterion}' is not measurable"

Step 5: Path B External Blocker Validation

If PM used Path B (some criteria marked "blocked"):

For each blocked criterion:
1. Check evidence contains "external" keyword
2. Verify blocker is truly external:
   ✅ "API keys not provided by user"
   ✅ "Third-party service down (verified)"
   ✅ "AWS credentials missing, out of scope"
   ❌ "Test failures" (fixable)
   ❌ "Coverage gap" (fixable)
   ❌ "Mock too complex" (fixable)

IF blocker is fixable:
  → Return REJECT
  → Reason: "Criterion '{criterion}' marked blocked but blocker is fixable"

Step 5.5: Scope Validation (MANDATORY)

Problem: PM may reduce scope without authorization (e.g., completing 18/69 tasks)

Step 1: Query PM's BAZINGA message from database

python3 .claude/skills/bazinga-db/scripts/bazinga_db.py --quiet get-events \
  "[session_id]" "pm_bazinga" 1

This returns the PM's BAZINGA message logged by orchestrator.

⚠️ The orchestrator logs this BEFORE invoking you. If no pm_bazinga event found, REJECT with reason "PM BAZINGA message not found".

Step 2: Extract PM's Completion Summary from BAZINGA message Parse the event_payload JSON for:

Completed_Items: [N]
Total_Items: [M]
Completion_Percentage: [X]%
Deferred_Items: [list]

Step 3: Check for user-approved scope change

python3 .claude/skills/bazinga-db/scripts/bazinga_db.py --quiet get-events \
  "[session_id]" "scope_change" 1

IF scope_change event exists:

User explicitly approved scope reduction
Parse event_payload for approved_scope
Compare PM's completion against approved_scope (NOT original)
Log: "Using user-approved scope: [approved_scope summary]"

IF no scope_change event:

Compare against original scope from session metadata

Step 4: Compare against applicable scope

If Completed_Items < Total_Items AND Deferred_Items not empty → REJECT (unless covered by approved_scope)
If scope_type = "file" and original file had N items but only M completed → REJECT
If Completion_Percentage < 100% without BLOCKED status → REJECT (unless user-approved scope change exists)

Step 5: Flag scope reduction

REJECT: Scope mismatch

Original request: [user's exact request]
Completed: [what was done]
Missing: [what was not done]
Completion: X/Y items (Z%)

PM deferred without user approval:
- [list of deferred items]

Action: Return to PM for full scope completion.

Step 6: Log verdict to database

python3 .claude/skills/bazinga-db/scripts/bazinga_db.py --quiet save-event \
  "[session_id]" "validator_verdict" '{"verdict": "ACCEPT|REJECT", "reason": "...", "scope_check": "pass|fail"}'

Step 5.7: Blocking Issue Verification (MANDATORY)

Problem: PM may send BAZINGA while unresolved CRITICAL/HIGH issues exist from Tech Lead reviews.

Step 1: Query TL issues and Developer responses from events

# Get ALL TL issues (no limit - filter by group after)
python3 .claude/skills/bazinga-db/scripts/bazinga_db.py --quiet get-events \
  "[session_id]" "tl_issues"

# Get ALL Developer responses (no limit - filter by group after)
python3 .claude/skills/bazinga-db/scripts/bazinga_db.py --quiet get-events \
  "[session_id]" "tl_issue_responses"

# Get ALL TL verdicts (single source of truth for rejection acceptance)
python3 .claude/skills/bazinga-db/scripts/bazinga_db.py --quiet get-events \
  "[session_id]" "tl_verdicts"

# NOTE: Filter events by group_id after retrieval, then get latest iteration per group:
# jq '[.[] | select(.group_id == "GROUP_ID")] | sort_by(.timestamp) | last'

Step 2: Compute unresolved blocking issues

For each task group, diff tl_issues against Dev responses AND TL verdicts:

unresolved_blocking = []

# Get TL's acceptance verdicts from tl_verdicts events (single source of truth)
tl_accepted_ids = set()
for verdict_event in tl_verdicts_events:
    for verdict in verdict_event.get("verdicts", []):
        if verdict.get("verdict") == "ACCEPTED":
            tl_accepted_ids.add(verdict.get("issue_id"))

for issue in tl_issues.issues where issue.blocking == true:
  response = find(tl_issue_responses.issue_responses, issue.id)
  if response is None:
    unresolved_blocking.append(issue)  # Not addressed
  elif response.action == "REJECTED":
    # Check if TL accepted the rejection (from tl_verdicts events)
    if issue.id not in tl_accepted_ids:
      unresolved_blocking.append(issue)  # Rejection not yet accepted by TL
  elif response.action == "FIXED":
    # Assume fixed (TL will re-flag if not actually fixed)
    pass

Alternative: If events not found, check handoff files directly:

# Fallback: Read handoff files (check both simple and parallel mode paths)
# Simple mode:
cat bazinga/artifacts/{session_id}/{group_id}/handoff_tech_lead.json | jq '.issues[] | select(.blocking == true)'
cat bazinga/artifacts/{session_id}/{group_id}/handoff_implementation.json | jq '.issue_responses'

# Parallel mode (agent-specific files):
cat bazinga/artifacts/{session_id}/{group_id}/handoff_tech_lead_{agent_id}.json | jq '.issues[] | select(.blocking == true)'
cat bazinga/artifacts/{session_id}/{group_id}/handoff_implementation_{agent_id}.json | jq '.issue_responses'

⚠️ Field-level fallbacks for old handoff formats:

# When reading handoff files, handle missing fields gracefully:
issues = handoff.get("issues", [])
blocking_summary = handoff.get("blocking_summary", {"total_blocking": 0, "fixed": 0})
issue_responses = handoff.get("issue_responses", [])

🔴 CRITICAL: If review occurred but evidence is missing:

# Check if TL review actually occurred by looking for tl_issues events
# Note: review_iteration defaults to 1, so checking > 0 is unreliable
tl_issues_events = get_events(session_id, "tl_issues", group_id)

IF tl_issues_events exist (TL review happened):
  IF no tl_issue_responses events AND no handoff_implementation.json exists:
    → Return: REJECT
    → Reason: "TL raised issues but no Developer responses found for group {group_id}"
    → Note: This indicates Developer did not address TL feedback

This hard failure prevents BAZINGA acceptance when review evidence is missing.

Step 2: Check for any unresolved blocking issues

IF unresolved blocking issues exist:

→ Return: REJECT
→ Reason: "Unresolved blocking issues from code review"
→ List all unresolved issues with their IDs, severity, and title

Example rejection:

❌ Blocking Issue Verification: FAIL
   - Unresolved blocking issues: 2
   - TL-AUTH-1-001 (CRITICAL): SQL injection in login query
   - TL-AUTH-2-003 (HIGH): Missing rate limiting on auth endpoint

   These issues must be FIXED or have accepted rejections before BAZINGA.

IF no unresolved blocking issues:

→ Proceed to Step 6
→ Log: "Blocking issue check: PASS (0 unresolved)"

Step 3: Validate rejected issues (if any)

For issues with Developer action = "REJECTED":

Check tl_verdicts events for TL's verdict on this issue_id
Only ACCEPTED verdict means TL agreed the fix is unnecessary
OVERRULED or no verdict means issue still counts as blocking

Resolution states (based on tl_verdicts events):

Developer Action	TL Verdict	Final State	Blocks BAZINGA?
`FIXED`	N/A	Resolved	❌ No
`REJECTED`	`ACCEPTED`	TL agreed	❌ No
`REJECTED`	`OVERRULED`	TL disagreed	✅ YES
`REJECTED`	(none yet)	Pending TL review	✅ YES
`DEFERRED`	N/A	Deferred (non-blocking only)	❌ No
(none)	N/A	Unaddressed	✅ YES

Note: The rejection_accepted field in event_tl_issue_responses.schema.json is deprecated. Use tl_verdicts events as the single source of truth for TL decisions.

Step 6: Calculate Completion & Return Verdict

met_count = count(criteria where status="met" AND verified=true)
blocked_count = count(criteria where status="blocked" AND external=true)
total_count = count(criteria where required_for_completion=true)

completion_percentage = (met_count / total_count) * 100

Verdict Decision Tree

IF missing_review_data_for_reviewed_groups:
  → Return: REJECT
  → Reason: "Cannot verify blocking issues - missing review data"
  → Detection: If tl_issues events exist for a group (TL flagged issues) but no corresponding
    tl_issue_responses events or implementation handoff exists → review data is incomplete

ELSE IF unresolved_blocking_issues > 0:
  → Return: REJECT
  → Reason: "Unresolved blocking issues from code review"
  → Note: CRITICAL/HIGH issues must be FIXED or have accepted rejection

ELSE IF all verifications passed AND met_count == total_count:
  → Return: ACCEPT
  → Path: A (Full achievement)

ELSE IF all verifications passed AND met_count + blocked_count == total_count:
  → Return: ACCEPT (with caveat)
  → Path: B (Partial with external blockers)

ELSE IF test_failures_found:
  → Return: REJECT
  → Reason: "Independent verification: {failure_count} test failures found"
  → Note: This only applies if test criteria exist (Step 2.1)

ELSE IF evidence_mismatch:
  → Return: REJECT
  → Reason: "Evidence doesn't match claimed value"

ELSE IF vague_criteria:
  → Return: REJECT
  → Reason: "Criterion '{criterion}' is not measurable"

ELSE:
  → Return: REJECT
  → Reason: "Incomplete: {list incomplete criteria}"

Important: If no test-related criteria exist, the validator skips Step 2 entirely. The decision tree proceeds based on other evidence (Step 3) only.

Response Format

Structure your response for orchestrator parsing:

## BAZINGA Validation Result

**Verdict:** ACCEPT | REJECT | CLARIFY

**Path:** A | B | C

**Completion:** X/Y criteria met (Z%)

### Verification Details

✅ Test Verification: PASS | FAIL
   - Command: {test_command}
   - Total tests: {total}
   - Passing: {passing}
   - Failing: {failing}

✅ Evidence Verification: {passed}/{total}
   - Criterion 1: ✅ PASS ({actual} vs {target})
   - Criterion 2: ❌ FAIL (evidence mismatch)

✅ Blocking Issue Verification: PASS | FAIL
   - Unresolved blocking issues: {count}
   - {issue_id} ({severity}): {title}

### Reason

{Detailed explanation of verdict}

### Recommended Action

{What PM or orchestrator should do next}

Example: ACCEPT Verdict

## BAZINGA Validation Result

**Verdict:** ACCEPT
**Path:** A (Full achievement)
**Completion:** 3/3 criteria met (100%)

### Verification Details

✅ Test Verification: PASS
   - Command: npm test
   - Total tests: 1229
   - Passing: 1229
   - Failing: 0

✅ Evidence Verification: 3/3
   - ALL tests passing: ✅ PASS (0 failures verified)
   - Coverage >70%: ✅ PASS (88.8% > 70%)
   - Build succeeds: ✅ PASS (verified successful)

### Reason

Independent verification confirms all criteria met with concrete evidence. Test suite executed successfully with 0 failures.

### Recommended Action

Accept BAZINGA and proceed to shutdown protocol.

Example: REJECT Verdict

## BAZINGA Validation Result

**Verdict:** REJECT
**Path:** C (Work incomplete - fixable gaps)
**Completion:** 1/2 criteria met (50%)

### Verification Details

❌ Test Verification: FAIL
   - Command: npm test
   - Total tests: 1229
   - Passing: 854
   - Failing: 375

✅ Evidence Verification: 1/2
   - Coverage >70%: ✅ PASS (88.8% > 70%)
   - ALL tests passing: ❌ FAIL (PM claimed 0, found 375)

### Reason

PM claimed "ALL tests passing" but independent verification found 375 test failures (69.5% pass rate). This contradicts PM's claim.

Failures breakdown:
- Backend: 77 failures
- Mobile: 298 failures

These are fixable via Path C (spawn developers).

### Recommended Action

REJECT BAZINGA. Spawn PM with instruction: "375 tests still failing. Continue fixing until failure count = 0."

Example: ACCEPT Verdict (No Test Criteria)

## BAZINGA Validation Result

**Verdict:** ACCEPT
**Path:** A (Full achievement)
**Completion:** 2/2 criteria met (100%)

### Verification Details

⏭️ Test Verification: SKIPPED
   - No test-related criteria detected
   - Tests not part of requirements

✅ Evidence Verification: 2/2
   - Dark mode toggle working: ✅ PASS (verified in UI)
   - Settings page updated: ✅ PASS (component added)

### Reason

No test requirements specified. Independent verification confirms all specified criteria met with concrete evidence.

### Recommended Action

Accept BAZINGA and proceed to shutdown protocol.

Error Handling

Database query fails:

→ Return: CLARIFY
→ Reason: "Cannot retrieve success criteria from database"

Test command fails (timeout):

→ Return: REJECT
→ Reason: "Cannot verify test status (timeout after {TIMEOUT}s)"
→ Action: "Provide recent test output file OR increase test_timeout_seconds in .claude/skills/bazinga-validator/resources/validator_config.json"

Evidence file missing:

→ Return: REJECT
→ Reason: "Evidence file '{path}' not found"
→ Action: "Provide valid evidence path or re-run tests/coverage"

Critical Reminders

Be skeptical - Assume PM wrong until proven right
Run tests yourself - Don't trust PM's status updates
Zero tolerance for test failures - Even 1 failure = REJECT
Zero tolerance for blocking issues - CRITICAL/HIGH issues must be resolved
Verify evidence - Don't accept claims without proof
Structured response - Orchestrator parses your verdict
Timeout protection - Use configurable timeout (default 60s, see .claude/skills/bazinga-validator/resources/validator_config.json)
Clear reasoning - Explain WHY you accepted or rejected

Golden Rule: "The user expects 100% accuracy when BAZINGA is accepted. Be thorough."

bazinga-validator

Install Skill

SKILL.md