| name | dev-testing |
| description | Development cycle testing gate (Gate 3) - ensures unit test coverage for all acceptance criteria using TDD methodology (RED-GREEN-REFACTOR). Focus: Unit tests only. |
| trigger | - After implementation and SRE complete (Gate 0/1/2) - Task has acceptance criteria requiring test coverage - Need to verify implementation meets requirements |
| skip_when | - No acceptance criteria defined -> request criteria first - Implementation not started -> complete Gate 0 first - Already has full test coverage verified -> proceed to review |
Dev Testing (Gate 3)
See CLAUDE.md for canonical testing requirements.
Overview
Ensure every acceptance criterion has at least one unit test proving it works. Follow TDD methodology: RED (failing test) -> GREEN (minimal implementation) -> REFACTOR (clean up).
Core principle: Untested acceptance criteria are unverified claims. Each criterion MUST map to at least one executable unit test.
Scope: This gate focuses on unit tests.
Pressure Resistance
See shared-patterns/shared-pressure-resistance.md for universal pressure scenarios.
Gate 3-specific note: Coverage threshold (85%) is MANDATORY minimum, not aspirational target. Below threshold = FAIL.
Common Rationalizations - REJECTED
See shared-patterns/shared-anti-rationalization.md for universal anti-rationalizations.
Gate 3-specific rationalizations:
| Excuse | Reality |
|---|---|
| "Manual testing validates all criteria" | Manual tests are not executable, not repeatable. Gate 3 requires unit tests. |
| "Integration tests are better verification" | Gate 3 scope is unit tests only. Integration tests are different scope. |
| "These mocks make it a unit test" | If you hit DB/API/filesystem, it's integration. Mock the interface. |
| "PROJECT_RULES.md says 70% is OK" | Ring minimum is 85%. PROJECT_RULES.md can raise, not lower. |
Red Flags - STOP
See shared-patterns/shared-red-flags.md for universal red flags (including Testing section).
If you catch yourself thinking ANY of those patterns, STOP immediately. Write unit tests until threshold met.
Coverage Threshold Governance
Ring establishes MINIMUM thresholds. PROJECT_RULES.md can only ADD constraints:
| Source | Can Raise? | Can Lower? |
|---|---|---|
| Ring Standard (85%) | N/A (baseline) | N/A (baseline) |
| PROJECT_RULES.md | ✅ YES (e.g., 90%) | ❌ NO |
| Team Decision | ✅ YES (e.g., 95%) | ❌ NO |
| Manager Override | ❌ NO | ❌ NO |
Coverage Measurement Rules (HARD GATE)
Coverage tool output is the SOURCE OF TRUTH. No adjustments allowed:
Dead Code Handling
| Scenario | Allowed? | Required Action |
|---|---|---|
| Exclude dead code from measurement | ❌ NO | Delete the dead code |
| Use `// coverage:ignore` annotations | ⚠️ ONLY with justification | Must document WHY in PR |
| "Mental math" adjustments | ❌ NO | Use actual tool output |
| Different tools show different % | ⚠️ Use the lowest | Conservative approach |
Valid exclusion annotations (must be documented):
- Generated code (protobuf, swagger) - excluded by default
- Unreachable defensive panic handlers (the conditions guarding them are already covered by unit tests)
- Platform-specific code not running in CI
INVALID exclusion reasons:
- ❌ "It's boilerplate"
- ❌ "Trivial getters/setters"
- ❌ "Will never be called in production"
- ❌ "Too hard to test"
Tool Dispute Resolution
If you believe the coverage tool is wrong:
| Step | Action | Why |
|---|---|---|
| 1 | Document the discrepancy | "Tool shows 83%, I expect 90%" |
| 2 | Investigate root cause | Missing test? Tool bug? Config issue? |
| 3 | Fix the issue | Add test OR fix tool config |
| 4 | Re-run coverage | Get new measurement |
| 5 | Use NEW measurement | No mental math on old number |
You CANNOT proceed with "the tool is wrong, real coverage is higher."
Anti-Rationalization for Coverage
See shared-patterns/shared-anti-rationalization.md for universal anti-rationalizations.
| Rationalization | Why It's WRONG | Required Action |
|---|---|---|
| "Tool shows 83% but real is 90%" | Tool output IS real. Your belief is not. | Fix issue, re-measure |
| "Excluding dead code gets us to 85%" | Delete dead code, don't exclude it. | Delete dead code |
| "Test files lower the average" | Test files should be excluded by tool config. | Fix tool configuration |
| "Generated code tanks coverage" | Generated code should be excluded by pattern. | Add exclusion pattern |
| "84.5% rounds to 85%" | Rounding is not allowed. 84.5% < 85%. | Write more tests |
| "Close enough with all AC tested" | "Close enough" is not a passing grade. | Meet exact threshold |
If PROJECT_RULES.md specifies < 85%:
- That specification is INVALID
- Ring minimum (85% branch coverage) still applies
- Report as blocker: "PROJECT_RULES.md coverage threshold (X%) below Ring minimum (85%)"
- Do NOT proceed with lower threshold
Unit Test vs Integration Test
Gate 3 requires UNIT tests. Know the difference:
| Type | Characteristics | Gate 3? |
|---|---|---|
| Unit ✅ | Mocks all external dependencies, tests single function | YES |
| Integration ❌ | Hits real database/API/filesystem | NO - separate scope |
If you're thinking "but my test needs the database":
- That's an integration test
- Mock the repository interface
- Test the business logic, not the database
"Mocking the database driver" is still integration. Mock the INTERFACE.
Prerequisites
Before starting this gate:
- Implementation code exists (Gate 0 complete)
- Acceptance criteria are clearly defined
- Test framework is configured and working
Step 1: Map Acceptance Criteria to Unit Tests
Create a traceability matrix: | ID | Criterion | Test File | Test Function | Status (Pending/PASS) |
Rule: Every criterion MUST have at least one unit test. No criterion left untested.
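For example, a filled-in matrix might look like this (criterion, file, and test names are hypothetical):

| ID | Criterion | Test File | Test Function | Status |
|---|---|---|---|---|
| AC-1 | Rejects malformed email addresses | user_validator_test.go | TestUserValidator_Validate_InvalidEmail | PASS |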
Step 2: Identify Testable Units
| Criterion Type | Testable Unit | Mock Dependencies |
|---|---|---|
| Business logic | Service methods | Repository, external APIs |
| Validation | Validators, DTOs | None (pure functions) |
| Domain rules | Entity methods | None |
| Error handling | Service methods | Force error conditions |
Step 3: Execute TDD Cycle
| Phase | Action | Verify |
|---|---|---|
| RED | Write test describing expected behavior | Test runs + fails for right reason (feature missing, not typo) + paste failure output |
| GREEN | Write minimal code to pass | Test passes |
| REFACTOR | Remove duplication, improve naming, extract helpers | Tests still pass |
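For instance, a minimal RED-phase sketch in Go (Clamp is a hypothetical function that intentionally does not exist yet):

```go
package pricing

import "testing"

// RED: Clamp is not implemented yet, so `go test` fails to build.
// That failure is the "fails for the right reason" evidence to paste.
// GREEN then adds the minimal Clamp; REFACTOR cleans it up.
func TestPricing_Clamp_CapsValueAtUpperBound(t *testing.T) {
	if got := Clamp(15, 0, 10); got != 10 {
		t.Fatalf("Clamp(15, 0, 10) = %d, want 10", got)
	}
}
```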
Step 4: Dispatch QA Analyst for Unit Tests
Dispatch: Task(subagent_type: "qa-analyst") with TASK_ID, ACCEPTANCE_CRITERIA, IMPLEMENTATION_FILES. Requirements: at least one test per criterion, edge cases covered, dependencies mocked, naming Test{Unit}_{Method}_{Scenario}. Unit tests ONLY.
Step 5: Execute Full Test Suite
Run: `go test ./... -covermode=atomic -coverprofile=coverage.out && go tool cover -func=coverage.out` (or `npm test -- --coverage`, `pytest --cov=src`). Verify coverage ≥ 85%.
Step 6: Update Traceability Matrix
Update matrix with line numbers and PASS status for each criterion.
Gate Exit Criteria
Before proceeding to Gate 4 (Review):
Coverage Gate (Existing)
- Coverage MUST meet the 85% branch coverage minimum (or a higher project-defined threshold in PROJECT_RULES.md). This is MANDATORY, not aspirational.
- Below threshold = FAIL
- No exceptions for "close enough"
- All acceptance criteria MUST have executable tests
Quality Gate (NEW - Prevents dev-refactor Duplicates)
⛔ HARD GATE: ALL quality checks must PASS, not just coverage %
| Check | Verification | PASS Criteria |
|---|---|---|
| [ ] Skipped tests | `grep -rnE "\.skip\|\.todo\|xit\|xdescribe"` | 0 found |
| [ ] Assertion-less tests | Review test bodies for expect/assert | 0 found |
| [ ] Shared state | Check beforeAll/afterAll usage | No shared mutable state |
| [ ] Naming convention | Pattern: Test{Unit}_{Method}_{Scenario} | 100% compliant |
| [ ] Edge cases | Count per AC | ≥2 edge cases per AC |
| [ ] TDD evidence | Captured RED phase output | All new tests |
| [ ] Test isolation | No execution order dependency | Tests pass in any order |
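A practical way to check the isolation row in Go is `go test -shuffle=on ./...` (Go 1.17+), which randomizes test execution order so hidden ordering dependencies fail fast; other ecosystems have equivalents (e.g., the pytest-randomly plugin).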
Why this matters: Issues caught here won't escape to dev-refactor. If dev-refactor finds test-related issues, Gate 3 failed.
Completeness Gate (Existing)
- Every acceptance criterion has at least one unit test
- All tests follow TDD cycle (RED verified before GREEN)
- Full test suite passes (0 failures)
- Traceability matrix complete and updated
- No skipped or pending tests for this task
- QA Analyst VERDICT = PASS (mandatory for gate progression)
Edge Case Requirements
| AC Type | Required Edge Cases | Minimum |
|---|---|---|
| Input validation | null, empty, boundary, invalid format | 3+ |
| CRUD operations | not found, duplicate, concurrent access | 3+ |
| Business logic | zero, negative, overflow, boundary | 3+ |
| Error handling | timeout, connection failure, retry exhausted | 2+ |
Rule: Happy path only = incomplete testing. Each AC needs edge cases.
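As a sketch, a table-driven Go test can cover the required edge cases for an input-validation criterion in one place (validateEmail and its rules are hypothetical):

```go
package validation

import (
	"errors"
	"strings"
	"testing"
)

// validateEmail is a hypothetical stand-in for the unit under test.
func validateEmail(s string) error {
	if s == "" || !strings.Contains(s, "@") || len(s) > 254 {
		return errors.New("invalid email")
	}
	return nil
}

func TestValidation_ValidateEmail_EdgeCases(t *testing.T) {
	cases := []struct {
		name    string
		input   string
		wantErr bool
	}{
		{"empty input", "", true},
		{"missing @ (invalid format)", "user.example.com", true},
		{"at length boundary (254 chars)", strings.Repeat("a", 242) + "@example.com", false},
		{"over length boundary (255 chars)", strings.Repeat("a", 243) + "@example.com", true},
		{"happy path", "user@example.com", false},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			err := validateEmail(tc.input)
			if (err != nil) != tc.wantErr {
				t.Fatalf("validateEmail(%q) error = %v, wantErr %v", tc.input, err, tc.wantErr)
			}
		})
	}
}
```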
QA Verdict Handling
QA Analyst returns VERDICT: PASS or VERDICT: FAIL.
PASS (coverage ≥ threshold and all quality checks pass) → Proceed to Gate 4 (Review)
FAIL (coverage below threshold or any quality check fails) → Return to Gate 0 (Implementation)
On FAIL
- QA provides gap analysis (what needs tests)
- Dev-cycle returns to Gate 0 with this analysis
- Implementation agent adds the missing tests
- Gate 3 re-runs
- Max 3 attempts, then escalate to user
Iteration Tracking
The dev-cycle orchestrator tracks iteration count in state:
```json
{
  "testing": {
    "iteration": 1,
    "verdict": "FAIL",
    "coverage_actual": 72.5,
    "coverage_threshold": 85
  }
}
```
Enforcement rules:
- Increment `iteration` on each Gate 3 entry
- After the 3rd FAIL: STOP, do NOT return to Gate 0
- Present gap analysis to user for manual resolution
- User must decide: fix manually, adjust threshold (if allowed), or abort task
Handling Test Failures
If tests fail during execution:
- Do NOT proceed to Gate 4
- Identify root cause:
  - Implementation bug -> Return to Gate 0
  - Test bug -> Fix test, re-run TDD cycle
  - Missing requirement -> Document gap, consult stakeholder
- Fix and re-run full suite
- Update matrix with resolution
Anti-Patterns
Never:
- Write tests after implementation without RED phase
- Mark tests as skipped to pass gate
- Assume passing tests mean criteria met (verify mapping)
- Copy-paste tests without understanding
- Mock the unit under test (only mock dependencies)
- Test implementation details instead of behavior
Always:
- Watch test fail before implementing (RED phase)
- Run full suite, not just new tests
- Verify test actually tests the criterion
- Use descriptive test names
- Test one behavior per test
- Clean up test state
Execution Report
Base metrics per shared-patterns/output-execution-report.md.
| Metric | Value |
|---|---|
| Duration | Xm Ys |
| Iterations | N |
| Unit Tests Written | X |
| Coverage | XX.X% |
| Criteria Covered | X/Y (100%) |
| Result | PASS/FAIL |
Recovery Procedures
| Issue | Recovery Steps |
|---|---|
| Tests without RED phase | Delete impl → Run test (must fail) → Restore impl (must pass) → If passes without impl: test is wrong, rewrite |
| Coverage below threshold | Identify uncovered lines → Write unit tests for gaps → Re-run coverage → Repeat until ≥85% |
| Criterion has no test | STOP → Write unit test → Full TDD cycle → Update matrix |