| name | review-implementation |
| description | Use after hyperpowers:executing-plans completes all tasks to review implementation against bd spec - verifies all success criteria met, anti-patterns avoided, and nothing missed before declaring work complete |
Review Implementation
Overview
Review completed implementation against bd epic specification to catch gaps before claiming completion.
Review Perspective: You are a Google Fellow-level SRE with 20+ years of production experience reviewing code written by a junior engineer. Apply rigorous production standards.
Core principle: Implementation must match spec. Success criteria are the contract.
Announce at start: "I'm using the hyperpowers:review-implementation skill to verify the implementation matches the spec. I'm reviewing this with Google Fellow-level scrutiny."
Context: This runs after hyperpowers:executing-plans completes all tasks but before hyperpowers:finishing-a-development-branch.
CRITICAL: NEVER read .beads/issues.jsonl directly. ALWAYS use bd show, bd list, and bd dep tree commands to read task specifications. The bd CLI provides the correct interface.
The Process
Step 1: Load Epic Specification from bd
Get the epic and all tasks:
```bash
# Show epic
bd show bd-1

# Get all tasks in epic
bd dep tree bd-1

# List all tasks for detailed review
bd list --parent bd-1
```
Create a TodoWrite tracker with all task IDs to review.
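For example, the tracker might hold one entry per task; the IDs and names here are illustrative:

```text
- [ ] Review bd-2: <Task Name>
- [ ] Review bd-3: <Task Name>
- [ ] Review bd-4: <Task Name>
- [ ] Review bd-5: <Task Name>
```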
Step 2: Review Each Task Against Implementation
For each task in the epic:
A. Read task specification:
```bash
bd show bd-3
```
B. Identify what was supposed to be delivered:
- Goal (what problem does this solve?)
- Success criteria (how do we know it's done?)
- Implementation checklist (what files/functions/tests?)
- Key considerations (what edge cases matter?)
- Anti-patterns (what must be avoided?)
C. Review actual implementation:
1. Run Automated Code Completeness Checks:
```bash
# Note: rg exits non-zero when nothing matches, so the `|| echo` fallback reports a clean result.

# Flag all TODO/FIXME markers (inspect any hits for missing issue numbers)
echo "🔍 Checking for TODOs/FIXMEs..."
rg -i "todo|fixme" src/ tests/ || echo "✅ No TODOs/FIXMEs found"

# Check for stub implementations
echo "🔧 Checking for stub implementations..."
rg "unimplemented!|todo!|unreachable!|panic!\(\"not implemented" src/ || echo "✅ No stubs found"

# Check for unsafe error handling in production code
# (grep -v also exits non-zero when nothing survives the filter, so the fallback still fires)
echo "⚠️ Checking for unsafe patterns in production..."
rg "\.unwrap\(\)|\.expect\(" src/ | grep -v "/tests/" || echo "✅ No unsafe patterns in production"

# Check for ignored/skipped tests
echo "🚫 Checking for ignored/skipped tests..."
rg "#\[ignore\]|#\[skip\]|\.skip\(\)" tests/ src/ || echo "✅ No ignored tests"
```
2. Run Quality Gate Checks:
Dispatch hyperpowers:test-runner agent for each quality gate:
```text
# Run tests via agent
hyperpowers:test-runner: "Run all tests: cargo test"

# Run formatter check via agent
hyperpowers:test-runner: "Check formatting: cargo fmt --check"

# Run linter via agent
hyperpowers:test-runner: "Run linter: cargo clippy -- -D warnings"

# Run pre-commit hook via agent
hyperpowers:test-runner: "Run pre-commit: .git/hooks/pre-commit"
```
3. READ Implementation Files:
CRITICAL: You MUST read the actual code, not just check git diff.
For each file in the implementation checklist:
```bash
# First see what changed
git diff main...HEAD -- src/auth/jwt.ts

# THEN READ THE ACTUAL FILE
# Use Read tool to read: src/auth/jwt.ts
```
While reading, check:
- ✅ Is the code actually implementing what the checklist says?
- ✅ Are functions/methods actually complete (not stubs)?
- ✅ Does error handling use proper patterns (Result, try/catch)?
- ✅ Are edge cases from "Key Considerations" handled in the code?
- ✅ Is the code clear and maintainable?
- ✅ Are there any anti-patterns present?
4. Code Quality Review (Google Fellow Perspective):
CRITICAL: Assume this code was written by a junior engineer. Review with production-grade scrutiny.
For each implementation file, assess with Google Fellow standards:
Error Handling:
- Proper use of Result/Option types (Rust) or try/catch (JS/TS)?
- Error messages helpful for debugging production issues?
- No unwrap/expect in production code?
- Errors propagate correctly with context?
- Failure modes handled gracefully?
Safety:
- No unsafe code blocks without justification and thorough comments?
- Proper bounds checking on arrays/slices?
- No potential panics or crashes?
- No data races or concurrency issues?
- No SQL injection, XSS, or security vulnerabilities?
Clarity:
- Would a junior engineer understand this code in 6 months?
- Functions have single responsibility?
- Variable names descriptive and unambiguous?
- Complex logic explained with comments?
- No clever tricks - is the code obvious and boring?
Testing:
- Edge cases covered (empty input, max values, Unicode, etc.)?
- Tests are meaningful, not just for coverage numbers?
- Test names clearly describe what they verify?
- Tests actually test behavior, not implementation details?
- Failure scenarios tested (error paths)?
Production Readiness:
- Would you be comfortable deploying this to production?
- Could this cause an outage or data loss?
- Is performance acceptable under load?
- Are there obvious optimization opportunities missed?
- Logging/observability sufficient for debugging production issues?
5. Verify Success Criteria with Evidence:
For each success criterion (see the sketch after this list):
- Run verification commands
- Check actual output
- Don't assume - verify with evidence
- Use hyperpowers:test-runner agent for tests/lints/builds
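As a sketch, verifying a single criterion might look like this; the criterion, paths, and test name below are illustrative, not from a real bd issue:

```bash
# Hypothetical criterion: "Expired tokens are rejected"
# 1. Confirm the check exists in code:
rg "is_expired|exp <" src/auth/ --line-number
# 2. Run the matching test via the agent:
#    hyperpowers:test-runner: "Run: cargo test expired_token"
# 3. Paste the actual command output into the findings as evidence.
```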
6. Check Anti-Patterns:
Search for each prohibited pattern from the bd task (a sweep sketch follows this list):
- Unwrap/expect in production
- TODOs without issue numbers
- Stub implementations
- Ignored tests without justification
- Task-specific anti-patterns
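A minimal sweep over the standard prohibited patterns, assuming a Rust-style `src/` layout; extend the list with any task-specific patterns from the bd issue:

```bash
# Each entry is a regex; rg exits non-zero when nothing matches.
for pattern in '\.unwrap\(\)' '\.expect\(' 'unimplemented!' 'todo!'; do
  if rg "$pattern" src/; then
    echo "❌ Violation: $pattern"
  else
    echo "✅ Clean: $pattern"
  fi
done
```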
7. Verify Key Considerations:
Read code to confirm edge cases were handled (a search sketch follows this list):
- Empty input validation
- Unicode handling
- Concurrent access
- Failure modes
- Performance concerns
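A quick way to locate where an edge case is handled, as a sketch; the search terms are illustrative and should come from the task's Key Considerations:

```bash
# Find code and tests that mention a given edge case
rg -i "empty" src/ tests/ --line-number
rg -i "unicode|utf-?8" tests/ --line-number || echo "❌ No Unicode-related tests found"
```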
D. Record findings for this task:
```markdown
### Task: [Name] (bd-N)

#### Automated Checks
- TODOs/FIXMEs: [✅ None / ❌ Found at: file:line]
- Stubs: [✅ None / ❌ Found at: file:line]
- Unsafe patterns: [✅ None / ❌ Found at: file:line]
- Ignored tests: [✅ None / ❌ Found at: file:line]

#### Quality Gates
- Tests: [✅ Pass (N tests) / ❌ Fail (N failures)]
- Formatting: [✅ Pass / ❌ Fail]
- Linting: [✅ Pass / ❌ Warnings found]
- Pre-commit: [✅ Pass / ❌ Fail]

#### Files Reviewed
- src/file1.rs: [✅ Implements checklist / ❌ Issues: ...]
- src/file2.rs: [✅ Implements checklist / ❌ Issues: ...]
- tests/file1_test.rs: [✅ Complete / ❌ Issues: ...]

#### Code Quality (Google Fellow Review)
- Error Handling: [✅ Good / ⚠️ Concerns: ... / ❌ Issues: ...]
- Safety: [✅ Good / ⚠️ Concerns: ... / ❌ Issues: ...]
- Clarity: [✅ Good / ⚠️ Concerns: ... / ❌ Issues: ...]
- Testing: [✅ Good / ⚠️ Concerns: ... / ❌ Issues: ...]

#### Success Criteria (from bd issue)
1. [Criterion 1]: [✅ Met / ❌ Not met] - Evidence: [how verified]
2. [Criterion 2]: [✅ Met / ❌ Not met] - Evidence: [how verified]

#### Anti-Patterns Check
- [Anti-pattern 1]: [✅ Avoided / ❌ Violated at: file:line]
- [Anti-pattern 2]: [✅ Avoided / ❌ Violated at: file:line]

#### Key Considerations Check
- [Edge case 1]: [✅ Handled at file:line / ❌ Not addressed]
- [Edge case 2]: [✅ Handled at file:line / ❌ Not addressed]

#### Issues Found
**Critical** (must fix):
1. [Issue] - file:line - [Why critical]

**Important** (should fix):
1. [Issue] - file:line - [Impact]

**Minor** (nice to have):
1. [Issue] - file:line - [Suggestion]
```
E. Mark task review as completed in TodoWrite
Step 3: Report Findings
After reviewing all tasks, compile findings:
If NO gaps found:
```markdown
## Implementation Review: APPROVED ✅

Reviewed bd-1 (<epic name>) against implementation.

### Tasks Reviewed
- bd-2: <Task Name> ✅
- bd-3: <Task Name> ✅
- bd-4: <Task Name> ✅
- bd-5: <Task Name> ✅

### Verification Summary
- All success criteria verified
- No anti-patterns detected
- All key considerations addressed
- All files implemented per spec

### Evidence
[Show key verification command outputs]

Ready to proceed to hyperpowers:finishing-a-development-branch.
```
If gaps found:
```markdown
## Implementation Review: GAPS FOUND ❌

Reviewed bd-1 (<epic name>) against implementation.

### Tasks with Gaps

#### bd-3: <Task Name>
**Gaps:**
- ❌ Success criterion not met: "Pre-commit hooks pass"
  - Evidence: `cargo clippy` shows 3 warnings
- ❌ Anti-pattern violation: Found `unwrap()` in src/auth/jwt.ts:45
- ⚠️ Key consideration not addressed: "Empty input validation"
  - No check for empty payload in generateToken()

#### bd-5: <Task Name>
**Gaps:**
- ❌ Success criterion not met: "All tests passing"
  - Evidence: test_verify_expired_token is failing

### Cannot Proceed
Implementation does not match spec. Fix gaps before completing.
```
Step 4: Gate Decision
If APPROVED:
- Announce: "I'm using the hyperpowers:finishing-a-development-branch skill to complete this work."
- REQUIRED: Use Skill tool to invoke:
hyperpowers:finishing-a-development-branch
If GAPS FOUND:
- STOP. Do not proceed to hyperpowers:finishing-a-development-branch
- Fix gaps or discuss with partner
- Re-run review after fixes
Review Checklist (per task)
For each task, verify (a file-coverage sketch follows the checklist):
Success Criteria
- Every success criterion listed in bd task is verifiable
- Each criterion has been verified with evidence (command output)
- No criteria were skipped or assumed
Implementation Checklist
- Every file in checklist was created/modified
- Every function in checklist exists and is implemented (not stubbed)
- Every test in checklist exists and passes
Anti-Patterns
- Searched for each prohibited pattern listed in task
- No violations found
- If violations found: documented as gap
Key Considerations
- Each edge case mentioned has corresponding code
- Error handling for failure modes exists
- Performance/security concerns addressed
Code Quality
- No TODOs without issue numbers
- No commented-out code
- No debug print statements
- Follows project style/conventions
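A sketch of the file-coverage check referenced above, assuming the checklist paths have been copied from `bd show` into a scratch file; `checklist.txt` is hypothetical:

```bash
# checklist.txt: one path per line, taken from the task's implementation checklist
git diff --name-only main...HEAD > /tmp/changed.txt
while read -r file; do
  if grep -qx "$file" /tmp/changed.txt; then
    echo "✅ $file was modified on this branch"
  else
    echo "❌ $file not touched; checklist item may be unimplemented"
  fi
done < checklist.txt
```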
Common Rationalizations - STOP
| Excuse | Reality |
|---|---|
| "Tests pass, must be complete" | Tests ≠ spec. Check every success criterion. |
| "I implemented it, it's done" | Implementation ≠ spec compliance. Review evidence. |
| "No time for thorough review" | Gaps found later cost more time than review now. |
| "Looks good to me" | Your opinion ≠ evidence. Run verifications. |
| "Small gaps don't matter" | Spec is contract. All criteria must be met. |
| "Will fix in next PR" | This PR should complete this epic. Fix now. |
| "Partner will review" | You review first. Don't delegate your quality check. |
| "Can check git diff instead of reading files" | NO. Git diff shows changes, not full context. READ the actual files. |
| "Automated checks cover quality" | NO. Automated checks + code review both required. Read the code. |
| "Success criteria passing means done" | NO. Also check: anti-patterns, code quality, edge cases. Read the code. |
Red Flags - STOP
Never:
- Skip reviewing a task "because it's simple"
- Approve without verifying every success criterion
- Ignore anti-pattern violations
- Assume key considerations were addressed
- Trust that "tests passing" means spec is met
- Only check git diff without reading actual files
- Skip automated checks (TODOs, stubs, unwrap)
- Skip code quality review (error handling, safety, clarity)
Always:
- Review every task, no exceptions
- Verify with commands and evidence
- Document gaps explicitly
- Check for anti-patterns from hyperpowers:sre-task-refinement
- READ the actual implementation files with Read tool
- Run automated checks for TODOs, stubs, unsafe patterns
- Run quality gates with hyperpowers:test-runner agent
- Assess code quality with Google Fellow perspective
Working with bd
Reading Task Specifications
```bash
# Show task with full design
bd show bd-3

# The design contains:
# - Goal
# - Success Criteria (your checklist)
# - Implementation Steps (what you executed)
# - Key Considerations (edge cases to check)
# - Anti-patterns (what to search for)
```
Verification Pattern
For each task:

1. Read success criteria from bd:

```bash
bd show bd-3 | grep -A 20 "Success Criteria"
```

2. Run each verification:

For test commands (use the hyperpowers:test-runner agent):
- Dispatch hyperpowers:test-runner agent with: "Run: cargo test"
- Dispatch hyperpowers:test-runner agent with: "Run: cargo clippy"

For search/analysis commands (use Bash/Grep):

```bash
# Example criterion: "No unwrap in production code"
rg "\.unwrap\(\)" src/
```

3. Record results:
- ✅ If verification passes
- ❌ If verification fails (document what failed)
Integration
Called by:
- hyperpowers:executing-plans (Step 5) - After all tasks executed, before finishing
Calls:
- hyperpowers:finishing-a-development-branch - If review approves, hand off to finish
Uses:
- hyperpowers:verification-before-completion - All verifications follow its "evidence before claims" principle; no completion claims without fresh verification evidence
Order:
```text
hyperpowers:executing-plans → hyperpowers:review-implementation → hyperpowers:finishing-a-development-branch
                                          ↓
                                (if gaps found: STOP)
```
Why This Matters
Common scenarios this prevents:
Incomplete implementation:
- Developer executes 80% of task
- Tests pass for implemented parts
- Success criteria includes the missing 20%
- Review catches gap
Anti-pattern violations:
- Code works but uses prohibited patterns
- Would fail code review later
- Wastes PR review time
- Review catches early
Edge cases missed:
- hyperpowers:sre-task-refinement identified edge cases
- Developer focused on happy path
- Edge cases untested
- Review catches before PR
Spec drift:
- Implementation evolved during development
- Original spec goals forgotten
- Result doesn't solve original problem
- Review catches misalignment
Remember
- Spec (bd tasks) is the contract
- Implementation must fulfill contract completely
- "Working code" ≠ "spec-compliant code"
- Evidence-based review, not opinion
- Every success criterion matters
- Better to find gaps now than in PR review