Claude Code Plugins

Community-maintained marketplace


sre-task-refinement

@withzombies/hyperpowers

Use when you need to refine subtasks into actionable plans, ensuring that all corner cases are handled and all requirements are understood.

Install Skill

1. Download skill
2. Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3. Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify the skill by reading through its instructions before using it.

SKILL.md

name: sre-task-refinement
description: Use when you need to refine subtasks into actionable plans, ensuring that all corner cases are handled and all requirements are understood.

Review bd task plans with a Google Fellow SRE perspective to ensure a junior engineer can execute without questions; catch edge cases, verify granularity, strengthen criteria, and prevent production issues before implementation.

LOW FREEDOM - Follow the 7-category checklist exactly. Apply all categories to every task. No skipping red flag checks. Always verify no placeholder text after updates. Reject plans with critical gaps.

| Category | Key Questions | Auto-Reject If |
|----------|---------------|----------------|
| 1. Granularity | Tasks 4-8 hours? Phases <16 hours? | Any task >16h without breakdown |
| 2. Implementability | Junior can execute without questions? | Vague language, missing details |
| 3. Success Criteria | 3+ measurable criteria per task? | Can't verify ("works well") |
| 4. Dependencies | Correct parent-child, blocking relationships? | Circular dependencies |
| 5. Safety Standards | Anti-patterns specified? Error handling? | No anti-patterns section |
| 6. Edge Cases | Empty input? Unicode? Concurrency? Failures? | No edge case consideration |
| 7. Red Flags | Placeholder text? Vague instructions? | "[detailed above]", "TODO" |

Perspective: Google Fellow SRE with 20+ years experience reviewing junior engineer designs.

Time: Don't rush - catching one gap pre-implementation saves hours of rework.

Use when:

  • Reviewing bd epic/feature plans before implementation
  • Need to ensure junior engineer can execute without questions
  • Want to catch edge cases and failure modes upfront
  • Need to verify task granularity (4-8 hour subtasks)
  • After hyperpowers:writing-plans creates initial plan
  • Before hyperpowers:executing-plans starts implementation

Don't use when:

  • Task already being implemented (too late)
  • Just need to understand existing code (use codebase-investigator)
  • Debugging issues (use debugging-with-tools)
  • Want to create plan from scratch (use brainstorming → writing-plans)

## Announcement

Announce: "I'm using hyperpowers:sre-task-refinement to review this plan with Google Fellow-level scrutiny."


Review Checklist (Apply to Every Task)

1. Task Granularity

Check:

  • No task >8 hours (subtasks) or >16 hours (phases)?
  • Large phases broken into 4-8 hour subtasks?
  • Each subtask independently completable?
  • Each subtask has clear deliverable?

If task >16 hours (full commands under "Breaking Down Large Tasks" below):

  • Create subtasks with bd create
  • Link with bd dep add child parent --type parent-child
  • Update parent to coordinator role

2. Implementability (Junior Engineer Test)

Check:

  • Can junior engineer implement without asking questions?
  • Function signatures/behaviors described, not just "implement X"?
  • Test scenarios described (what they verify, not just names)?
  • "Done" clearly defined with verifiable criteria?
  • All file paths specified or marked "TBD: new file"?

Red flags:

  • "Implement properly" (how?)
  • "Add support" (for what exactly?)
  • "Make it work" (what does working mean?)
  • File paths missing or ambiguous

3. Success Criteria Quality

Check:

  • Each task has 3+ specific, measurable success criteria?
  • All criteria testable/verifiable (not subjective)?
  • Includes automated verification (tests pass, clippy clean)?
  • No vague criteria like "works well" or "is implemented"?

Good criteria examples:

  • ✅ "5+ unit tests pass (valid VIN, invalid checksum, various formats)"
  • ✅ "Clippy clean with no warnings"
  • ✅ "Performance: <100ms for 1000 records"

Bad criteria examples:

  • ❌ "Code is good quality"
  • ❌ "Works correctly"
  • ❌ "Is implemented"

4. Dependency Structure

Check:

  • Parent-child relationships correct (epic → phases → subtasks)?
  • Blocking dependencies correct (earlier work blocks later)?
  • No circular dependencies?
  • Dependency graph makes logical sense?

Verify with:

bd dep tree bd-1  # Show full dependency tree

5. Safety & Quality Standards

Check:

  • Anti-patterns include unwrap/expect prohibition?
  • Anti-patterns include TODO prohibition (or must have issue #)?
  • Anti-patterns include stub implementation prohibition?
  • Error handling requirements specified (use Result, avoid panic)?
  • Test requirements specific (test names, scenarios listed)?

Minimum anti-patterns:

  • ❌ No unwrap/expect in production code
  • ❌ No TODOs without issue numbers
  • ❌ No stub implementations (unimplemented!, todo!)
  • ❌ No regex without catastrophic backtracking check
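
If helpful, the plan can make the unwrap/expect prohibition concrete with a short sketch like this one (record_count is a hypothetical example, showing Result-based propagation instead of panicking):

use std::num::ParseIntError;

// ❌ Anti-pattern: panics on malformed input.
// fn record_count(raw: &str) -> u32 {
//     raw.parse().unwrap()
// }

// ✅ Propagate the failure to the caller instead.
fn record_count(raw: &str) -> Result<u32, ParseIntError> {
    raw.trim().parse()
}

fn main() {
    match record_count("42") {
        Ok(n) => println!("count = {n}"),
        Err(e) => eprintln!("invalid count: {e}"),
    }
}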

6. Edge Cases & Failure Modes (Fellow SRE Perspective)

Ask for each task:

  • What happens with malformed input?
  • What happens with empty/nil/zero values?
  • What happens under high load/concurrency?
  • What happens when dependencies fail?
  • What happens with Unicode, special characters, large inputs?
  • Are these edge cases addressed in the plan?

Add to Key Considerations section:

  • Edge case descriptions
  • Mitigation strategies
  • References to similar code handling these cases
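
For code tasks, these edge cases translate directly into test stubs the junior engineer can start from. A minimal sketch, where scan stands in for whatever function the task implements:

// Hypothetical scanner entry point; substitute the task's real API.
fn scan(input: &str) -> Vec<String> {
    input.split_whitespace().map(|w| w.to_string()).collect()
}

#[cfg(test)]
mod edge_cases {
    use super::scan;

    #[test]
    fn empty_input_yields_no_matches() {
        assert!(scan("").is_empty());
    }

    #[test]
    fn unicode_input_does_not_panic() {
        // RTL text and multi-byte characters must not be split mid-codepoint.
        let _ = scan("שלום мир 👍");
    }

    #[test]
    fn very_large_input_completes() {
        // Guards against pathological blowups (e.g., regex backtracking).
        let big = "a".repeat(1_000_000);
        let _ = scan(&big);
    }
}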

7. Red Flags (AUTO-REJECT)

Check for these - if found, REJECT plan:

  • ❌ Any task >16 hours without subtask breakdown
  • ❌ Vague language: "implement properly", "add support", "make it work"
  • ❌ Success criteria that can't be verified: "code is good", "works well"
  • ❌ Missing test specifications
  • ❌ "We'll handle this later" or "TODO" in the plan itself
  • ❌ No anti-patterns section
  • ❌ Implementation checklist with fewer than 3 items per task
  • ❌ No effort estimates
  • ❌ Missing error handling considerations
  • ❌ CRITICAL: Placeholder text in design field - "[detailed above]", "[as specified]", "[complete steps here]"

Review Process

For each task in the plan:

Step 1: Read the task

bd show bd-3

Step 2: Apply all 7 checklist categories

  • Task Granularity
  • Implementability
  • Success Criteria Quality
  • Dependency Structure
  • Safety & Quality Standards
  • Edge Cases & Failure Modes
  • Red Flags

Step 3: Document findings

Take notes:

  • What's done well
  • What's missing
  • What's vague or ambiguous
  • Hidden failure modes not addressed
  • Better approaches or simplifications

Step 4: Update the task

Use bd update to add missing information:

bd update bd-3 --design "$(cat <<'EOF'
## Goal
[Original goal, preserved]

## Effort Estimate
[Updated estimate if needed]

## Success Criteria
- [ ] Existing criteria
- [ ] NEW: Added missing measurable criteria

## Implementation Checklist
[Complete checklist with file paths]

## Key Considerations (ADDED BY SRE REVIEW)

**Edge Case: Empty Input**
- What happens when input is empty string?
- MUST validate input length before processing

**Edge Case: Unicode Handling**
- What if string contains RTL or surrogate pairs?
- Use proper Unicode-aware string methods

**Performance Concern: Regex Backtracking**
- Pattern `.*[a-z]+.*` has catastrophic backtracking risk
- MUST test with pathological inputs (e.g., 10000 'a's)
- Use possessive quantifiers or bounded repetition

**Reference Implementation**
- Study src/similar/module.rs for pattern to follow

## Anti-patterns
[Original anti-patterns]
- ❌ NEW: Specific anti-pattern for this task's risks
EOF
)"

IMPORTANT: Use --design for full detailed description, NOT --description (title only).

Step 5: Verify no placeholder text (MANDATORY)

After updating, read back with bd show bd-N and verify:

  • ✅ All sections contain actual content, not meta-references
  • ✅ No placeholder text like "[detailed above]", "[as specified]", "[will be added]"
  • ✅ Implementation steps fully written with actual code examples
  • ✅ Success criteria explicit, not referencing "criteria above"
  • ❌ If ANY placeholder text found: REJECT and rewrite with actual content

Breaking Down Large Tasks

If task >16 hours, create subtasks:

# Create first subtask
bd create "Subtask 1: [Specific Component]" \
  --type task \
  --priority 1 \
  --design "[Complete subtask design with all 7 categories addressed]"
# Returns bd-10

# Create second subtask
bd create "Subtask 2: [Another Component]" \
  --type task \
  --priority 1 \
  --design "[Complete subtask design]"
# Returns bd-11

# Link subtasks to parent with parent-child relationship
bd dep add bd-10 bd-3 --type parent-child  # bd-10 is child of bd-3
bd dep add bd-11 bd-3 --type parent-child  # bd-11 is child of bd-3

# Add sequential dependencies if needed (LATER depends on EARLIER)
bd dep add bd-11 bd-10  # bd-11 depends on bd-10 (do bd-10 first)

# Update parent to coordinator
bd update bd-3 --design "$(cat <<'EOF'
## Goal
Coordinate implementation of [feature]. Broken into N subtasks.

## Success Criteria
- [ ] All N child subtasks closed
- [ ] Integration tests pass
- [ ] [High-level verification criteria]
EOF
)"

Output Format

After reviewing all tasks:

## Plan Review Results

### Epic: [Name] ([epic-id])

### Overall Assessment
[APPROVE ✅ / NEEDS REVISION ⚠️ / REJECT ❌]

### Dependency Structure Review
[Output of `bd dep tree [epic-id]`]

**Structure Quality**: [✅ Correct / ❌ Issues found]
- [Comments on parent-child relationships]
- [Comments on blocking dependencies]
- [Comments on granularity]

### Task-by-Task Review

#### [Task Name] (bd-N)
**Type**: [epic/feature/task]
**Status**: [✅ Ready / ⚠️ Needs Minor Improvements / ❌ Needs Major Revision]
**Estimated Effort**: [X hours] ([✅ Good / ❌ Too large - needs breakdown])

**Strengths**:
- [What's done well]

**Critical Issues** (must fix):
- [Blocking problems]

**Improvements Needed**:
- [What to add/clarify]

**Edge Cases Missing**:
- [Failure modes not addressed]

**Changes Made**:
- [Specific improvements added via `bd update`]

---

[Repeat for each task/phase/subtask]

### Summary of Changes

**Issues Updated**:
- bd-3 - Added edge case handling for Unicode, regex backtracking risks
- bd-5 - Broke into 3 subtasks (was 40 hours, now 3x8 hours)
- bd-7 - Strengthened success criteria (added test names, verification commands)

### Critical Gaps Across Plan
1. [Pattern of missing items across multiple tasks]
2. [Systemic issues in the plan]

### Recommendations

[If APPROVE]:
✅ Plan is solid and ready for implementation.
- All tasks are junior-engineer implementable
- Dependency structure is correct
- Edge cases and failure modes addressed

[If NEEDS REVISION]:
⚠️ Plan needs improvements before implementation:
- [List major items that need addressing]
- After changes, re-run hyperpowers:sre-task-refinement

[If REJECT]:
❌ Plan has fundamental issues and needs redesign:
- [Critical problems]

Example: Developer reviews task but skips edge case analysis (Category 6)

# Review of bd-3: Implement VIN scanner

Checklist review:

  1. Granularity: ✅ 6-8 hours
  2. Implementability: ✅ Junior can implement
  3. Success Criteria: ✅ Has 5 test scenarios
  4. Dependencies: ✅ Correct
  5. Safety Standards: ✅ Anti-patterns present
  6. Edge Cases: [SKIPPED - "looks straightforward"]
  7. Red Flags: ✅ None found

Conclusion: "Task looks good, approve ✅"

Task ships without edge case review

Production issues occur:

  • VIN scanner matches random 17-char strings (no checksum validation)
  • Lowercase VINs not handled (should normalize)
  • Catastrophic regex backtracking on long inputs (DoS vulnerability)
Why this failed:

  • Skipped Category 6 (Edge Cases) assuming task was "straightforward"
  • Didn't ask: What happens with invalid checksums? Lowercase? Long inputs?
  • Missed critical production issues:
    - False positives (no checksum validation)
    - Data handling bugs (case sensitivity)
    - Security vulnerability (regex DoS)
  • Junior engineer didn't know to handle these (not in task)
  • Production incidents occur after deployment
  • Hours of emergency fixes, customer impact
  • SRE review failed to prevent known failure modes

**Apply Category 6 rigorously:**
## Edge Case Analysis for bd-3: VIN Scanner

Ask for EVERY task:
- Malformed input? VIN has checksum - must validate, not just pattern match
- Empty/nil? What if empty string passed?
- Concurrency? Read-only scanner, no concurrency issues
- Dependency failures? No external dependencies
- Unicode/special chars? VIN is alphanumeric only, but what about lowercase?
- Large inputs? Regex `.*` patterns can cause catastrophic backtracking

Findings:
❌ VIN checksum validation not mentioned (will match random strings)
❌ Case normalization not mentioned (lowercase VINs exist)
❌ Regex backtracking risk not mentioned (DoS vulnerability)

Update task:

bd update bd-3 --design "$(cat <<'EOF'
[... original content ...]

## Key Considerations (ADDED BY SRE REVIEW)

**VIN Checksum Complexity**:
- ISO 3779 requires transliteration table (letters → numbers)
- Weighted sum algorithm with modulo 11
- Reference: https://en.wikipedia.org/wiki/Vehicle_identification_number#Check_digit
- MUST validate checksum, not just pattern - prevents false positives

**Case Normalization**:
- VINs can appear in lowercase
- MUST normalize to uppercase before validation
- Test with mixed case: "1hgbh41jxmn109186"

**Regex Backtracking Risk**:
- CRITICAL: Pattern `.*[A-HJ-NPR-Z0-9]{17}.*` has backtracking risk
- Test with pathological input: 10000 'X's followed by 16-char string
- Use possessive quantifiers or bounded repetition
- Reference: https://www.regular-expressions.info/catastrophic.html

**Edge Cases to Test**:
- Valid VIN with valid checksum (should match)
- Valid pattern but invalid checksum (should NOT match)
- Lowercase VIN (should normalize and validate)
- Ambiguous letters I, O, Q not valid in VINs (should reject)
- Very long input (should not DoS)
EOF
)"

What you gain:

  • Prevented false positives (checksum validation)
  • Prevented data handling bugs (case normalization)
  • Prevented security vulnerability (regex DoS)
  • Junior engineer has complete requirements
  • Production issues caught pre-implementation
  • Proper SRE review preventing known failure modes
  • Customer trust maintained

Example: Developer approves task with placeholder text (Red Flag #10)

# Review of bd-5: Implement License Plate Scanner

bd show bd-5:

Implementation Checklist

  • Create scanner module
  • [Complete implementation steps detailed above]
  • Add tests

Success Criteria

  • [As specified in the implementation checklist]
  • Tests pass

Key Considerations

  • [Will be added during implementation]

Developer's review:

"Looks comprehensive, has implementation checklist and success criteria ✅"

During implementation:

Junior engineer: "What are the 'implementation steps detailed above'?" Junior engineer: "What specific success criteria should I verify?" Junior engineer: "What key considerations exist?"

No answers in the task - junior engineer blocked

Have to research and add missing information

Implementation delayed by 2 days

Why this failed:

  • Missed Red Flag #10: Placeholder text present
  • "[Complete implementation steps detailed above]" is meta-reference, not content
  • "[As specified in the implementation checklist]" is circular reference
  • "[Will be added during implementation]" is deferral, not specification
  • Junior engineer can't execute - missing critical information
  • Task looks complete but actually incomplete
  • Implementation blocked until details added
  • SRE review failed to catch placeholder text

**Check for placeholder text after reading:**
## Red Flag Check (Category 7)

Read through bd-5 line by line:

Line 15: "[Complete implementation steps detailed above]"
❌ PLACEHOLDER - "detailed above" is meta-reference, not actual content

Line 22: "[As specified in the implementation checklist]"
❌ PLACEHOLDER - Circular reference to another section, not explicit criteria

Line 30: "[Will be added during implementation]"
❌ PLACEHOLDER - Deferral to future, not actual considerations

DECISION: REJECT ❌
Reason: Contains placeholder text - task not ready for implementation

Update task with actual content:

bd update bd-5 --design "$(cat <<'EOF'
## Implementation Checklist
- [ ] Create src/scan/plugins/scanners/license_plate.rs
- [ ] Implement LicensePlateScanner struct with ScanPlugin trait
- [ ] Add regex patterns for US states:
  - CA: `[0-9][A-Z]{3}[0-9]{3}` (e.g., 1ABC123)
  - NY: `[A-Z]{3}[0-9]{4}` (e.g., ABC1234)
  - TX: `[A-Z]{3}[0-9]{4}|[0-9]{3}[A-Z]{3}` (e.g., ABC1234 or 123ABC)
  - Generic: `[A-Z0-9]{5,8}` (fallback)
- [ ] Implement has_healthcare_context() check
- [ ] Create test module with 8+ test cases
- [ ] Register in src/scan/plugins/scanners/mod.rs

## Success Criteria
- [ ] Valid CA plate "1ABC123" detected in healthcare context
- [ ] Valid NY plate "ABC1234" detected in healthcare context
- [ ] Invalid plate "123" NOT detected (too short)
- [ ] Valid plate NOT detected outside healthcare context
- [ ] 8+ unit tests pass covering all patterns and edge cases
- [ ] Clippy clean, no warnings
- [ ] cargo test passes

## Key Considerations

**False Positive Risk**:
- License plates are short and generic (5-8 chars)
- MUST require healthcare context via has_healthcare_context()
- Without context, will match random alphanumeric sequences
- Test: Random string "ABC1234" should NOT match outside healthcare context

**State Format Variations**:
- 50 US states have different formats
- Implement common formats (CA, NY, TX) + generic fallback
- Document which formats supported in module docstring
- Consider international plates in future iteration

**Performance**:
- Regex patterns are simple, no backtracking risk
- Should process <1ms per chunk

**Reference Implementation**:
- Study src/scan/plugins/scanners/vehicle_identifier.rs
- Follow same pattern: regex + context check + tests
EOF
)"

Verify no placeholder text:

bd show bd-5
# Read entire output
# Confirm: All sections have actual content
# Confirm: No "[detailed above]", "[as specified]", "[will be added]"
# ✅ Task ready for implementation
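
As a rough sketch of the structure this specification implies (assumes the regex crate; the has_healthcare_context body is a stand-in for the codebase's real context check, and the plugin-trait wiring and generic fallback pattern are omitted):

use regex::Regex;

// Sketch only: stand-in for the codebase's real healthcare-context check.
fn has_healthcare_context(chunk: &str) -> bool {
    chunk.to_ascii_lowercase().contains("patient")
}

struct LicensePlateScanner {
    patterns: Vec<Regex>,
}

impl LicensePlateScanner {
    fn new() -> Result<Self, regex::Error> {
        // Formats from the checklist, word-bounded to avoid partial matches.
        let sources = [
            r"\b[0-9][A-Z]{3}[0-9]{3}\b", // CA: 1ABC123
            r"\b[A-Z]{3}[0-9]{4}\b",      // NY/TX: ABC1234
            r"\b[0-9]{3}[A-Z]{3}\b",      // TX alternate: 123ABC
        ];
        let patterns = sources
            .iter()
            .map(|s| Regex::new(s))
            .collect::<Result<Vec<_>, _>>()?;
        Ok(Self { patterns })
    }

    fn scan(&self, chunk: &str) -> Vec<String> {
        // Context gate first: plate formats are too generic to match alone.
        if !has_healthcare_context(chunk) {
            return Vec::new();
        }
        self.patterns
            .iter()
            .flat_map(|re| re.find_iter(chunk).map(|m| m.as_str().to_owned()))
            .collect()
    }
}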

What you gain:

  • Junior engineer has complete specification
  • No blocked implementation waiting for details
  • All edge cases documented upfront
  • Success criteria explicit and verifiable
  • Key considerations prevent common mistakes
  • No placeholder text - task truly ready
  • Professional SRE review standard maintained

Example: Developer accepts vague success criteria (Category 3)

# Review of bd-7: Implement Data Encryption

bd show bd-7:

Success Criteria

  • Encryption is implemented correctly
  • Code is good quality
  • Tests work properly

Developer's review:

"Has 3 success criteria ✅ Meets minimum requirement"

During implementation:

Junior engineer: "How do I know if encryption is 'correct'?" Junior engineer: "What makes code 'good quality'?" Junior engineer: "What does 'tests work properly' mean?"

Junior engineer makes best guesses:

  • Uses ECB mode (insecure, should use GCM)
  • No key rotation (bad practice)
  • Tests only happy path (misses edge cases)

Code review finds critical security issues

Complete rewrite required - 3 days wasted

Why this failed:

  • Category 3 check: "All criteria testable/verifiable?" ❌ FAILED
  • "Implemented correctly" - not measurable (correct by what standard?)
  • "Good quality" - subjective, not verifiable
  • "Work properly" - vague, what is proper?
  • Junior engineer can't verify criteria objectively
  • Makes incorrect assumptions filling gaps
  • Security vulnerabilities introduced
  • Wastes time on implementation that fails review
  • SRE review failed to strengthen criteria

**Apply Category 3 rigorously:**
## Success Criteria Analysis for bd-7

Current criteria:
- [ ] Encryption is implemented correctly
  ❌ NOT TESTABLE - "correctly" is subjective, no standard specified

- [ ] Code is good quality
  ❌ NOT TESTABLE - "good quality" is opinion, not measurable

- [ ] Tests work properly
  ❌ NOT TESTABLE - "properly" is vague, no definition

Minimum requirement: 3+ specific, measurable, testable criteria
Current: 0 testable criteria
DECISION: REJECT ❌

Update with measurable criteria:

bd update bd-7 --design "$(cat <<'EOF'
[... original content ...]

## Success Criteria

**Encryption Implementation**:
- [ ] Uses AES-256-GCM mode (verified in code review)
- [ ] Key derivation via PBKDF2 with 100,000 iterations (NIST recommendation)
- [ ] Unique IV generated per encryption (crypto_random)
- [ ] Authentication tag verified on decryption

**Code Quality** (automated checks):
- [ ] Clippy clean with no warnings: `cargo clippy -- -D warnings`
- [ ] Rustfmt compliant: `cargo fmt --check`
- [ ] No unwrap/expect in production: `rg "\.unwrap\(\)|\.expect\(" src/` returns 0
- [ ] No TODOs without issue numbers: `rg "TODO" src/` returns 0

**Test Coverage**:
- [ ] 12+ unit tests pass covering:
  - test_encrypt_decrypt_roundtrip (happy path)
  - test_wrong_key_fails_auth (security)
  - test_modified_ciphertext_fails_auth (security)
  - test_empty_plaintext (edge case)
  - test_large_plaintext_10mb (performance)
  - test_unicode_plaintext (data handling)
  - test_concurrent_encryption (thread safety)
  - test_iv_uniqueness (security)
  - [4 more specific scenarios]
- [ ] All tests pass: `cargo test encryption`
- [ ] Test coverage >90%: `cargo tarpaulin --packages encryption`

**Documentation**:
- [ ] Module docstring explains encryption scheme (AES-256-GCM)
- [ ] Function docstrings include examples
- [ ] Security considerations documented (key management, IV handling)

**Security Review**:
- [ ] No hardcoded keys or IVs (verified via grep)
- [ ] Key zeroized after use (verified in code)
- [ ] Constant-time comparison for auth tag (timing attack prevention)
EOF
)"

What you gain:

  • Every criterion objectively verifiable
  • Junior engineer knows exactly what "done" means
  • Automated checks (clippy, fmt, grep) provide instant feedback
  • Specific test scenarios prevent missed edge cases
  • Security requirements explicit (GCM, PBKDF2, unique IV)
  • No ambiguity - can verify each criterion with command or code review
  • Professional SRE review standard: measurable, testable, specific
## Rules That Have No Exceptions
  1. Apply all 7 categories to every task → No skipping any category for any task
  2. Reject plans with placeholder text → "[detailed above]", "[as specified]" = instant reject
  3. Verify no placeholder after updates → Read back with bd show and confirm actual content
  4. Break tasks >16 hours → Create subtasks, don't accept large tasks
  5. Strengthen vague criteria → "Works correctly" → measurable verification commands
  6. Add edge cases to every task → Empty? Unicode? Concurrency? Failures?
  7. Never skip Category 6 → Edge case analysis prevents production issues

Common Excuses

All of these mean: STOP. Apply the full process.

  • "Task looks straightforward" (Edge cases hide in "straightforward" tasks)
  • "Has 3 criteria, meets minimum" (Criteria must be measurable, not just 3+ items)
  • "Placeholder text is just formatting" (Placeholders mean incomplete specification)
  • "Can handle edge cases during implementation" (Must specify upfront, not defer)
  • "Junior will figure it out" (Junior should NOT need to figure out - we specify)
  • "Too detailed, feels like micromanaging" (Detail prevents questions and rework)
  • "Taking too long to review" (One gap caught saves hours of rework)
Before completing SRE review:

Per task reviewed:

  • Applied all 7 categories (Granularity, Implementability, Criteria, Dependencies, Safety, Edge Cases, Red Flags)
  • Checked for placeholder text in design field
  • Updated task with missing information via bd update --design
  • Verified updated task with bd show (no placeholders remain)
  • Broke down any task >16 hours into subtasks
  • Strengthened vague success criteria to measurable
  • Added edge case analysis to Key Considerations
  • Strengthened anti-patterns based on failure modes

Overall plan:

  • Reviewed ALL tasks/phases/subtasks (no exceptions)
  • Verified dependency structure with bd dep tree
  • Documented findings for each task
  • Created summary of changes made
  • Provided clear recommendation (APPROVE/NEEDS REVISION/REJECT)

Can't check all boxes? Return to review process and complete missing steps.

This skill is used after:

  • hyperpowers:writing-plans (creates initial plan)
  • hyperpowers:brainstorming (establishes requirements)

This skill is used before:

  • hyperpowers:executing-plans (implements tasks)

This skill is also called by:

  • hyperpowers:executing-plans (REQUIRED for new tasks created during execution)

Call chains:

Initial planning:
hyperpowers:brainstorming → hyperpowers:writing-plans → hyperpowers:sre-task-refinement → hyperpowers:executing-plans
                                                    ↓
                                            (if gaps: revise and re-review)

During execution (for new tasks):
hyperpowers:executing-plans → creates new task → hyperpowers:sre-task-refinement → STOP checkpoint

This skill uses:

  • bd commands (show, update, create, dep add, dep tree)
  • Google Fellow SRE perspective (20+ years distributed systems)
  • 7-category checklist (mandatory for every task)

Time expectations:

  • Small epic (3-5 tasks): 15-20 minutes
  • Medium epic (6-10 tasks): 25-40 minutes
  • Large epic (10+ tasks): 45-60 minutes

Don't rush: Catching one critical gap pre-implementation saves hours of rework.

Review patterns:

  • Task too large (>16h) → Break into 4-8h subtasks
  • Vague criteria ("works correctly") → Measurable commands/checks
  • Missing edge cases → Add to Key Considerations with mitigations
  • Placeholder text → Rewrite with actual content

When stuck:

  • Unsure if task too large → Ask: Can junior complete in one day?
  • Unsure if criteria measurable → Ask: Can I verify with command/code review?
  • Unsure if edge case matters → Ask: Could this fail in production?
  • Unsure if placeholder → Ask: Does this reference other content instead of providing content?

Key principle: Junior engineer should be able to execute task without asking questions. If they would need to ask, specification is incomplete.