| name | sre-task-refinement |
| description | Use when refining subtasks into actionable plans, ensuring all corner cases are handled and all requirements are understood. |
# Task Refinement
## Overview
Refine tasks into actionable plans that a junior engineer can execute with little or no supervision. The plan should guarantee the junior engineer's success.
**Core principle**: Ask yourself: have we covered all our bases? Do we fully understand the scope? Have we given the junior engineer enough information to succeed?
**Announce at start**: "I'm using the sre-task-refinement skill to refine the tasks."
Review bd issues for a feature plan with the critical eye of a Google Fellow SRE reviewing a junior engineer's design. Your goal is to prevent production issues, ensure implementability, and catch blind spots.
## Time Expectations
Estimated review time:
- Small epic (3-5 tasks): 15-20 minutes
- Medium epic (6-10 tasks): 25-40 minutes
- Large epic (10+ tasks): 45-60 minutes
Don't rush. Catching one critical gap before implementation saves hours of rework.
## Review Philosophy
**Perspective**: You are a Google Fellow SRE with 20+ years of distributed systems experience.
**What You're Looking For**:
- Can a junior engineer implement this without asking questions?
- Are there hidden failure modes or edge cases?
- Is the task granularity right (4-8 hour subtasks)?
- Will this cause production issues?
- Are there better/simpler approaches?
## Review Checklist
### 1. Task Granularity (CRITICAL)
- No task >8 hours (subtasks) or >16 hours (phases)?
- Large phases broken into 4-8 hour subtasks?
- Each subtask independently completable?
- Each subtask has clear deliverable?
### 2. Implementability (Junior Engineer Test)
- Can a junior engineer implement without asking questions?
- Are function signatures/behaviors described, not just "implement X"? (see the sketch below this checklist)
- Are test scenarios described (what they verify, not just names)?
- Is "done" clearly defined with verifiable criteria?
- Are all file paths specified or marked "TBD: new file"?
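To make "described, not just named" concrete, a task design can spell out the expected contract. The sketch below is hypothetical (function name, types, and behavior are illustrative, not taken from any real task), but it shows the level of detail a junior engineer needs:

```rust
/// Hypothetical example of the detail a task design should give: the exact
/// signature, the return contract, and the edge-case behavior.
///
/// Returns the (start, end) byte offsets of every run of ASCII digits in `input`.
/// Empty input returns an empty Vec; the function never panics.
fn find_digit_runs(input: &str) -> Vec<(usize, usize)> {
    let bytes = input.as_bytes();
    let mut spans = Vec::new();
    let mut start: Option<usize> = None;
    for (i, b) in bytes.iter().enumerate() {
        match (b.is_ascii_digit(), start) {
            (true, None) => start = Some(i),
            (false, Some(s)) => {
                spans.push((s, i));
                start = None;
            }
            _ => {}
        }
    }
    if let Some(s) = start {
        spans.push((s, bytes.len()));
    }
    spans
}
```

A design written at this level can be implemented and tested without follow-up questions.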
### 3. Success Criteria Quality
- Each task has 3+ specific, measurable success criteria?
- All criteria are testable/verifiable (not subjective)?
- Includes automated verification (tests pass, clippy clean, etc.)?
- No vague criteria like "works well" or "is implemented"?
### 4. Dependency Structure
- Parent-child relationships correct (epic → phases → subtasks)?
- Blocking dependencies correct (earlier work blocks later work)?
- No circular dependencies?
- Dependency graph makes logical sense?
### 5. Safety & Quality Standards
- Anti-patterns include unwrap/expect prohibition?
- Anti-patterns include TODO prohibition (or must have issue #)?
- Anti-patterns include stub implementation prohibition?
- Error handling requirements specified (use Result, avoid panic; see the sketch below)?
- Test requirements specific (test names, scenarios listed)?
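A minimal sketch of what "use Result, avoid panic" should look like in practice, assuming a Rust codebase (the function is illustrative, not project code):

```rust
use std::num::ParseIntError;

/// Illustrative only: propagate failures with Result instead of calling
/// unwrap()/expect() on fallible operations in production code.
fn parse_port(raw: &str) -> Result<u16, ParseIntError> {
    raw.trim().parse::<u16>()
}

fn main() {
    match parse_port("8080") {
        Ok(port) => println!("listening on {port}"),
        Err(e) => eprintln!("invalid port: {e}"),
    }
}
```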
### 6. Edge Cases & Failure Modes (Fellow SRE Perspective)
- What happens with malformed input?
- What happens with empty/nil/zero values?
- What happens under high load/concurrency?
- What happens when dependencies fail?
- What happens with Unicode, special characters, large inputs?
- Are these edge cases addressed in the plan?
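When an edge case is missing, add the concrete test you expect to the issue rather than a vague reminder. A hedged sketch, assuming Rust; `scan` here is a stand-in for whichever scanner the task implements, not a real entry point:

```rust
/// Stand-in scanner so the sketch compiles; a real plan would name the actual
/// function under review. Returns (start, end) byte offsets of each match.
fn scan(input: &str) -> Vec<(usize, usize)> {
    input.match_indices("4111").map(|(i, m)| (i, i + m.len())).collect()
}

#[cfg(test)]
mod edge_case_tests {
    use super::scan;

    #[test]
    fn empty_input_returns_no_findings() {
        assert!(scan("").is_empty());
    }

    #[test]
    fn unicode_input_does_not_panic() {
        // RTL text and multi-byte characters must not trigger byte-boundary panics.
        let _ = scan("مرحبا 🌍 4111");
    }

    #[test]
    fn pathological_input_completes() {
        // Guards against catastrophic backtracking and quadratic scans.
        let input = "4".repeat(10_000);
        let _ = scan(&input);
    }
}
```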
### 7. Red Flags (AUTO-REJECT if found)
- ❌ Any task >16 hours without subtask breakdown
- ❌ Vague language: "implement properly", "add support", "make it work"
- ❌ Success criteria that can't be verified: "code is good", "works well"
- ❌ Missing test specifications
- ❌ "We'll handle this later" or "TODO" in the plan itself
- ❌ No anti-patterns section
- ❌ Implementation checklist with fewer than 3 items per task
- ❌ No effort estimates
- ❌ Missing error handling considerations
- ❌ **CRITICAL: Placeholder text in design field** - any reference like "[detailed above]", "[as specified]", "[complete steps here]" instead of actual content
## Review Process
For each phase and subtask:
### A. Read the Issue
### B. Apply All 7 Checklist Categories
- Task Granularity
- Implementability
- Success Criteria Quality
- Dependency Structure
- Safety & Quality Standards
- Edge Cases & Failure Modes
- Red Flags
### C. Document Findings
Take notes on:
- What's done well
- What's missing
- What's vague or ambiguous
- Hidden failure modes not addressed
- Better approaches or simplifications
### D. Update the Issue
Add to the design:
- Missing edge cases in "Key Considerations" section
- Missing test scenarios
- Clarifications for ambiguous instructions
- Warnings about potential pitfalls (e.g., "CRITICAL: regex backtracking risk")
- References to similar code to study
- Strengthened anti-patterns based on failure modes
- Better effort estimates if original was off
### E. Verify No Placeholder Text (MANDATORY)
After updating, read back the issue with `bd show bd-N` and verify:
- ✅ All sections contain actual content, not meta-references
- ✅ No placeholder text like "[detailed above]", "[as specified]", "[will be added]"
- ✅ Implementation steps are fully written out with actual code examples
- ✅ Success criteria are explicit, not referencing "criteria above"
- ❌ If ANY placeholder text found: REJECT and rewrite with actual content
## Working with bd
### Reading Issues
```bash
# Show single issue with full details
bd show bd-3
# List all open issues
bd list --status open
# Show dependency tree for an epic
bd dep tree bd-1
# Find tasks ready to work on (no blocking dependencies)
bd ready
```
### Updating Issues
When you find gaps or need to add information, update the issue's design:
```bash
# Update issue design (detailed description)
bd update bd-3 --design "$(cat <<'EOF'
## Goal
[Complete updated design with all sections]
## Effort Estimate
[Revised estimate]
## Success Criteria
- [ ] Existing criteria
- [ ] NEW: Added missing criteria
## Implementation Checklist
[Complete checklist]
## Key Considerations (ADDED BY SRE REVIEW)
**Edge Case: Empty Input**
- What happens when input is empty string?
- MUST validate input length before processing
**Edge Case: Unicode Handling**
- What if string contains RTL or surrogate pairs?
- Use proper Unicode-aware string methods
**Performance Concern: Regex Backtracking**
- Pattern `.*[a-z]+.*` has catastrophic backtracking risk
- MUST test with pathological inputs (e.g., 10000 'a's)
- Use possessive quantifiers or bounded repetition
## Anti-patterns
[Existing anti-patterns]
EOF
)"
**IMPORTANT**: Use `--design` for the full detailed description, NOT `--description` (which sets only the title).
### Breaking Down Large Tasks
If a task is >16 hours, create subtasks:
```bash
# Create first subtask
bd create "Subtask 1: [Specific Component]" \
--type task \
--priority 1 \
--design "[Complete subtask design]"
# Returns bd-10
# Create second subtask
bd create "Subtask 2: [Another Component]" \
--type task \
--priority 1 \
--design "[Complete subtask design]"
# Returns bd-11
# Link subtasks to parent with parent-child relationship
bd dep add bd-10 bd-3 --type parent-child # bd-10 is child of bd-3
bd dep add bd-11 bd-3 --type parent-child # bd-11 is child of bd-3
# Add sequential dependencies if needed (LATER depends on EARLIER)
bd dep add bd-11 bd-10 # bd-11 depends on bd-10 (do bd-10 first)
# Update parent to be coordinator
bd update bd-3 --design "$(cat <<'EOF'
## Goal
Coordinate implementation of [feature]. Broken into N subtasks.
## Success Criteria
- [ ] All N child subtasks closed
- [ ] Integration tests pass
- [ ] [High-level verification criteria]
EOF
)"
### Verifying Changes
After making updates:
```bash
# View updated issue
bd show bd-3
# Verify dependency structure
bd dep tree bd-1
# Check what's ready to work on
bd ready
```
## Output Format
After reviewing all issues, provide:
## Plan Review Results
### Epic: [Name] ([epic-id])
### Overall Assessment
[APPROVE ✅ / NEEDS REVISION ⚠️ / REJECT ❌]
### Dependency Structure Review
[Output of `bd dep tree [epic-id]`]
**Structure Quality**: [✅ Correct / ❌ Issues found]
- [Comments on parent-child relationships]
- [Comments on blocking dependencies]
- [Comments on granularity]
### Task-by-Task Review
#### [Task Name]
**Type**: [epic/feature/task]
**Status**: [✅ Ready / ⚠️ Needs Minor Improvements / ❌ Needs Major Revision]
**Estimated Effort**: [X hours] ([✅ Good / ❌ Too large - needs breakdown])
**Strengths**:
- [What's done well]
**Critical Issues** (must fix):
- [Blocking problems]
**Improvements Needed**:
- [What to add/clarify]
**Edge Cases Missing**:
- [Failure modes not addressed]
**Changes Made**:
- [Specific improvements added to bd issue via `bd update`]
---
[Repeat for each task/phase/subtask]
### Summary of Changes
**Issues Updated**:
- [issue-id] - [summary of improvements]
- [issue-id] - [summary of improvements]
### Critical Gaps Across Plan
1. [Pattern of missing items across multiple tasks]
2. [Systemic issues in the plan]
### Recommendations
[If APPROVE]:
✅ Plan is solid and ready for implementation.
- All tasks are junior-engineer implementable
- Dependency structure is correct
- Edge cases and failure modes addressed
[If NEEDS REVISION]:
⚠️ Plan needs improvements before implementation:
- [List major items that need addressing]
- After changes, re-run hyperpowers:sre-task-refinement
[If REJECT]:
❌ Plan has fundamental issues and needs redesign:
- [Critical problems]
## Example Review
### Before Review: ❌ REJECT
```text
Issue: dlp-3
Title: Phase 2: Add HIPAA Scanners
Status: open
Priority: P1
Type: feature
Description:
Implement 7 scanners for HIPAA compliance.
Success Criteria:
- Scanners work correctly
- Tests pass
Estimated: 50-70 hours
```
Critical Issues:
- Too Large: 50-70 hours is way beyond junior-engineer task size
- Vague: "Scanners work correctly" not measurable
- No Breakdown: Should be 7 separate 6-8 hour subtasks
- No Details: Which scanners? What do they detect?
- No Edge Cases: Regex backtracking? False positives?
### After Review: ✅ IMPROVED
Actions Taken:
1. Break into subtasks (7 subtasks created).
2. Update parent phase to be coordinator:
```markdown
## Goal
Coordinate implementation of 7 HIPAA scanners
## Approach
Broken into 7 subtasks (one per scanner) for incremental implementation.
Each subtask is 4-8 hours and independently completable.
## Success Criteria
- [ ] All 7 child subtasks closed
- [ ] Integration tests pass for all 7 scanners
- [ ] Policy engine updated to use all 7 scanners
```
3. Update one subtask with detailed critique:
```markdown
Implement VIN (Vehicle Identification Number) and license plate scanner
## Effort Estimate
6-8 hours
## Success Criteria
- [ ] VIN validation with ISO 3779 checksum algorithm
- [ ] License plate regex for US state formats
- [ ] 5+ unit tests pass (valid VIN, invalid checksum, various plate formats)
- [ ] No unwrap/expect in production code
- [ ] Clippy clean, tests pass
## Implementation Checklist
- [ ] src/scan/plugins/scanners/vehicle_identifier.rs - New module
- [ ] VIN regex pattern + ISO 3779 checksum validation function
- [ ] License plate regex (support CA, NY, TX, FL, generic formats)
- [ ] Test module with valid/invalid cases
- [ ] Register in src/scan/plugins/scanners/mod.rs
## Key Considerations - SENIOR ENGINEER REVIEW NOTES
**VIN Checksum Complexity**:
- ISO 3779 requires transliteration table (letters → numbers)
- Weighted sum algorithm with modulo 11
- Reference: https://en.wikipedia.org/wiki/Vehicle_identification_number#Check_digit
- RECOMMENDATION: Study src/scan/plugins/scanners/mbi.rs for similar checksum pattern
**False Positive Risk**:
- VIN is 17 alphanumeric chars - could match random strings
- MUST validate checksum, not just pattern
- License plates vary widely by state/country
- MUST require healthcare context via has_healthcare_context()
**Edge Cases to Handle**:
- VIN with lowercase letters (should normalize to uppercase)
- Ambiguous characters (I/1, O/0) not valid in VIN
- International plates (non-US formats)
- Partial matches at chunk boundaries (streaming concern)
## Anti-patterns
- ❌ No unwrap/expect in production code
- ❌ No regex patterns without catastrophic backtracking check (use regex101.com)
- ❌ No broad patterns without validation (checksums, context)
- ❌ No skipping context checks - ALL scanners need healthcare context
```
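For reviewers unfamiliar with the check digit called out above, here is a rough sketch of ISO 3779 validation. It is an assumption-laden illustration, not the project's scanner; verify the transliteration table and weights against the standard (and the Wikipedia reference) before relying on it:

```rust
/// Rough sketch of ISO 3779 VIN check-digit validation (illustrative only).
fn transliterate(c: char) -> Option<u32> {
    match c {
        '0'..='9' => c.to_digit(10),
        'A' | 'J' => Some(1),
        'B' | 'K' | 'S' => Some(2),
        'C' | 'L' | 'T' => Some(3),
        'D' | 'M' | 'U' => Some(4),
        'E' | 'N' | 'V' => Some(5),
        'F' | 'W' => Some(6),
        'G' | 'P' | 'X' => Some(7),
        'H' | 'Y' => Some(8),
        'R' | 'Z' => Some(9),
        _ => None, // I, O, Q are not valid VIN characters
    }
}

/// Returns true if a 17-character VIN has a valid check digit in position 9.
/// Lowercase input is normalized; any invalid character fails validation.
fn vin_checksum_is_valid(vin: &str) -> bool {
    const WEIGHTS: [u32; 17] = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2];
    let chars: Vec<char> = vin.to_ascii_uppercase().chars().collect();
    if chars.len() != 17 {
        return false;
    }
    let mut sum = 0u32;
    for (i, &c) in chars.iter().enumerate() {
        match transliterate(c) {
            Some(v) => sum += v * WEIGHTS[i],
            None => return false,
        }
    }
    let expected = match sum % 11 {
        10 => 'X',
        d => std::char::from_digit(d, 10).unwrap_or('?'),
    };
    chars[8] == expected
}

fn main() {
    // Degenerate but checksum-consistent: all zeros sum to 0, so the expected
    // check digit is '0'.
    assert!(vin_checksum_is_valid("00000000000000000"));
    assert!(!vin_checksum_is_valid("123")); // wrong length
}
```

Note that the false-positive warnings in the plan still apply: a valid checksum alone is not enough, so the healthcare-context check remains mandatory.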
## Common Review Patterns
### Pattern: Task Too Large
**Symptom**: Effort estimate >16 hours
**Fix**: Break into 4-8 hour subtasks with parent-child relationships
### Pattern: Vague Success Criteria
**Symptom**: "Works correctly", "Is implemented", "Code is clean"
**Fix**: Replace with measurable criteria (test names, verification commands, metrics)
### Pattern: Missing Edge Cases
**Symptom**: No mention of error handling, empty inputs, Unicode, concurrency
**Fix**: Add a "Key Considerations" section with failure modes and mitigations
### Pattern: Copy-Paste Descriptions
**Symptom**: All subtasks have identical "Anti-patterns" and "Considerations" sections
**Fix**: Customize each task with the specific risks for that component
## After Review
**If Major Revisions Made**:
- All issue designs updated
- Re-review with hyperpowers:sre-task-refinement
- Repeat until the plan is APPROVED
**If Plan is APPROVED**:
Continue on to the next plan!