Claude Code Plugins

Community-maintained marketplace


sre-task-refinement

@withzombies/hyperpowers

Use when you need to refine subtasks into actionable plans, ensuring that all corner cases are handled and all requirements are understood.

Install Skill

1. Download skill
2. Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3. Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify the skill by reading through its instructions before using it.

SKILL.md

name: sre-task-refinement
description: Use when you need to refine subtasks into actionable plans, ensuring that all corner cases are handled and all requirements are understood.

Review bd task plans with a Google Fellow SRE perspective to ensure a junior engineer can execute without questions; catch edge cases, verify granularity, strengthen criteria, and prevent production issues before implementation.

LOW FREEDOM - Follow the 7-category checklist exactly. Apply all categories to every task. No skipping red flag checks. Always verify no placeholder text after updates. Reject plans with critical gaps.

| Category | Key Questions | Auto-Reject If |
|----------|---------------|----------------|
| 1. Granularity | Tasks 4-8 hours? Phases <16 hours? | Any task >16h without breakdown |
| 2. Implementability | Junior can execute without questions? | Vague language, missing details |
| 3. Success Criteria | 3+ measurable criteria per task? | Can't verify ("works well") |
| 4. Dependencies | Correct parent-child, blocking relationships? | Circular dependencies |
| 5. Safety Standards | Anti-patterns specified? Error handling? | No anti-patterns section |
| 6. Edge Cases | Empty input? Unicode? Concurrency? Failures? | No edge case consideration |
| 7. Red Flags | Placeholder text? Vague instructions? | "[detailed above]", "TODO" |

Perspective: Google Fellow SRE with 20+ years experience reviewing junior engineer designs.

Time: Don't rush - catching one gap pre-implementation saves hours of rework.

Use when:

  • Reviewing bd epic/feature plans before implementation
  • Need to ensure junior engineer can execute without questions
  • Want to catch edge cases and failure modes upfront
  • Need to verify task granularity (4-8 hour subtasks)
  • After hyperpowers:writing-plans creates initial plan
  • Before hyperpowers:executing-plans starts implementation

Don't use when:

  • Task already being implemented (too late)
  • Just need to understand existing code (use codebase-investigator)
  • Debugging issues (use debugging-with-tools)
  • Want to create plan from scratch (use brainstorming → writing-plans)

## Announcement

Announce: "I'm using hyperpowers:sre-task-refinement to review this plan with Google Fellow-level scrutiny."


Review Checklist (Apply to Every Task)

1. Task Granularity

Check:

  • No task >8 hours (subtasks) or >16 hours (phases)?
  • Large phases broken into 4-8 hour subtasks?
  • Each subtask independently completable?
  • Each subtask has clear deliverable?

If task >16 hours (full commands under "Breaking Down Large Tasks" below):

  • Create subtasks with bd create
  • Link with bd dep add child parent --type parent-child
  • Update parent to coordinator role

2. Implementability (Junior Engineer Test)

Check:

  • Can junior engineer implement without asking questions?
  • Function signatures/behaviors described, not just "implement X"?
  • Test scenarios described (what they verify, not just names)?
  • "Done" clearly defined with verifiable criteria?
  • All file paths specified or marked "TBD: new file"?

Red flags:

  • "Implement properly" (how?)
  • "Add support" (for what exactly?)
  • "Make it work" (what does working mean?)
  • File paths missing or ambiguous

3. Success Criteria Quality

Check:

  • Each task has 3+ specific, measurable success criteria?
  • All criteria testable/verifiable (not subjective)?
  • Includes automated verification (tests pass, clippy clean)?
  • No vague criteria like "works well" or "is implemented"?

Good criteria examples:

  • ✅ "5+ unit tests pass (valid VIN, invalid checksum, various formats)"
  • ✅ "Clippy clean with no warnings"
  • ✅ "Performance: <100ms for 1000 records"

Bad criteria examples:

  • ❌ "Code is good quality"
  • ❌ "Works correctly"
  • ❌ "Is implemented"

4. Dependency Structure

Check:

  • Parent-child relationships correct (epic → phases → subtasks)?
  • Blocking dependencies correct (earlier work blocks later)?
  • No circular dependencies?
  • Dependency graph makes logical sense?

Verify with:

bd dep tree bd-1  # Show full dependency tree

5. Safety & Quality Standards

Check:

  • Anti-patterns include unwrap/expect prohibition?
  • Anti-patterns include TODO prohibition (or must have issue #)?
  • Anti-patterns include stub implementation prohibition?
  • Error handling requirements specified (use Result, avoid panic)?
  • Test requirements specific (test names, scenarios listed)?

Minimum anti-patterns:

  • ❌ No unwrap/expect in production code
  • ❌ No TODOs without issue numbers
  • ❌ No stub implementations (unimplemented!, todo!)
  • ❌ No regex without catastrophic backtracking check
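
If helpful, the plan can make the unwrap/expect prohibition concrete with a short sketch like this one (record_count is a hypothetical example, showing Result-based propagation instead of panicking):

use std::num::ParseIntError;

// ❌ Anti-pattern: panics on malformed input.
// fn record_count(raw: &str) -> u32 {
//     raw.parse().unwrap()
// }

// ✅ Propagate the failure to the caller instead.
fn record_count(raw: &str) -> Result<u32, ParseIntError> {
    raw.trim().parse()
}

fn main() {
    match record_count("42") {
        Ok(n) => println!("count = {n}"),
        Err(e) => eprintln!("invalid count: {e}"),
    }
}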

6. Edge Cases & Failure Modes (Fellow SRE Perspective)

Ask for each task:

  • What happens with malformed input?
  • What happens with empty/nil/zero values?
  • What happens under high load/concurrency?
  • What happens when dependencies fail?
  • What happens with Unicode, special characters, large inputs?
  • Are these edge cases addressed in the plan?

Add to Key Considerations section:

  • Edge case descriptions
  • Mitigation strategies
  • References to similar code handling these cases
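
For code tasks, these edge cases translate directly into test stubs the junior engineer can start from. A minimal sketch, where scan stands in for whatever function the task implements:

// Hypothetical scanner entry point; substitute the task's real API.
fn scan(input: &str) -> Vec<String> {
    input.split_whitespace().map(|w| w.to_string()).collect()
}

#[cfg(test)]
mod edge_cases {
    use super::scan;

    #[test]
    fn empty_input_yields_no_matches() {
        assert!(scan("").is_empty());
    }

    #[test]
    fn unicode_input_does_not_panic() {
        // RTL text and multi-byte characters must not be split mid-codepoint.
        let _ = scan("שלום мир 👍");
    }

    #[test]
    fn very_large_input_completes() {
        // Guards against pathological blowups (e.g., regex backtracking).
        let big = "a".repeat(1_000_000);
        let _ = scan(&big);
    }
}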

7. Red Flags (AUTO-REJECT)

Check for these - if found, REJECT plan:

  • ❌ Any task >16 hours without subtask breakdown
  • ❌ Vague language: "implement properly", "add support", "make it work"
  • ❌ Success criteria that can't be verified: "code is good", "works well"
  • ❌ Missing test specifications
  • ❌ "We'll handle this later" or "TODO" in the plan itself
  • ❌ No anti-patterns section
  • ❌ Implementation checklist with fewer than 3 items per task
  • ❌ No effort estimates
  • ❌ Missing error handling considerations
  • ❌ CRITICAL: Placeholder text in design field - "[detailed above]", "[as specified]", "[complete steps here]"

Review Process

For each task in the plan:

Step 1: Read the task

bd show bd-3

Step 2: Apply all 7 checklist categories

  • Task Granularity
  • Implementability
  • Success Criteria Quality
  • Dependency Structure
  • Safety & Quality Standards
  • Edge Cases & Failure Modes
  • Red Flags

Step 3: Document findings

Take notes:

  • What's done well
  • What's missing
  • What's vague or ambiguous
  • Hidden failure modes not addressed
  • Better approaches or simplifications

Step 4: Update the task

Use bd update to add missing information:

bd update bd-3 --design "$(cat <<'EOF'
## Goal
[Original goal, preserved]

## Effort Estimate
[Updated estimate if needed]

## Success Criteria
- [ ] Existing criteria
- [ ] NEW: Added missing measurable criteria

## Implementation Checklist
[Complete checklist with file paths]

## Key Considerations (ADDED BY SRE REVIEW)

**Edge Case: Empty Input**
- What happens when input is empty string?
- MUST validate input length before processing

**Edge Case: Unicode Handling**
- What if string contains RTL or surrogate pairs?
- Use proper Unicode-aware string methods

**Performance Concern: Regex Backtracking**
- Pattern `.*[a-z]+.*` has catastrophic backtracking risk
- MUST test with pathological inputs (e.g., 10000 'a's)
- Use possessive quantifiers or bounded repetition

**Reference Implementation**
- Study src/similar/module.rs for pattern to follow

## Anti-patterns
[Original anti-patterns]
- ❌ NEW: Specific anti-pattern for this task's risks
EOF
)"

IMPORTANT: Use --design for full detailed description, NOT --description (title only).

Step 5: Verify no placeholder text (MANDATORY)

After updating, read back with bd show bd-N and verify:

  • ✅ All sections contain actual content, not meta-references
  • ✅ No placeholder text like "[detailed above]", "[as specified]", "[will be added]"
  • ✅ Implementation steps fully written with actual code examples
  • ✅ Success criteria explicit, not referencing "criteria above"
  • ❌ If ANY placeholder text found: REJECT and rewrite with actual content

Breaking Down Large Tasks

If task >16 hours, create subtasks:

# Create first subtask
bd create "Subtask 1: [Specific Component]" \
  --type task \
  --priority 1 \
  --design "[Complete subtask design with all 7 categories addressed]"
# Returns bd-10

# Create second subtask
bd create "Subtask 2: [Another Component]" \
  --type task \
  --priority 1 \
  --design "[Complete subtask design]"
# Returns bd-11

# Link subtasks to parent with parent-child relationship
bd dep add bd-10 bd-3 --type parent-child  # bd-10 is child of bd-3
bd dep add bd-11 bd-3 --type parent-child  # bd-11 is child of bd-3

# Add sequential dependencies if needed (LATER depends on EARLIER)
bd dep add bd-11 bd-10  # bd-11 depends on bd-10 (do bd-10 first)

# Update parent to coordinator
bd update bd-3 --design "$(cat <<'EOF'
## Goal
Coordinate implementation of [feature]. Broken into N subtasks.

## Success Criteria
- [ ] All N child subtasks closed
- [ ] Integration tests pass
- [ ] [High-level verification criteria]
EOF
)"

Output Format

After reviewing all tasks:

## Plan Review Results

### Epic: [Name] ([epic-id])

### Overall Assessment
[APPROVE ✅ / NEEDS REVISION ⚠️ / REJECT ❌]

### Dependency Structure Review
[Output of `bd dep tree [epic-id]`]

**Structure Quality**: [✅ Correct / ❌ Issues found]
- [Comments on parent-child relationships]
- [Comments on blocking dependencies]
- [Comments on granularity]

### Task-by-Task Review

#### [Task Name] (bd-N)
**Type**: [epic/feature/task]
**Status**: [✅ Ready / ⚠️ Needs Minor Improvements / ❌ Needs Major Revision]
**Estimated Effort**: [X hours] ([✅ Good / ❌ Too large - needs breakdown])

**Strengths**:
- [What's done well]

**Critical Issues** (must fix):
- [Blocking problems]

**Improvements Needed**:
- [What to add/clarify]

**Edge Cases Missing**:
- [Failure modes not addressed]

**Changes Made**:
- [Specific improvements added via `bd update`]

---

[Repeat for each task/phase/subtask]

### Summary of Changes

**Issues Updated**:
- bd-3 - Added edge case handling for Unicode, regex backtracking risks
- bd-5 - Broke into 3 subtasks (was 40 hours, now 3x8 hours)
- bd-7 - Strengthened success criteria (added test names, verification commands)

### Critical Gaps Across Plan
1. [Pattern of missing items across multiple tasks]
2. [Systemic issues in the plan]

### Recommendations

[If APPROVE]:
✅ Plan is solid and ready for implementation.
- All tasks are junior-engineer implementable
- Dependency structure is correct
- Edge cases and failure modes addressed

[If NEEDS REVISION]:
⚠️ Plan needs improvements before implementation:
- [List major items that need addressing]
- After changes, re-run hyperpowers:sre-task-refinement

[If REJECT]:
❌ Plan has fundamental issues and needs redesign:
- [Critical problems]

Example: Developer reviews task but skips edge case analysis (Category 6)

# Review of bd-3: Implement VIN scanner

Checklist review:

  1. Granularity: ✅ 6-8 hours
  2. Implementability: ✅ Junior can implement
  3. Success Criteria: ✅ Has 5 test scenarios
  4. Dependencies: ✅ Correct
  5. Safety Standards: ✅ Anti-patterns present
  6. Edge Cases: [SKIPPED - "looks straightforward"]
  7. Red Flags: ✅ None found

Conclusion: "Task looks good, approve ✅"

Task ships without edge case review

Production issues occur:

  • VIN scanner matches random 17-char strings (no checksum validation)
  • Lowercase VINs not handled (should normalize)
  • Catastrophic regex backtracking on long inputs (DoS vulnerability)
Why this failed:

  • Skipped Category 6 (Edge Cases) assuming task was "straightforward"
  • Didn't ask: What happens with invalid checksums? Lowercase? Long inputs?
  • Missed critical production issues:
    - False positives (no checksum validation)
    - Data handling bugs (case sensitivity)
    - Security vulnerability (regex DoS)
  • Junior engineer didn't know to handle these (not in task)
  • Production incidents occur after deployment
  • Hours of emergency fixes, customer impact
  • SRE review failed to prevent known failure modes

**Apply Category 6 rigorously:**
## Edge Case Analysis for bd-3: VIN Scanner

Ask for EVERY task:
- Malformed input? VIN has checksum - must validate, not just pattern match
- Empty/nil? What if empty string passed?
- Concurrency? Read-only scanner, no concurrency issues
- Dependency failures? No external dependencies
- Unicode/special chars? VIN is alphanumeric only, but what about lowercase?
- Large inputs? Regex `.*` patterns can cause catastrophic backtracking

Findings:
❌ VIN checksum validation not mentioned (will match random strings)
❌ Case normalization not mentioned (lowercase VINs exist)
❌ Regex backtracking risk not mentioned (DoS vulnerability)

Update task:

bd update bd-3 --design "$(cat <<'EOF'
[... original content ...]

## Key Considerations (ADDED BY SRE REVIEW)

**VIN Checksum Complexity**:
- ISO 3779 requires transliteration table (letters → numbers)
- Weighted sum algorithm with modulo 11
- Reference: https://en.wikipedia.org/wiki/Vehicle_identification_number#Check_digit
- MUST validate checksum, not just pattern - prevents false positives

**Case Normalization**:
- VINs can appear in lowercase
- MUST normalize to uppercase before validation
- Test with mixed case: "1hgbh41jxmn109186"

**Regex Backtracking Risk**:
- CRITICAL: Pattern `.*[A-HJ-NPR-Z0-9]{17}.*` has backtracking risk
- Test with pathological input: 10000 'X's followed by 16-char string
- Use possessive quantifiers or bounded repetition
- Reference: https://www.regular-expressions.info/catastrophic.html

**Edge Cases to Test**:
- Valid VIN with valid checksum (should match)
- Valid pattern but invalid checksum (should NOT match)
- Lowercase VIN (should normalize and validate)
- Ambiguous letters I, O, Q not valid in VINs (should reject)
- Very long input (should not DoS)
EOF
)"

What you gain:

  • Prevented false positives (checksum validation)
  • Prevented data handling bugs (case normalization)
  • Prevented security vulnerability (regex DoS)
  • Junior engineer has complete requirements
  • Production issues caught pre-implementation
  • Proper SRE review preventing known failure modes
  • Customer trust maintained

Example: Developer approves task with placeholder text (Red Flag #10)

# Review of bd-5: Implement License Plate Scanner

bd show bd-5:

Implementation Checklist

  • Create scanner module
  • [Complete implementation steps detailed above]
  • Add tests

Success Criteria

  • [As specified in the implementation checklist]
  • Tests pass

Key Considerations

  • [Will be added during implementation]

Developer's review:

"Looks comprehensive, has implementation checklist and success criteria ✅"

During implementation:

Junior engineer: "What are the 'implementation steps detailed above'?" Junior engineer: "What specific success criteria should I verify?" Junior engineer: "What key considerations exist?"

No answers in the task - junior engineer blocked

Have to research and add missing information

Implementation delayed by 2 days

Why this failed:

  • Missed Red Flag #10: Placeholder text present
  • "[Complete implementation steps detailed above]" is meta-reference, not content
  • "[As specified in the implementation checklist]" is circular reference
  • "[Will be added during implementation]" is deferral, not specification
  • Junior engineer can't execute - missing critical information
  • Task looks complete but actually incomplete
  • Implementation blocked until details added
  • SRE review failed to catch placeholder text

**Check for placeholder text after reading:**
## Red Flag Check (Category 7)

Read through bd-5 line by line:

Line 15: "[Complete implementation steps detailed above]"
❌ PLACEHOLDER - "detailed above" is meta-reference, not actual content

Line 22: "[As specified in the implementation checklist]"
❌ PLACEHOLDER - Circular reference to another section, not explicit criteria

Line 30: "[Will be added during implementation]"
❌ PLACEHOLDER - Deferral to future, not actual considerations

DECISION: REJECT ❌
Reason: Contains placeholder text - task not ready for implementation

Update task with actual content:

bd update bd-5 --design "$(cat <<'EOF'
## Implementation Checklist
- [ ] Create src/scan/plugins/scanners/license_plate.rs
- [ ] Implement LicensePlateScanner struct with ScanPlugin trait
- [ ] Add regex patterns for US states:
  - CA: `[0-9][A-Z]{3}[0-9]{3}` (e.g., 1ABC123)
  - NY: `[A-Z]{3}[0-9]{4}` (e.g., ABC1234)
  - TX: `[A-Z]{3}[0-9]{4}|[0-9]{3}[A-Z]{3}` (e.g., ABC1234 or 123ABC)
  - Generic: `[A-Z0-9]{5,8}` (fallback)
- [ ] Implement has_healthcare_context() check
- [ ] Create test module with 8+ test cases
- [ ] Register in src/scan/plugins/scanners/mod.rs

## Success Criteria
- [ ] Valid CA plate "1ABC123" detected in healthcare context
- [ ] Valid NY plate "ABC1234" detected in healthcare context
- [ ] Invalid plate "123" NOT detected (too short)
- [ ] Valid plate NOT detected outside healthcare context
- [ ] 8+ unit tests pass covering all patterns and edge cases
- [ ] Clippy clean, no warnings
- [ ] cargo test passes

## Key Considerations

**False Positive Risk**:
- License plates are short and generic (5-8 chars)
- MUST require healthcare context via has_healthcare_context()
- Without context, will match random alphanumeric sequences
- Test: Random string "ABC1234" should NOT match outside healthcare context

**State Format Variations**:
- 50 US states have different formats
- Implement common formats (CA, NY, TX) + generic fallback
- Document which formats supported in module docstring
- Consider international plates in future iteration

**Performance**:
- Regex patterns are simple, no backtracking risk
- Should process <1ms per chunk

**Reference Implementation**:
- Study src/scan/plugins/scanners/vehicle_identifier.rs
- Follow same pattern: regex + context check + tests
EOF
)"

Verify no placeholder text:

bd show bd-5
# Read entire output
# Confirm: All sections have actual content
# Confirm: No "[detailed above]", "[as specified]", "[will be added]"
# ✅ Task ready for implementation
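
As a rough sketch of the structure this specification implies (assumes the regex crate; the has_healthcare_context body is a stand-in for the codebase's real context check, and the plugin-trait wiring and generic fallback pattern are omitted):

use regex::Regex;

// Sketch only: stand-in for the codebase's real healthcare-context check.
fn has_healthcare_context(chunk: &str) -> bool {
    chunk.to_ascii_lowercase().contains("patient")
}

struct LicensePlateScanner {
    patterns: Vec<Regex>,
}

impl LicensePlateScanner {
    fn new() -> Result<Self, regex::Error> {
        // Formats from the checklist, word-bounded to avoid partial matches.
        let sources = [
            r"\b[0-9][A-Z]{3}[0-9]{3}\b", // CA: 1ABC123
            r"\b[A-Z]{3}[0-9]{4}\b",      // NY/TX: ABC1234
            r"\b[0-9]{3}[A-Z]{3}\b",      // TX alternate: 123ABC
        ];
        let patterns = sources
            .iter()
            .map(|s| Regex::new(s))
            .collect::<Result<Vec<_>, _>>()?;
        Ok(Self { patterns })
    }

    fn scan(&self, chunk: &str) -> Vec<String> {
        // Context gate first: plate formats are too generic to match alone.
        if !has_healthcare_context(chunk) {
            return Vec::new();
        }
        self.patterns
            .iter()
            .flat_map(|re| re.find_iter(chunk).map(|m| m.as_str().to_owned()))
            .collect()
    }
}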

What you gain:

  • Junior engineer has complete specification
  • No blocked implementation waiting for details
  • All edge cases documented upfront
  • Success criteria explicit and verifiable
  • Key considerations prevent common mistakes
  • No placeholder text - task truly ready
  • Professional SRE review standard maintained

Example: Developer accepts vague success criteria (Category 3)

# Review of bd-7: Implement Data Encryption

bd show bd-7:

Success Criteria

  • Encryption is implemented correctly
  • Code is good quality
  • Tests work properly

Developer's review:

"Has 3 success criteria ✅ Meets minimum requirement"

During implementation:

Junior engineer: "How do I know if encryption is 'correct'?" Junior engineer: "What makes code 'good quality'?" Junior engineer: "What does 'tests work properly' mean?"

Junior engineer makes best guesses:

  • Uses ECB mode (insecure, should use GCM)
  • No key rotation (bad practice)
  • Tests only happy path (misses edge cases)

Code review finds critical security issues

Complete rewrite required - 3 days wasted

Why this failed:

  • Category 3 check: "All criteria testable/verifiable?" ❌ FAILED
  • "Implemented correctly" - not measurable (correct by what standard?)
  • "Good quality" - subjective, not verifiable
  • "Work properly" - vague, what is proper?
  • Junior engineer can't verify criteria objectively
  • Makes incorrect assumptions filling gaps
  • Security vulnerabilities introduced
  • Wastes time on implementation that fails review
  • SRE review failed to strengthen criteria

**Apply Category 3 rigorously:**
## Success Criteria Analysis for bd-7

Current criteria:
- [ ] Encryption is implemented correctly
  ❌ NOT TESTABLE - "correctly" is subjective, no standard specified

- [ ] Code is good quality
  ❌ NOT TESTABLE - "good quality" is opinion, not measurable

- [ ] Tests work properly
  ❌ NOT TESTABLE - "properly" is vague, no definition

Minimum requirement: 3+ specific, measurable, testable criteria
Current: 0 testable criteria
DECISION: REJECT ❌

Update with measurable criteria:

bd update bd-7 --design "$(cat <<'EOF'
[... original content ...]

## Success Criteria

**Encryption Implementation**:
- [ ] Uses AES-256-GCM mode (verified in code review)
- [ ] Key derivation via PBKDF2 with 100,000 iterations (NIST recommendation)
- [ ] Unique IV generated per encryption (crypto_random)
- [ ] Authentication tag verified on decryption

**Code Quality** (automated checks):
- [ ] Clippy clean with no warnings: `cargo clippy -- -D warnings`
- [ ] Rustfmt compliant: `cargo fmt --check`
- [ ] No unwrap/expect in production: `rg "\.unwrap\(\)|\.expect\(" src/` returns 0
- [ ] No TODOs without issue numbers: `rg "TODO" src/` returns 0

**Test Coverage**:
- [ ] 12+ unit tests pass covering:
  - test_encrypt_decrypt_roundtrip (happy path)
  - test_wrong_key_fails_auth (security)
  - test_modified_ciphertext_fails_auth (security)
  - test_empty_plaintext (edge case)
  - test_large_plaintext_10mb (performance)
  - test_unicode_plaintext (data handling)
  - test_concurrent_encryption (thread safety)
  - test_iv_uniqueness (security)
  - [4 more specific scenarios]
- [ ] All tests pass: `cargo test encryption`
- [ ] Test coverage >90%: `cargo tarpaulin --packages encryption`

**Documentation**:
- [ ] Module docstring explains encryption scheme (AES-256-GCM)
- [ ] Function docstrings include examples
- [ ] Security considerations documented (key management, IV handling)

**Security Review**:
- [ ] No hardcoded keys or IVs (verified via grep)
- [ ] Key zeroized after use (verified in code)
- [ ] Constant-time comparison for auth tag (timing attack prevention)
EOF
)"

What you gain:

  • Every criterion objectively verifiable
  • Junior engineer knows exactly what "done" means
  • Automated checks (clippy, fmt, grep) provide instant feedback
  • Specific test scenarios prevent missed edge cases
  • Security requirements explicit (GCM, PBKDF2, unique IV)
  • No ambiguity - can verify each criterion with command or code review
  • Professional SRE review standard: measurable, testable, specific
## Rules That Have No Exceptions
  1. Apply all 7 categories to every task → No skipping any category for any task
  2. Reject plans with placeholder text → "[detailed above]", "[as specified]" = instant reject
  3. Verify no placeholder after updates → Read back with bd show and confirm actual content
  4. Break tasks >16 hours → Create subtasks, don't accept large tasks
  5. Strengthen vague criteria → "Works correctly" → measurable verification commands
  6. Add edge cases to every task → Empty? Unicode? Concurrency? Failures?
  7. Never skip Category 6 → Edge case analysis prevents production issues

Common Excuses

All of these mean: STOP. Apply the full process.

  • "Task looks straightforward" (Edge cases hide in "straightforward" tasks)
  • "Has 3 criteria, meets minimum" (Criteria must be measurable, not just 3+ items)
  • "Placeholder text is just formatting" (Placeholders mean incomplete specification)
  • "Can handle edge cases during implementation" (Must specify upfront, not defer)
  • "Junior will figure it out" (Junior should NOT need to figure out - we specify)
  • "Too detailed, feels like micromanaging" (Detail prevents questions and rework)
  • "Taking too long to review" (One gap caught saves hours of rework)
Before completing SRE review:

Per task reviewed:

  • Applied all 7 categories (Granularity, Implementability, Criteria, Dependencies, Safety, Edge Cases, Red Flags)
  • Checked for placeholder text in design field
  • Updated task with missing information via bd update --design
  • Verified updated task with bd show (no placeholders remain)
  • Broke down any task >16 hours into subtasks
  • Strengthened vague success criteria to measurable
  • Added edge case analysis to Key Considerations
  • Strengthened anti-patterns based on failure modes

Overall plan:

  • Reviewed ALL tasks/phases/subtasks (no exceptions)
  • Verified dependency structure with bd dep tree
  • Documented findings for each task
  • Created summary of changes made
  • Provided clear recommendation (APPROVE/NEEDS REVISION/REJECT)

Can't check all boxes? Return to review process and complete missing steps.

This skill is used after:

  • hyperpowers:writing-plans (creates initial plan)
  • hyperpowers:brainstorming (establishes requirements)

This skill is used before:

  • hyperpowers:executing-plans (implements tasks)

This skill is also called by:

  • hyperpowers:executing-plans (REQUIRED for new tasks created during execution)

Call chains:

Initial planning:
hyperpowers:brainstorming → hyperpowers:writing-plans → hyperpowers:sre-task-refinement → hyperpowers:executing-plans
                                                    ↓
                                            (if gaps: revise and re-review)

During execution (for new tasks):
hyperpowers:executing-plans → creates new task → hyperpowers:sre-task-refinement → STOP checkpoint

This skill uses:

  • bd commands (show, update, create, dep add, dep tree)
  • Google Fellow SRE perspective (20+ years distributed systems)
  • 7-category checklist (mandatory for every task)

Time expectations:

  • Small epic (3-5 tasks): 15-20 minutes
  • Medium epic (6-10 tasks): 25-40 minutes
  • Large epic (10+ tasks): 45-60 minutes

Don't rush: Catching one critical gap pre-implementation saves hours of rework.

Review patterns:

  • Task too large (>16h) → Break into 4-8h subtasks
  • Vague criteria ("works correctly") → Measurable commands/checks
  • Missing edge cases → Add to Key Considerations with mitigations
  • Placeholder text → Rewrite with actual content

When stuck:

  • Unsure if task too large → Ask: Can junior complete in one day?
  • Unsure if criteria measurable → Ask: Can I verify with command/code review?
  • Unsure if edge case matters → Ask: Could this fail in production?
  • Unsure if placeholder → Ask: Does this reference other content instead of providing content?

Key principle: Junior engineer should be able to execute task without asking questions. If they would need to ask, specification is incomplete.