sre-task-refinement

@withzombies/hyperpowers

Use when you need to refine subtasks into actionable plans, ensuring that all corner cases are handled and all requirements are understood.

Install Skill

1. Download skill
2. Enable skills in Claude: open claude.ai/settings/capabilities and find the "Skills" section
3. Upload to Claude: click "Upload skill" and select the downloaded ZIP file

Note: Please verify the skill by reading through its instructions before using it.

SKILL.md

name: sre-task-refinement
description: Use when you need to refine subtasks into actionable plans, ensuring that all corner cases are handled and all requirements are understood.

Task Refinement

Overview

Refine tasks into actionable plans that a junior engineer can execute with little or no supervision. The plan should guarantee the junior engineer's success.

Core principle: Ask yourself, have we covered all our bases? Do we fully understand the scope? Have we provided the junior engineer enough information to succeed?

Announce at start: "I'm using the sre-task-refinement skill to refine the tasks."

Review bd issues for a feature plan with the critical eye of a Google Fellow SRE reviewing a junior engineer's design. Your goal is to prevent production issues, ensure implementability, and catch blind spots.

Time Expectations

Estimated review time:

  • Small epic (3-5 tasks): 15-20 minutes
  • Medium epic (6-10 tasks): 25-40 minutes
  • Large epic (10+ tasks): 45-60 minutes

Don't rush. Catching one critical gap before implementation saves hours of rework.

Review Philosophy

Perspective: You are a Google Fellow SRE with 20+ years of distributed systems experience.

What You're Looking For:

  • Can a junior engineer implement this without asking questions?
  • Are there hidden failure modes or edge cases?
  • Is the task granularity right (4-8 hour subtasks)?
  • Will this cause production issues?
  • Are there better/simpler approaches?

Review Checklist

1. Task Granularity (CRITICAL)

  • No task >8 hours (subtasks) or >16 hours (phases)?
  • Large phases broken into 4-8 hour subtasks?
  • Each subtask independently completable?
  • Each subtask has clear deliverable?

2. Implementability (Junior Engineer Test)

  • Can a junior engineer implement without asking questions?
  • Are function signatures/behaviors described, not just "implement X"?
  • Are test scenarios described (what they verify, not just names)?
  • Is "done" clearly defined with verifiable criteria?
  • Are all file paths specified or marked "TBD: new file"?
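
For instance, a hedged sketch of what "described, not just 'implement X'" can look like in a bd design (bd-5, the file path, and normalize_vin are hypothetical placeholders, not real project items):

# Illustrative only: a checklist entry that names the signature, behavior, and tests
bd update bd-5 --design "$(cat <<'EOF'
## Implementation Checklist
- [ ] src/scan/normalize.rs - add fn normalize_vin(raw: &str) -> Result<String, VinError>
  - Uppercase the input; reject I/O/Q characters; return Err unless length == 17
- [ ] Test: normalize_vin_lowercase_ok - lowercase VIN normalizes to uppercase
- [ ] Test: normalize_vin_bad_length_err - 16-character input returns Err
EOF
)"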

3. Success Criteria Quality

  • Each task has 3+ specific, measurable success criteria?
  • All criteria are testable/verifiable (not subjective)?
  • Includes automated verification (tests pass, clippy clean, etc.)?
  • No vague criteria like "works well" or "is implemented"?
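
A hedged illustration of the rewrite this check demands (bd-7 and the concrete criteria are hypothetical):

# Before: "- Scanner works well" (not verifiable)
# After: criteria a reviewer can check mechanically
bd update bd-7 --design "$(cat <<'EOF'
## Success Criteria
- [ ] cargo test scanner_ passes all 6 tests (valid, invalid, empty, unicode, large, boundary)
- [ ] cargo clippy --all-targets produces no warnings
- [ ] 1 MB sample input scanned in under 100 ms on CI hardware
EOF
)"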

4. Dependency Structure

  • Parent-child relationships correct (epic → phases → subtasks)?
  • Blocking dependencies correct (earlier work blocks later work)?
  • No circular dependencies?
  • Dependency graph makes logical sense?

5. Safety & Quality Standards

  • Anti-patterns include unwrap/expect prohibition?
  • Anti-patterns include TODO prohibition (or must have issue #)?
  • Anti-patterns include stub implementation prohibition?
  • Error handling requirements specified (use Result, avoid panic)?
  • Test requirements specific (test names, scenarios listed)?

6. Edge Cases & Failure Modes (Fellow SRE Perspective)

  • What happens with malformed input?
  • What happens with empty/nil/zero values?
  • What happens under high load/concurrency?
  • What happens when dependencies fail?
  • What happens with Unicode, special characters, large inputs?
  • Are these edge cases addressed in the plan?
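
A minimal shell sketch for generating fixtures that probe several of these cases (the filenames are arbitrary):

# Hypothetical fixtures: empty, oversized, and non-ASCII inputs
printf '' > empty.txt                      # empty input
printf 'a%.0s' $(seq 1 10000) > large.txt  # 10,000-character repeated input
printf 'שלום\n' > unicode.txt              # RTL / non-Latin text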

7. Red Flags (AUTO-REJECT if found)

  • ❌ Any task >16 hours without subtask breakdown
  • ❌ Vague language: "implement properly", "add support", "make it work"
  • ❌ Success criteria that can't be verified: "code is good", "works well"
  • ❌ Missing test specifications
  • ❌ "We'll handle this later" or "TODO" in the plan itself
  • ❌ No anti-patterns section
  • ❌ Implementation checklist with fewer than 3 items per task
  • ❌ No effort estimates
  • ❌ Missing error handling considerations
  • ❌ CRITICAL: Placeholder text in the design field - any reference like "[detailed above]", "[as specified]", or "[complete steps here]" instead of actual content
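
Several of these red flags can be pre-screened mechanically. A rough sketch (bd-3 stands in for whichever issue is under review; a human still makes the final call):

# Crude scan for red-flag phrasing in an issue body
bd show bd-3 | grep -inE 'implement properly|add support|make it work|works well|code is good'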

Review Process

For each phase and subtask:

A. Read the Issue

B. Apply All 7 Checklist Categories

  • Task Granularity
  • Implementability
  • Success Criteria Quality
  • Dependency Structure
  • Safety & Quality Standards
  • Edge Cases & Failure Modes
  • Red Flags

C. Document Findings

Take notes on:

  • What's done well
  • What's missing
  • What's vague or ambiguous
  • Hidden failure modes not addressed
  • Better approaches or simplifications

D. Update the Issue

Add to the design:

  • Missing edge cases in "Key Considerations" section
  • Missing test scenarios
  • Clarifications for ambiguous instructions
  • Warnings about potential pitfalls (e.g., "CRITICAL: regex backtracking risk")
  • References to similar code to study
  • Strengthened anti-patterns based on failure modes
  • Better effort estimates if original was off

E. Verify No Placeholder Text (MANDATORY)

After updating, read back the issue with `bd show bd-N` and verify:

  • ✅ All sections contain actual content, not meta-references
  • ✅ No placeholder text like "[detailed above]", "[as specified]", "[will be added]"
  • ✅ Implementation steps are fully written out with actual code examples
  • ✅ Success criteria are explicit, not referencing "criteria above"
  • ❌ If ANY placeholder text found: REJECT and rewrite with actual content
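
One way to make this verification mechanical, as a sketch (the marker list mirrors the examples above; extend it as needed):

# Fail loudly if any placeholder marker survives in the updated issue
if bd show bd-3 | grep -qE '\[(detailed above|as specified|will be added|complete steps here)\]'; then
  echo "REJECT: placeholder text found - rewrite with actual content"
fi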

Working with bd

Reading Issues

# Show single issue with full details
bd show bd-3

# List all open issues
bd list --status open

# Show dependency tree for an epic
bd dep tree bd-1

# Find tasks ready to work on (no blocking dependencies)
bd ready

Updating Issues

When you find gaps or need to add information, update the issue's design:

# Update issue design (detailed description)
bd update bd-3 --design "$(cat <<'EOF'
## Goal
[Complete updated design with all sections]

## Effort Estimate
[Revised estimate]

## Success Criteria
- [ ] Existing criteria
- [ ] NEW: Added missing criteria

## Implementation Checklist
[Complete checklist]

## Key Considerations (ADDED BY SRE REVIEW)
**Edge Case: Empty Input**
- What happens when input is empty string?
- MUST validate input length before processing

**Edge Case: Unicode Handling**
- What if string contains RTL or surrogate pairs?
- Use proper Unicode-aware string methods

**Performance Concern: Regex Backtracking**
- Pattern `.*[a-z]+.*` has catastrophic backtracking risk
- MUST test with pathological inputs (e.g., 10000 'a's)
- Use possessive quantifiers or bounded repetition

## Anti-patterns
[Existing anti-patterns]
EOF
)"

IMPORTANT: Use --design for the full detailed description, NOT --description (which sets only the title).
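
To make the distinction concrete (bd-3 as a stand-in; reading the body from a local design.md file is just for brevity):

bd update bd-3 --description "Phase 2: Add HIPAA Scanners"   # sets the title only
bd update bd-3 --design "$(cat design.md)"                   # sets the full detailed body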

Breaking Down Large Tasks

If a task is >16 hours, create subtasks:

# Create first subtask
bd create "Subtask 1: [Specific Component]" \
  --type task \
  --priority 1 \
  --design "[Complete subtask design]"
# Returns bd-10

# Create second subtask
bd create "Subtask 2: [Another Component]" \
  --type task \
  --priority 1 \
  --design "[Complete subtask design]"
# Returns bd-11

# Link subtasks to parent with parent-child relationship
bd dep add bd-10 bd-3 --type parent-child  # bd-10 is child of bd-3
bd dep add bd-11 bd-3 --type parent-child  # bd-11 is child of bd-3

# Add sequential dependencies if needed (LATER depends on EARLIER)
bd dep add bd-11 bd-10  # bd-11 depends on bd-10 (do bd-10 first)

# Update parent to be coordinator
bd update bd-3 --design "$(cat <<'EOF'
## Goal
Coordinate implementation of [feature]. Broken into N subtasks.

## Success Criteria
- [ ] All N child subtasks closed
- [ ] Integration tests pass
- [ ] [High-level verification criteria]
EOF
)"

Verifying Changes

After making updates:

# View updated issue
bd show bd-3

# Verify dependency structure
bd dep tree bd-1

# Check what's ready to work on
bd ready

Output Format

After reviewing all issues, provide:

## Plan Review Results

### Epic: [Name] ([epic-id])

### Overall Assessment
[APPROVE ✅ / NEEDS REVISION ⚠️ / REJECT ❌]

### Dependency Structure Review
[Output of `bd dep tree [epic-id]`]

**Structure Quality**: [✅ Correct / ❌ Issues found]
- [Comments on parent-child relationships]
- [Comments on blocking dependencies]
- [Comments on granularity]

### Task-by-Task Review

#### [Task Name]
**Type**: [epic/feature/task]
**Status**: [✅ Ready / ⚠️ Needs Minor Improvements / ❌ Needs Major Revision]
**Estimated Effort**: [X hours] ([✅ Good / ❌ Too large - needs breakdown])

**Strengths**:
- [What's done well]

**Critical Issues** (must fix):
- [Blocking problems]

**Improvements Needed**:
- [What to add/clarify]

**Edge Cases Missing**:
- [Failure modes not addressed]

**Changes Made**:
- [Specific improvements added to bd issue via `bd update`]

---

[Repeat for each task/phase/subtask]

### Summary of Changes

**Issues Updated**:
- [issue-id] - [summary of improvements]
- [issue-id] - [summary of improvements]

### Critical Gaps Across Plan
1. [Pattern of missing items across multiple tasks]
2. [Systemic issues in the plan]

### Recommendations

[If APPROVE]:
✅ Plan is solid and ready for implementation.
- All tasks are junior-engineer implementable
- Dependency structure is correct
- Edge cases and failure modes addressed

[If NEEDS REVISION]:
⚠️ Plan needs improvements before implementation:
- [List major items that need addressing]
- After changes, re-run hyperpowers:sre-task-refinement

[If REJECT]:
❌ Plan has fundamental issues and needs redesign:
- [Critical problems]

Example Review

Before Review: ❌ REJECT

Issue: dlp-3
Title: Phase 2: Add HIPAA Scanners
Status: open
Priority: P1
Type: feature

Description:
Implement 7 scanners for HIPAA compliance.

Success Criteria:
- Scanners work correctly
- Tests pass

Estimated: 50-70 hours

Critical Issues:

  1. Too Large: 50-70 hours is way beyond junior-engineer task size
  2. Vague: "Scanners work correctly" not measurable
  3. No Breakdown: Should be 7 separate 6-8 hour subtasks
  4. No Details: Which scanners? What do they detect?
  5. No Edge Cases: Regex backtracking? False positives?

After Review: ✅ IMPROVED

Actions Taken:

  1. Break into subtasks (7 subtasks created):

  2. Update parent phase to be coordinator:

## Goal
Coordinate implementation of 7 HIPAA scanners

## Approach
Broken into 7 subtasks (one per scanner) for incremental implementation.
Each subtask is 4-8 hours and independently completable.

## Success Criteria
- [ ] All 7 child subtasks closed
- [ ] Integration tests pass for all 7 scanners
- [ ] Policy engine updated to use all 7 scanners

  3. Update one subtask with detailed critique:

## Goal
Implement VIN (Vehicle Identification Number) and license plate scanner

## Effort Estimate
6-8 hours

## Success Criteria
- [ ] VIN validation with ISO 3779 checksum algorithm
- [ ] License plate regex for US state formats
- [ ] 5+ unit tests pass (valid VIN, invalid checksum, various plate formats)
- [ ] No unwrap/expect in production code
- [ ] Clippy clean, tests pass

## Implementation Checklist
- [ ] src/scan/plugins/scanners/vehicle_identifier.rs - New module
- [ ] VIN regex pattern + ISO 3779 checksum validation function
- [ ] License plate regex (support CA, NY, TX, FL, generic formats)
- [ ] Test module with valid/invalid cases
- [ ] Register in src/scan/plugins/scanners/mod.rs

## Key Considerations - SENIOR ENGINEER REVIEW NOTES

**VIN Checksum Complexity**:
- ISO 3779 requires transliteration table (letters → numbers)
- Weighted sum algorithm with modulo 11
- Reference: https://en.wikipedia.org/wiki/Vehicle_identification_number#Check_digit
- RECOMMENDATION: Study src/scan/plugins/scanners/mbi.rs for similar checksum pattern

**False Positive Risk**:
- VIN is 17 alphanumeric chars - could match random strings
- MUST validate checksum, not just pattern
- License plates vary widely by state/country
- MUST require healthcare context via has_healthcare_context()

**Edge Cases to Handle**:
- VIN with lowercase letters (should normalize to uppercase)
- Ambiguous characters (I/1, O/0) not valid in VIN
- International plates (non-US formats)
- Partial matches at chunk boundaries (streaming concern)

## Anti-patterns
- ❌ No unwrap/expect in production code
- ❌ No regex patterns without catastrophic backtracking check (use regex101.com)
- ❌ No broad patterns without validation (checksums, context)
- ❌ No skipping context checks - ALL scanners need healthcare context

Common Review Patterns

Pattern: Task Too Large

Symptom: Effort estimate >16 hours
Fix: Break into 4-8 hour subtasks with parent-child relationships (see "Breaking Down Large Tasks" above)

Pattern: Vague Success Criteria

Symptom: "Works correctly", "Is implemented", "Code is clean" Fix: Replace with measurable criteria (test names, verification commands, metrics)

Pattern: Missing Edge Cases

Symptom: No mention of error handling, empty inputs, Unicode, concurrency
Fix: Add "Key Considerations" section with failure modes and mitigations

Pattern: Copy-Paste Descriptions

Symptom: All subtasks have identical "Anti-patterns" and "Considerations" sections
Fix: Customize each task with specific risks for that component

After Review

If Major Revisions Made:

  1. All issue designs updated
  2. Re-review with hyperpowers:sre-task-refinement
  3. Repeat until plan is APPROVED

If Plan is APPROVED:

Continue onto the next plan!