Claude Code Plugins

Community-maintained marketplace

specification-phase

@marcusgoll/Spec-Flow

Provides standard operating procedures for the /specify phase including feature classification (HAS_UI, IS_IMPROVEMENT, HAS_METRICS, HAS_DEPLOYMENT_IMPACT), research depth determination, clarification strategy (max 3, informed guesses for defaults), and roadmap integration. Use when executing /specify command, classifying features, generating structured specs, or determining research depth for planning phase. (project)

Install Skill

1. Download skill
2. Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3. Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify the skill by reviewing its instructions before using it.

SKILL.md

name: specification-phase
description: Provides standard operating procedures for the /specify phase including feature classification (HAS_UI, IS_IMPROVEMENT, HAS_METRICS, HAS_DEPLOYMENT_IMPACT), research depth determination, clarification strategy (max 3, informed guesses for defaults), and roadmap integration. Use when executing /specify command, classifying features, generating structured specs, or determining research depth for planning phase. (project)
Transform natural language feature requests into structured specifications that enable planning and implementation through systematic classification, an informed-guess strategy, and a minimal-clarification approach.

This skill orchestrates the /specify phase, producing spec.md with measurable success criteria, classification flags for downstream workflows, and maximum 3 clarifications using informed guess heuristics for technical defaults.

Core responsibilities:

  • Parse and normalize feature input to slug format (kebab-case, no filler words)
  • Check roadmap integration to reuse existing context
  • Load project documentation context (docs/project/) to prevent hallucination
  • Classify feature with 4 flags (HAS_UI, IS_IMPROVEMENT, HAS_METRICS, HAS_DEPLOYMENT_IMPACT)
  • Determine research depth based on complexity (FLAG_COUNT: 0→minimal, 1→standard, ≥2→full)
  • Apply informed guess strategy for non-critical technical decisions
  • Generate ≤3 clarifications (only scope/security critical)
  • Write measurable success criteria (quantifiable, user-facing)

Inputs: User's feature description, existing roadmap entries, project docs (if present), codebase context
Outputs: specs/NNN-slug/spec.md, NOTES.md, state.yaml, updated roadmap (if FROM_ROADMAP=true)
Expected duration: 10-15 minutes

Execute /specify workflow in 9 steps:
  1. Parse and normalize input - Generate slug (remove filler words: "add", "create", "we want to"), assign feature number (NNN)
  2. Check roadmap integration - Search roadmap for slug match, reuse context if found (saves ~10 minutes)
  3. Load project documentation - Extract tech stack, API patterns, existing entities from docs/project/ (prevents hallucination)
  4. Classify feature - Set HAS_UI, IS_IMPROVEMENT, HAS_METRICS, HAS_DEPLOYMENT_IMPACT flags using decision tree
  5. Determine research depth - FLAG_COUNT → minimal (0), standard (1), full (≥2)
  6. Apply informed guess strategy - Use defaults for performance (<500ms), rate limits (100/min), auth (OAuth2), retention (90d logs)
  7. Generate clarifications - Max 3 critical clarifications (scope > security > UX > technical priority matrix)
  8. Write success criteria - Measurable, quantifiable, user-facing outcomes (not tech-specific)
  9. Generate artifacts and commit - Create spec.md, NOTES.md, state.yaml, update roadmap

Key principle: Use informed guesses for non-critical decisions, document assumptions, limit clarifications to ≤3.

Before executing /specify phase:

  • Git working tree clean (no uncommitted changes)
  • Not on main branch (create feature branch: feature/NNN-slug)
  • Required templates exist in .spec-flow/templates/
  • If project docs exist, docs/project/ directory accessible

Required understanding before execution:
  • Classification decision tree: UI keywords vs backend-only exclusions, improvement = baseline + target, metrics = user tracking, deployment = env vars/migrations/breaking changes
  • Informed guess heuristics: Performance (<500ms), retention (90d logs, 365d analytics), rate limits (100/min), auth (OAuth2), error handling (user-friendly + error ID)
  • Clarification prioritization matrix: Critical (scope, security, breaking changes) > High (UX) > Medium (performance SLA, tech stack) > Low (error messages, rate limits)
  • Slug generation rules: Remove filler words, lowercase kebab-case, 50 char limit, fuzzy match roadmap

See reference.md for complete decision trees and heuristics.

Common mistakes to avoid:

  • **Over-clarification**: >3 clarifications indicates missing informed guesses (check reference.md for defaults)
  • **Wrong classification**: Backend workers/APIs with HAS_UI=true wastes time generating UI artifacts
  • **Roadmap mismatch**: Filler words in slug prevent roadmap match ("add-student-dashboard" vs "student-dashboard")
  • **Vague success criteria**: "Works correctly", "is fast", "looks good" are not measurable - use quantifiable metrics

**Parse and Normalize Input**

Extract feature description and generate clean slug.

Slug Generation Rules:

# Remove filler words and normalize
echo "$ARGUMENTS" |
  sed 's/\bwe want to\b//gi; s/\bI want to\b//gi' |
  sed 's/\badd\b//gi; s/\bcreate\b//gi; s/\bimplement\b//gi' |
  sed 's/\bget our\b//gi; s/\bto a\b//gi; s/\bwith\b//gi' |
  tr '[:upper:]' '[:lower:]' |
  sed 's/[^a-z0-9-]/-/g' | sed 's/--*/-/g' | sed 's/^-//;s/-$//' |
  cut -c1-50

Example:

  • Input: "We want to add student progress dashboard"
  • Slug: student-progress-dashboard
  • Number: 042
  • Directory: specs/042-student-progress-dashboard/

Quality Check: Slug is concise, descriptive, no filler words, matches roadmap format.

See reference.md § Slug Generation Rules for detailed normalization logic.
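
Feature numbering is not spelled out above; here is a minimal sketch of one way to assign the next NNN, assuming specs/ directories follow the NNN-slug layout shown in the example (the FEATURE_NUM variable name is illustrative, not the skill's canonical logic):

# Highest existing NNN under specs/, incremented and zero-padded
LAST=$(ls -d specs/[0-9][0-9][0-9]-*/ 2>/dev/null | sed 's|specs/||; s|-.*||' | sort -n | tail -1)
FEATURE_NUM=$(printf "%03d" $(( 10#${LAST:-0} + 1 )))
echo "specs/${FEATURE_NUM}-${SLUG}/"   # e.g. specs/042-student-progress-dashboard/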

**Check Roadmap Integration**

Search roadmap for existing entry to reuse context.

Detection Logic:

SLUG="student-progress-dashboard"
if grep -qi "^### ${SLUG}" .spec-flow/memory/roadmap.md; then
  FROM_ROADMAP=true
  # Extract requirements, area, role, impact/effort scores
  # Move entry from "Backlog"/"Next" to "In Progress"
  # Add branch and spec links
fi

Benefits:

  • Saves ~10 minutes (requirements already documented)
  • Automatic status tracking (roadmap updates)
  • Context reuse (user already vetted scope)

Quality Check: If roadmap match found, extracted requirements appear in spec.md.

See reference.md § Roadmap Integration for fuzzy matching logic.

**Load Project Documentation Context**

Extract architecture constraints from docs/project/ to prevent hallucination.

Detection:

if [ -d "docs/project" ]; then
  HAS_PROJECT_DOCS=true
  # Extract tech stack, API patterns, entities
else
  echo "ℹ️  No project documentation (run /init-project recommended)"
  HAS_PROJECT_DOCS=false
fi

Extraction:

# Extract constraints from project docs
FRONTEND=$(grep -A 1 "| Frontend" docs/project/tech-stack.md | tail -1)
DATABASE=$(grep -A 1 "| Database" docs/project/tech-stack.md | tail -1)
API_STYLE=$(grep -A 1 "## API Style" docs/project/api-strategy.md | tail -1)
ENTITIES=$(grep -oP '[A-Z_]+(?= \{)' docs/project/data-architecture.md)

Document in NOTES.md:

## Project Documentation Context

**Tech Stack Constraints** (from tech-stack.md):

- Frontend: Next.js 14
- Database: PostgreSQL

**Spec Requirements**:

- ✅ MUST use documented tech stack
- ✅ MUST follow API patterns from api-strategy.md
- ✅ MUST check for duplicate entities before proposing new ones
- ❌ MUST NOT hallucinate MongoDB if PostgreSQL is documented

Quality Check: NOTES.md includes project context section with constraints.

Skip if: No docs/project/ directory exists (warn user but don't block).

**Classify Feature**

Set classification flags using decision tree.

Classification Logic:

HAS_UI (user-facing screens/components):

UI keywords: screen, page, dashboard, form, modal, component, frontend, interface
  → BUT NOT backend-only: API, endpoint, service, worker, cron, job, migration, health check
  → HAS_UI = true

Example: "Add student dashboard" → HAS_UI = true ✅
Example: "Add background worker" → HAS_UI = false ✅ (backend-only)

IS_IMPROVEMENT (optimization with measurable baseline):

Improvement keywords: improve, optimize, enhance, speed up, reduce time
  AND baseline: existing, current, slow, faster, better
  → IS_IMPROVEMENT = true

Example: "Improve search performance from 3.2s to <500ms" → IS_IMPROVEMENT = true ✅
Example: "Make search faster" → IS_IMPROVEMENT = false ❌ (no baseline)

HAS_METRICS (user behavior tracking):

Metrics keywords: track user, measure user, engagement, retention, conversion, analytics, A/B test
  → HAS_METRICS = true

Example: "Track dashboard engagement" → HAS_METRICS = true ✅

HAS_DEPLOYMENT_IMPACT (env vars, migrations, breaking changes):

Deployment keywords: migration, schema change, env variable, breaking change, docker, platform change
  → HAS_DEPLOYMENT_IMPACT = true

Example: "OAuth authentication (needs env vars)" → HAS_DEPLOYMENT_IMPACT = true ✅

Quality Check: Classification matches feature intent. No UI artifacts for backend workers.

See reference.md § Classification Decision Tree for complete logic.
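
A minimal sketch of how the HAS_UI branch of this tree might be scripted (keyword lists abridged from the ones above; the other three flags follow the same keyword-grep pattern):

DESC=$(echo "$ARGUMENTS" | tr '[:upper:]' '[:lower:]')

HAS_UI=false
if echo "$DESC" | grep -qE 'screen|page|dashboard|form|modal|component|frontend|interface'; then
  # Require the absence of backend-only keywords before setting the flag
  if ! echo "$DESC" | grep -qE 'worker|cron|job|migration|health check|endpoint'; then
    HAS_UI=true
  fi
fi
# IS_IMPROVEMENT, HAS_METRICS, HAS_DEPLOYMENT_IMPACT: same pattern with their keyword lists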

**Determine Research Depth**

Select research depth based on complexity (FLAG_COUNT).

Research Depth Guidelines:

| FLAG_COUNT | Depth    | Tools | Use Cases                                       |
|------------|----------|-------|-------------------------------------------------|
| 0          | Minimal  | 1-2   | Simple backend endpoint, config change          |
| 1          | Standard | 3-5   | Single-aspect feature (UI-only or backend-only) |
| ≥2         | Full     | 5-8   | Complex multi-aspect feature                    |

Minimal Research (FLAG_COUNT = 0):

  1. Constitution check (alignment with mission/values)
  2. Grep for similar patterns

Standard Research (FLAG_COUNT = 1):

  1-2. Minimal research
  3. UI inventory scan (if HAS_UI=true)
  4. Performance budgets check
  5. Similar spec search

Full Research (FLAG_COUNT ≥ 2):

  1-5. Standard research
  6. Design inspirations (if HAS_UI=true)
  7. Web search for novel patterns
  8. Integration points analysis

Quality Check: Research depth matches complexity. Don't over-research simple features.

See reference.md § Research Depth Guidelines for tool selection logic.
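
The FLAG_COUNT-to-depth mapping reduces to a small case statement; a sketch, assuming the flags hold the strings "true"/"false" as in the classification step:

FLAG_COUNT=0
for flag in "$HAS_UI" "$IS_IMPROVEMENT" "$HAS_METRICS" "$HAS_DEPLOYMENT_IMPACT"; do
  [ "$flag" = "true" ] && FLAG_COUNT=$((FLAG_COUNT + 1))
done

case "$FLAG_COUNT" in
  0) RESEARCH_DEPTH=minimal ;;   # 1-2 tools
  1) RESEARCH_DEPTH=standard ;;  # 3-5 tools
  *) RESEARCH_DEPTH=full ;;      # 5-8 tools
esac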

**Apply Informed Guess Strategy**

Use defaults for non-critical technical decisions.

Use Informed Guesses For (do NOT clarify):

  • Performance targets: API <500ms (95th percentile), Frontend FCP <1.5s, TTI <3.0s
  • Data retention: User data indefinite (with deletion option), Logs 90 days, Analytics 365 days
  • Error handling: User-friendly messages + error ID (ERR-XXXX), logging to error-log.md
  • Rate limiting: 100 requests/minute per user, 1000/minute per IP
  • Authentication: OAuth2 for users, JWT for APIs, MFA optional (users), required (admins)
  • Integration patterns: RESTful API (unless GraphQL needed), JSON format, offset/limit pagination

Document Assumptions in spec.md:

## Performance Targets (Assumed)

- API endpoints: <500ms (95th percentile)
- Frontend FCP: <1.5s, TTI: <3.0s
- Lighthouse: Performance ≥85, Accessibility ≥95

_Assumption: Standard web application performance expectations applied._

Do NOT Use Informed Guesses For:

  • Scope-defining decisions (feature boundary, user roles)
  • Security/privacy critical choices (encryption, PII handling)
  • Breaking changes (API response format changes)
  • No reasonable default exists (business logic decisions)

Quality Check: No clarifications for decisions with industry-standard defaults.

See reference.md § Informed Guess Heuristics for complete defaults catalog.

**Generate Clarifications (Max 3)**

Identify ambiguities that cannot be reasonably assumed.

Clarification Prioritization Matrix:

| Category         | Priority | Ask?             | Example                                                    |
|------------------|----------|------------------|------------------------------------------------------------|
| Scope boundary   | Critical | ✅ Always        | "Does this include admin features or only user features?"  |
| Security/Privacy | Critical | ✅ Always        | "Should PII be encrypted at rest?"                          |
| Breaking changes | Critical | ✅ Always        | "Is it okay to change the API response format?"            |
| User experience  | High     | ✅ If ambiguous  | "Should this be a modal or new page?"                       |
| Performance SLA  | Medium   | ❌ Use defaults  | "What's the target response time?" → Assume <500ms         |
| Technical stack  | Medium   | ❌ Defer to plan | "Which database?" → Planning-phase decision                 |
| Error messages   | Low      | ❌ Use standard  | "What error message?" → Standard pattern                    |
| Rate limits      | Low      | ❌ Use defaults  | "How many requests?" → 100/min default                      |

Limit: Maximum 3 clarifications total

Process:

  1. Identify all ambiguities
  2. Rank by category priority
  3. Keep top 3 most critical
  4. Make informed guesses for remaining
  5. Document assumptions clearly

Good Clarifications:

[NEEDS CLARIFICATION: Should dashboard show all students or only assigned classes?]
[NEEDS CLARIFICATION: Should parents have access to this dashboard?]

Bad Clarifications (have defaults):

❌ [NEEDS CLARIFICATION: What's the target response time?] → Use 500ms default
❌ [NEEDS CLARIFICATION: Which database to use?] → Planning-phase decision
❌ [NEEDS CLARIFICATION: What error message format?] → Use standard pattern

Quality Check: ≤3 clarifications total, all are scope/security/UX critical.

See reference.md § Clarification Prioritization Matrix for complete ranking system.

**Write Success Criteria**

Define measurable, quantifiable, user-facing outcomes.

Good Criteria Format:

## Success Criteria

- User can complete registration in <3 minutes (measured via PostHog funnel)
- API response time <500ms for 95th percentile (measured via Datadog APM)
- Lighthouse accessibility score ≥95 (measured via CI Lighthouse check)
- 95% of user searches return results in <1 second

Criteria Requirements:

  • Measurable: Can be validated objectively
  • Quantifiable: Has specific numeric target
  • User-facing: Focuses on outcomes, not implementation
  • Measurement method: Specifies how to validate (tool, metric)

Bad Criteria to Avoid:

❌ System works correctly (not measurable)
❌ API is fast (not quantifiable)
❌ UI looks good (subjective)
❌ React components render efficiently (technology-specific, not outcome-focused)

Quality Check: Every criterion is measurable, quantifiable, and outcome-focused.

**Generate Artifacts and Commit**

Create specification artifacts and commit to git.

Artifacts:

  1. specs/NNN-slug/spec.md - Structured specification with classification flags, requirements, assumptions, clarifications (≤3), success criteria
  2. specs/NNN-slug/NOTES.md - Implementation decisions and project context
  3. specs/NNN-slug/visuals/README.md - Visual artifacts directory (if HAS_UI=true)
  4. specs/NNN-slug/state.yaml - Phase state tracking (currentPhase: specification, status: completed)
  5. Updated roadmap (if FROM_ROADMAP=true) - Move entry to "In Progress", add branch/spec links
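
A minimal sketch of writing state.yaml (only currentPhase and status are documented here; the feature field and the FEATURE_NUM/SLUG variables are illustrative assumptions):

mkdir -p "specs/${FEATURE_NUM}-${SLUG}"
cat > "specs/${FEATURE_NUM}-${SLUG}/state.yaml" <<EOF
feature: ${FEATURE_NUM}-${SLUG}   # assumed field, for traceability
currentPhase: specification
status: completed
EOF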

Validation Checks:

  • Requirements checklist complete (if exists)
  • Clarifications ≤3
  • Classification flags match feature description
  • Success criteria are measurable
  • Roadmap updated (if FROM_ROADMAP=true)
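
The clarification limit can be checked mechanically; a sketch, assuming clarifications are marked inline with the [NEEDS CLARIFICATION: ...] tag shown earlier:

SPEC="specs/${FEATURE_NUM}-${SLUG}/spec.md"
CLARIFICATIONS=$(grep -c "NEEDS CLARIFICATION" "$SPEC" || true)
if [ "$CLARIFICATIONS" -gt 3 ]; then
  echo "❌ ${CLARIFICATIONS} clarifications (max 3) - convert extras to documented assumptions"
fi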

Commit Message:

git add specs/NNN-slug/
git commit -m "feat: add spec for <feature-name>

Generated specification with classification:
- HAS_UI: <true/false>
- IS_IMPROVEMENT: <true/false>
- HAS_METRICS: <true/false>
- HAS_DEPLOYMENT_IMPACT: <true/false>

Clarifications: N
Research depth: <minimal/standard/full>"

Quality Check: All artifacts created, spec committed, roadmap updated, state.yaml initialized.

**Pre-phase checks**:

- [ ] Git working tree clean
- [ ] Not on main branch
- [ ] Required templates exist

During phase:

  • Slug normalized (no filler words, lowercase kebab-case)
  • Roadmap checked for existing entry
  • Project docs loaded (if exist)
  • Classification flags validated (no conflicting keywords)
  • Research depth appropriate for complexity (FLAG_COUNT-based)
  • Clarifications ≤3 (use informed guesses for rest)
  • Success criteria measurable and quantifiable
  • Assumptions documented clearly

Post-phase validation:

  • spec.md committed with proper message
  • NOTES.md initialized
  • Roadmap updated (if FROM_ROADMAP=true)
  • visuals/ created (if HAS_UI=true)
  • state.yaml initialized
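
A quick sketch of these post-phase checks (paths follow the outputs listed above; visuals/ applies only when HAS_UI=true):

DIR="specs/${FEATURE_NUM}-${SLUG}"
for f in spec.md NOTES.md state.yaml; do
  [ -f "$DIR/$f" ] || echo "❌ Missing: $DIR/$f"
done
[ "$HAS_UI" = "true" ] && [ ! -d "$DIR/visuals" ] && echo "❌ Missing: $DIR/visuals/"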

**Specification quality targets**:
  • Clarifications: ≤3 per spec
  • Classification accuracy: ≥90%
  • Success criteria: 100% measurable
  • Time to spec: ≤15 minutes
  • Rework rate: <5%

What makes a good spec:

  • Clear scope boundaries (what's in, what's out)
  • Measurable success criteria (with measurement methods)
  • Reasonable assumptions documented
  • Minimal clarifications (only critical scope/security)
  • Correct classification (matches feature intent)
  • Context reuse (from roadmap when available)

What makes a bad spec:

  • 5+ clarifications (over-clarifying technical defaults)
  • Vague success criteria ("works correctly", "is fast")
  • Technology-specific criteria (couples to implementation)
  • Wrong classification (UI flag for backend-only)
  • Missing roadmap integration (duplicates planning work)

**Common mistake: Over-clarification**

**Impact**: Delays planning phase, frustrates users

Scenario:

Spec with 7 clarifications (limit: 3):
- [NEEDS CLARIFICATION: What format? CSV or JSON?] → Use JSON default
- [NEEDS CLARIFICATION: Rate limiting strategy?] → Use 100/min default
- [NEEDS CLARIFICATION: Maximum file size?] → Use 50MB default

Prevention:

  • Use industry-standard defaults (see reference.md § Informed Guess Heuristics)
  • Only mark critical scope/security decisions as NEEDS CLARIFICATION
  • Document assumptions in spec.md "Assumptions" section
  • Prioritize by impact: scope > security > UX > technical

If encountered: Reduce to 3 most critical, convert others to documented assumptions.

**Common mistake: Wrong classification (HAS_UI=true for a backend-only feature)**

**Impact**: Creates unnecessary UI artifacts, wastes time

Scenario:

Feature: "Add background worker to process uploads"
Classified: HAS_UI=true (WRONG - no user-facing UI)
Result: Generated screens.yaml for backend-only feature

Root cause: The keyword "process" triggered a false positive because backend-only keywords were not checked

Prevention:

  1. Check for backend-only keywords: worker, cron, job, migration, health check, API, endpoint, service
  2. Require BOTH UI keyword AND absence of backend-only keywords
  3. Use classification decision tree (see reference.md § Classification Decision Tree)

**Common mistake: Roadmap slug mismatch**

**Impact**: Misses context reuse opportunity, duplicates planning work

Scenario:

User input: "We want to add Student Progress Dashboard"
Generated slug: "add-student-progress-dashboard" (BAD - includes "add")
Roadmap entry: "### student-progress-dashboard" (slug without "add")
Result: No match found, created fresh spec instead of reusing roadmap context

Prevention:

  1. Remove filler words during slug generation: "add", "create", "we want to"
  2. Normalize to lowercase kebab-case consistently
  3. Offer fuzzy matches if exact match not found
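
A hedged sketch of the fuzzy-match fallback (assumes roadmap entries use "### <slug>" headings, as in the detection snippet above):

if ! grep -qi "^### ${SLUG}" .spec-flow/memory/roadmap.md; then
  echo "No exact roadmap match for '${SLUG}'. Possible matches:"
  # Suggest entries that share any word with the generated slug
  grep -i "^### " .spec-flow/memory/roadmap.md | grep -iE "$(echo "$SLUG" | tr '-' '|')" || echo "  (none)"
fi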

**Common mistake: Excessive research depth**

**Impact**: Wastes time, exceeds token budget

Scenario:

# 15 research tools for simple backend endpoint (WRONG)
Glob *.py, Glob *.ts, Glob *.tsx, Grep "database", Grep "model"...

Prevention:

  • Use research depth guidelines: FLAG_COUNT = 0 → 1-2 tools only
  • For simple features: Constitution check + pattern search is sufficient
  • Reserve full research (5-8 tools) for complex multi-aspect features

Correct approach:

# 2 research tools for simple backend endpoint (CORRECT)
grep "similar endpoint" specs/*/spec.md
grep "BaseModel" api/app/models/*.py

**Common mistake: Vague success criteria**

**Impact**: Cannot validate implementation, leads to scope creep

Bad examples:

❌ System works correctly (not measurable)
❌ API is fast (not quantifiable)
❌ UI looks good (subjective)

Good examples:

✅ User can complete registration in <3 minutes (measured via PostHog funnel)
✅ API response time <500ms for 95th percentile (measured via Datadog APM)
✅ Lighthouse accessibility score ≥95 (measured via CI Lighthouse check)

Prevention: Every criterion must answer: "How do we measure this objectively?"

**Common mistake: Technology-specific success criteria**

**Impact**: Couples spec to implementation details, reduces flexibility

Bad examples:

❌ React components render efficiently
❌ Redis cache hit rate >80%
❌ PostgreSQL queries use proper indexes

Good examples (outcome-focused):

✅ Page load time <1.5s (First Contentful Paint)
✅ 95% of user searches return results in <1 second
✅ Database queries complete in <100ms average

Prevention: Focus on user-facing outcomes, not implementation mechanisms.

**Best practice: Informed guess strategy**

**When to use**:
  • Non-critical technical decisions
  • Industry standards exist
  • Default won't impact scope or security

When NOT to use:

  • Scope-defining decisions
  • Security/privacy critical choices
  • No reasonable default exists

Approach:

  1. Check if decision has reasonable default (see reference.md § Informed Guess Heuristics)
  2. Document assumption clearly in spec
  3. Note that user can override if needed

Example:

## Performance Targets (Assumed)

- API response time: <500ms (95th percentile)
- Frontend FCP: <1.5s
- Database queries: <100ms

**Assumptions**:

- Standard web app performance expectations applied
- If requirements differ, specify in "Performance Requirements" section

Result: Reduces clarifications from 5-7 to 0-2, saves ~10 minutes

**Best practice: Roadmap integration**

**When to use**:
  • Feature name matches roadmap entry
  • Roadmap uses consistent slug format

Approach:

  1. Normalize user input to slug format
  2. Search roadmap for exact match
  3. If found: Extract requirements, area, role, impact/effort
  4. Move to "In Progress" section automatically
  5. Add branch and spec links

Result: Saves ~10 minutes, requirements already vetted, automatic status tracking
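
A sketch of the extraction step (assumes each roadmap entry runs from its "### <slug>" heading to the next "##"-level heading; adjust to the real roadmap layout):

# Print the matched entry's body so requirements, area, role, and impact/effort scores can be reused
awk -v slug="$SLUG" '
  found && /^##/ { exit }          # stop at the next heading
  found                            # print entry body lines
  $0 ~ "^### " slug { found = 1 }  # start after the matching heading
' .spec-flow/memory/roadmap.md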

**Best practice**: Follow decision tree systematically (see reference.md § Classification Decision Tree)

Process:

  1. Check UI keywords → Set HAS_UI (but verify no backend-only exclusions)
  2. Check improvement keywords + baseline → Set IS_IMPROVEMENT
  3. Check metrics keywords → Set HAS_METRICS
  4. Check deployment keywords → Set HAS_DEPLOYMENT_IMPACT

Result: 90%+ classification accuracy, correct artifacts generated

Specification phase complete when:
  • All pre-phase checks passed
  • All 9 execution steps completed
  • All post-phase validations passed
  • Spec committed to git
  • state.yaml shows currentPhase: specification and status: completed

Ready to proceed when:

  • If clarifications >0 → Run /clarify
  • If clarifications = 0 → Run /plan
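
The handoff decision can be scripted from the spec itself (same [NEEDS CLARIFICATION] convention as above):

SPEC="specs/${FEATURE_NUM}-${SLUG}/spec.md"
if grep -q "NEEDS CLARIFICATION" "$SPEC"; then
  echo "Next: /clarify  (open questions remain)"
else
  echo "Next: /plan  (spec is unambiguous)"
fi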

**Troubleshooting**

**Issue**: Too many clarifications (>3)
**Solution**: Review reference.md § Informed Guess Heuristics for defaults, convert non-critical questions to documented assumptions

**Issue**: Classification seems wrong
**Solution**: Re-check decision tree, verify no backend-only keywords for HAS_UI=true

**Issue**: Roadmap entry exists but wasn't matched
**Solution**: Check slug normalization, offer fuzzy matches, update roadmap slug format

**Issue**: Success criteria are vague
**Solution**: Add quantifiable metrics and measurement methods, focus on user-facing outcomes

**Issue**: Research depth excessive for simple feature
**Solution**: Follow FLAG_COUNT guidelines (0→minimal, 1→standard, ≥2→full)

**Issue**: Project context missing/incomplete
**Solution**: Run /init-project to generate docs/project/, or manually create tech-stack.md and api-strategy.md

For detailed documentation:

Classification & Defaults: reference.md

  • Classification decision tree (HAS_UI, IS_IMPROVEMENT, HAS_METRICS, HAS_DEPLOYMENT_IMPACT)
  • Informed guess heuristics (performance, retention, auth, rate limits)
  • Clarification prioritization matrix (scope > security > UX > technical)
  • Slug generation rules (normalization, filler word removal)

Real-World Examples: examples.md

  • Good specs (0-2 clarifications) vs bad specs (>5 clarifications)
  • Side-by-side comparisons with analysis
  • Progressive learning patterns (features 1-5, 6-15, 16+)
  • Classification accuracy examples

Execution Details: reference.md § Research Depth Guidelines, Common Mistakes to Avoid

Next phase after /specify:

  • If clarifications >0 → /clarify (reduces ambiguity via targeted questions)
  • If clarifications = 0 → /plan (generates design artifacts from spec)