quality-validation

@VAMFI/claude-user-memory

Systematic validation methodology for ResearchPacks and Implementation Plans. Provides scoring rubrics and quality gates to ensure outputs meet standards before proceeding to next phase. Prevents garbage-in-garbage-out scenarios.

Install Skill

1. Download the skill
2. Enable skills in Claude: open claude.ai/settings/capabilities and find the "Skills" section
3. Upload to Claude: click "Upload skill" and select the downloaded ZIP file

Note: Please verify the skill by reviewing its instructions before using it.

SKILL.md

name: quality-validation
description: Systematic validation methodology for ResearchPacks and Implementation Plans. Provides scoring rubrics and quality gates to ensure outputs meet standards before proceeding to next phase. Prevents garbage-in-garbage-out scenarios.
auto_invoke: true
tags: validation, quality, verification, gates

Quality Validation Skill

This skill provides a systematic validation methodology to ensure ResearchPacks and Implementation Plans meet quality standards before proceeding to implementation.

When Claude Should Use This Skill

Claude will automatically invoke this skill when:

  • ResearchPack completed and needs validation before planning
  • Implementation Plan completed and needs validation before coding
  • User explicitly requests quality check ("validate this", "is this complete?")
  • About to proceed to next workflow phase (quality gate trigger)

Core Principles (BRAHMA Constitution)

  1. Verification over speculation - Validate with objective criteria
  2. Quality gates - Don't proceed with bad inputs
  3. Reproducibility - Same input quality = same score
  4. Explicit defects - List specific problems, not vague "could be better"

Validation Targets

Research Type Detection

Before scoring, detect the research type so the appropriate rubric is applied:

Type 1: API/Library Research

Indicators:

  • Contains API endpoints, function signatures, method calls
  • Code examples with specific library imports
  • Configuration/setup steps for external dependencies
  • Version numbers for libraries/frameworks

Scoring: Use API Research Rubric (80+ pass threshold)

Type 2: Philosophy Research

Indicators:

  • Contains themes, principles, patterns, methodologies
  • Thematic organization (Theme 1, Theme 2, etc.)
  • Cross-source synthesis
  • Engineering philosophy or best practices analysis
  • Pattern extraction from multiple sources

Scoring: Use Philosophy Research Rubric (70+ pass threshold)

Examples: Engineering philosophy, architectural patterns, best practices, methodology research

Type 3: Pattern Research

Indicators:

  • Contains code patterns, design patterns, anti-patterns
  • Architectural decisions and tradeoffs
  • Implementation strategies
  • Performance optimization patterns

Scoring: Use Pattern Research Rubric (70+ pass threshold)

Why Different Thresholds?

  • API research is more objective (APIs exist or don't, versions are correct or wrong)
  • Philosophy research is more subjective (thematic organization, synthesis quality)
  • Philosophy research provides strategic value even if not as "complete" as API docs
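
To make the detection step concrete, here is a minimal heuristic sketch, assuming the ResearchPack is plain markdown text. The keyword lists, weights, and function name are illustrative assumptions, not part of the skill's specification; in practice Claude makes this classification judgment itself.

```python
import re

# Illustrative keyword signals, loosely mirroring the indicator lists above.
API_SIGNALS = ["endpoint", "signature", "import", "install", "configuration", "version"]
PHILOSOPHY_SIGNALS = ["theme", "principle", "philosophy", "methodology", "synthesis", "best practice"]
PATTERN_SIGNALS = ["pattern", "anti-pattern", "tradeoff", "architecture", "optimization"]

PASS_THRESHOLDS = {"api": 80, "philosophy": 70, "pattern": 70}

def detect_research_type(text: str) -> str:
    """Return 'api', 'philosophy', or 'pattern' based on simple keyword counts."""
    lowered = text.lower()
    scores = {
        "api": sum(lowered.count(s) for s in API_SIGNALS),
        "philosophy": sum(lowered.count(s) for s in PHILOSOPHY_SIGNALS),
        "pattern": sum(lowered.count(s) for s in PATTERN_SIGNALS),
    }
    # Code examples with library imports are a strong API/library signal.
    if re.search(r"^\s*(import |from .+ import )", text, re.MULTILINE):
        scores["api"] += 5
    return max(scores, key=scores.get)
```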

1. ResearchPack Validation - API/Library Type

Purpose: Ensure research is complete, accurate, and actionable before planning

Validation Rubric for API/Library Research (100 points total, 80+ pass threshold):

Completeness (40 points)

  • ✓ Library/API identified with version (10 pts)
  • ✓ At least 3 key APIs documented (10 pts)
  • ✓ Setup/configuration steps provided (10 pts)
  • ✓ At least 1 complete code example (10 pts)

Accuracy (30 points)

  • ✓ All API signatures match official docs exactly (15 pts)
    • Check: No paraphrasing; exact parameter types and correct return types
  • ✓ Version numbers correct and consistent (5 pts)
  • ✓ URLs all valid and point to official sources (10 pts)
    • Test: Each URL should point to an official domain

Citation (20 points)

  • ✓ Every API has source URL (10 pts)
  • ✓ Sources include version and section references (5 pts)
  • ✓ Confidence level stated and justified (5 pts)

Actionability (10 points)

  • ✓ Implementation checklist provided (5 pts)
  • ✓ Open questions identify real decisions (5 pts)

Passing Score: 80/100 or higher

Validation Process:

# Pseudo-code for validation logic
def validate_research_pack(research_pack):
    score = 0
    defects = []

    # Completeness checks
    if has_library_with_version(research_pack):
        score += 10
    else:
        defects.append("CRITICAL: Library/version not identified")

    api_count = count_documented_apis(research_pack)
    if api_count >= 3:
        score += 10
    elif api_count > 0:
        score += (api_count / 3) * 10
        defects.append(f"MINOR: Only {api_count} APIs documented, need 3+")
    else:
        defects.append("CRITICAL: No APIs documented")

    # ... (continue for all criteria)

    return {
        "score": score,
        "grade": "PASS" if score >= 80 else "FAIL",
        "defects": defects,
        "recommendations": generate_recommendations(defects)
    }

Output Format:

## 📊 ResearchPack Validation Report

**Overall Score**: [X]/100
**Grade**: [PASS ✅ / FAIL ❌]

### Breakdown
- Completeness: [X]/40
- Accuracy: [X]/30
- Citation: [X]/20
- Actionability: [X]/10

### Defects Found ([N])

#### CRITICAL (blocks implementation)
1. [Specific defect with example]
2. [Another defect]

#### MAJOR (should fix before proceeding)
1. [Defect]

#### MINOR (nice to have)
1. [Defect]

### Recommendations

**To reach passing score**:
1. [Specific action to take]
2. [Another action]

**If score >= 80**: ✅ **APPROVED** - Proceed to implementation-planner

**If score < 80**: ❌ **BLOCKED** - Fix critical/major defects and re-validate
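
A small sketch of how the result dictionary returned by the pseudo-code above could be rendered into this report format. The field names match the pseudo-code; the severity prefixes ("CRITICAL:", "MAJOR:", "MINOR:") are assumed to be embedded in each defect string, as they are there.

```python
def render_report(result: dict, threshold: int = 80) -> str:
    """Render a validation result dict into the markdown report format above."""
    lines = [
        "## 📊 ResearchPack Validation Report",
        f"**Overall Score**: {result['score']}/100",
        f"**Grade**: {'PASS ✅' if result['score'] >= threshold else 'FAIL ❌'}",
        f"### Defects Found ({len(result['defects'])})",
    ]
    for severity in ("CRITICAL", "MAJOR", "MINOR"):
        matching = [d for d in result["defects"] if d.startswith(severity)]
        if matching:
            lines.append(f"#### {severity}")
            lines += [f"{i}. {d}" for i, d in enumerate(matching, 1)]
    lines.append("### Recommendations")
    lines += [f"{i}. {r}" for i, r in enumerate(result.get("recommendations", []), 1)]
    return "\n".join(lines)
```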

1b. ResearchPack Validation - Philosophy Research Type

Purpose: Ensure philosophy/pattern research is well-organized, sourced, and actionable

Validation Rubric for Philosophy Research (100 points total, 70+ pass threshold):

Thematic Organization (30 points)

  • ✓ Clear themes/patterns identified with descriptive names (10 pts)
    • Check: Each theme has a clear title and scope
    • Examples: "Agent Architecture", "Context Engineering", "Multi-Agent Patterns"
  • ✓ Each theme well-documented with examples and evidence (10 pts)
    • Check: Themes have sub-sections, not just bullet points
    • Check: Examples or quotes support each theme
  • ✓ Cross-theme synthesis and relationships explained (10 pts)
    • Check: "How patterns connect" or "Synthesis" section present
    • Check: Explains how themes relate or build on each other

Source Quality (20 points)

  • ✓ Official/authoritative sources cited (10 pts)
    • Check: URLs from official domains (anthropic.com, docs.*, official repos)
    • Examples: Anthropic blog, official documentation, framework guides
  • ✓ Multiple sources per theme (5 pts)
    • Check: Each major theme cites 2+ sources
    • No single-source themes (indicates narrow research)
  • ✓ Date/version information when applicable (5 pts)
    • Check: Article dates, release versions, "as of [date]" present
    • Helps determine if research is current

Actionable Insights (30 points)

  • ✓ Implementation checklist provided (15 pts)
    • Check: Concrete next steps for applying research
    • Format: "Enhancement 1.1:", "Step 1:", "Action Items"
    • Examples: "Add think protocol to agents", "Create context-engineering skill"
  • ✓ Specific patterns extracted and documented (10 pts)
    • Check: Patterns section with clear pattern names
    • Check: Each pattern has description and when to use
    • Examples: "Pattern 1: Minimal Scaffolding", "Pattern 2: Think Before Act"
  • ✓ Open questions identified for planning phase (5 pts)
    • Check: Research acknowledges what's unknown or needs deciding
    • Examples: "Which agents need think tool?", "When to use multi-agent?"

Depth & Coverage (20 points)

  • ✓ Comprehensive coverage of topic (10 pts)
    • Check: Multiple aspects of topic covered
    • Check: Not surface-level (goes beyond basic definitions)
    • Examples: 7+ themes, 10+ sources for major topics
  • ✓ Sufficient detail for implementation (10 pts)
    • Check: Enough context to make decisions
    • Check: Includes performance metrics, tradeoffs, examples
    • Examples: "39% improvement", "15x cost", specific numbers

Passing Score: 70/100 or higher

Why Lower Threshold Than API Research?

Philosophy research is inherently more subjective and thematic. A well-organized thematic analysis with 7 patterns from 11 sources (like the Anthropic ResearchPack) deserves to pass even if it doesn't have "3+ API endpoints with exact signatures."

Philosophy research provides strategic value:

  • Informs how to build, not just what APIs to call
  • Establishes principles that apply across implementations
  • Captures institutional knowledge and best practices
  • Enables better decision-making during planning

Example: Anthropic Engineering Philosophy ResearchPack

Would score:

  • Thematic Organization: 30/30 (7 clear themes, cross-synthesis section)
  • Source Quality: 20/20 (11 official Anthropic articles, all dated)
  • Actionable Insights: 28/30 (Implementation checklist present, 7 patterns extracted, open questions listed)
  • Depth & Coverage: 18/20 (Comprehensive, but more examples would help)
  • Total: 96/100 ✅ PASS (well above 70 threshold)
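
For parity with the API validator's pseudo-code, here is a runnable sketch that scores just the Thematic Organization category against raw markdown text. The heuristics (heading patterns, keyword checks) are rough assumptions that approximate the checks listed above, not the skill's definition of them.

```python
import re

def score_thematic_organization(text: str) -> tuple[int, list[str]]:
    """Score the 30-point Thematic Organization category with simple text heuristics."""
    score, defects = 0, []
    headings = re.findall(r"^#{2,3}\s+(.+)$", text, re.MULTILINE)
    themes = [h for h in headings if h.startswith("Theme") or "Pattern" in h]
    if len(themes) >= 2:
        score += 10  # clear, named themes identified
    else:
        defects.append("CRITICAL: Fewer than 2 named themes identified")
    if "Example" in text or '"' in text:
        score += 10  # themes supported by examples or quotes (very rough proxy)
    else:
        defects.append("MAJOR: Themes lack supporting examples or quotes")
    if re.search(r"synthesis|how .{0,30}connect", text, re.IGNORECASE):
        score += 10  # cross-theme synthesis section present
    else:
        defects.append("MAJOR: Missing cross-theme synthesis section")
    return score, defects
```

The remaining categories (Source Quality, Actionable Insights, Depth & Coverage) would be scored the same way and summed against the 70-point threshold.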

Output Format:

## 📊 ResearchPack Validation Report (Philosophy Research)

**Overall Score**: [X]/100
**Grade**: [PASS ✅ / FAIL ❌]
**Research Type**: Philosophy/Pattern Research

### Breakdown

**Thematic Organization** ([X]/30):
- Clear themes: [Y/10] [✓/✗]
- Theme documentation: [Y/10] [✓/✗]
- Cross-synthesis: [Y/10] [✓/✗]

**Source Quality** ([X]/20):
- Official sources: [Y/10] [✓/✗]
- Multiple sources per theme: [Y/5] [✓/✗]
- Date/version info: [Y/5] [✓/✗]

**Actionable Insights** ([X]/30):
- Implementation checklist: [Y/15] [✓/✗]
- Patterns extracted: [Y/10] [✓/✗]
- Open questions: [Y/5] [✓/✗]

**Depth & Coverage** ([X]/20):
- Comprehensive coverage: [Y/10] [✓/✗]
- Sufficient detail: [Y/10] [✓/✗]

### Defects Found ([N])

#### CRITICAL (blocks implementation)
1. [Defect - if no themes identified, no patterns extracted, etc.]

#### MAJOR (should fix before proceeding)
1. [Defect - if only 1 source per theme, missing implementation checklist, etc.]

#### MINOR (nice to have)
1. [Defect - if some themes lack examples, could use more sources, etc.]

### Recommendations

**To reach passing score** (if < 70):
1. [Specific action to take]
2. [Another action]

**If score >= 70**: ✅ **APPROVED** - Proceed to implementation-planner

**If score < 70**: ❌ **BLOCKED** - Fix critical/major defects and re-validate

**Philosophy Research Note**: This research provides strategic guidance for implementation. Even if specific API details are needed later, the principles and patterns documented here are valuable for decision-making.

2. Implementation Plan Validation

Purpose: Ensure plan is complete, safe, and executable before coding

Validation Rubric (100 points total):

Completeness (35 points)

  • ✓ All file changes listed with purposes (10 pts)
  • ✓ Step-by-step implementation sequence (10 pts)
  • ✓ Each step has verification method (10 pts)
  • ✓ Test plan included (5 pts)

Safety (30 points)

  • ✓ Rollback plan complete and specific (15 pts)
    • Must include: exact commands, verification steps, triggers
  • ✓ Risk assessment done (10 pts)
    • At least 3 risks identified with mitigations
  • ✓ Changes are minimal (fewest files possible) (5 pts)

Clarity (20 points)

  • ✓ Steps are actionable (no ambiguity) (10 pts)
  • ✓ Success criteria defined (5 pts)
  • ✓ Time estimates provided (5 pts)

Alignment (15 points)

  • ✓ Plan matches ResearchPack APIs (10 pts)
  • ✓ Plan addresses all requirements from user (5 pts)

Passing Score: 85/100 or higher (higher bar than research)

Validation Process:

def validate_implementation_plan(plan, research_pack):
    score = 0
    defects = []

    # Completeness checks
    if has_file_changes_list(plan):
        score += 10
    else:
        defects.append("CRITICAL: No file changes specified")

    steps = extract_steps(plan)
    if all(step_has_verification(s) for s in steps):
        score += 10
    else:
        missing = [s for s in steps if not step_has_verification(s)]
        score += (len(steps) - len(missing)) / len(steps) * 10
        defects.append(f"MAJOR: Steps {missing} lack verification")

    # Safety checks
    rollback = extract_rollback_plan(plan)
    if has_exact_commands(rollback) and has_triggers(rollback):
        score += 15
    elif has_rollback_section(plan):
        score += 8
        defects.append("MAJOR: Rollback plan incomplete (missing commands or triggers)")
    else:
        defects.append("CRITICAL: No rollback plan")

    # Alignment checks
    apis_used = extract_apis_from_plan(plan)
    research_apis = extract_apis_from_research(research_pack)
    if all(api_matches_research(a, research_apis) for a in apis_used):
        score += 10
    else:
        mismatches = find_api_mismatches(apis_used, research_apis)
        defects.append(f"CRITICAL: APIs don't match ResearchPack: {mismatches}")

    # ... (continue for all criteria)

    return {
        "score": score,
        "grade": "PASS" if score >= 85 else "FAIL",
        "defects": defects,
        "recommendations": generate_recommendations(defects)
    }

Output Format:

## 📊 Implementation Plan Validation Report

**Overall Score**: [X]/100
**Grade**: [PASS ✅ / FAIL ❌]

### Breakdown
- Completeness: [X]/35
- Safety: [X]/30
- Clarity: [X]/20
- Alignment: [X]/15

### Defects Found ([N])

#### CRITICAL (blocks implementation)
1. [Specific defect]

#### MAJOR (should fix)
1. [Defect]

#### MINOR (nice to have)
1. [Defect]

### API Alignment Check
✅ All APIs match ResearchPack
OR
❌ Mismatches found:
- Plan uses `foo(x, y)` but ResearchPack shows `foo(x: string, y?: number)`

### Recommendations

**To reach passing score**:
1. [Action]

**If score >= 85**: ✅ **APPROVED** - Proceed to code-implementer

**If score < 85**: ❌ **BLOCKED** - Fix defects and re-validate

Quality Gate Protocol

Gates are MANDATORY checkpoints: the workflow cannot proceed to the next phase without passing validation.

Gate 1: Research → Planning

Trigger: @docs-researcher completes ResearchPack
Action: Validate ResearchPack
Decision:
  - Score >= 80: ✅ Allow @implementation-planner to proceed
  - Score < 80: ❌ Block, return to @docs-researcher with defect list

Gate 2: Planning → Implementation

Trigger: @implementation-planner completes Implementation Plan
Action: Validate Implementation Plan + check alignment with ResearchPack
Decision:
  - Score >= 85 AND APIs match: ✅ Allow @code-implementer to proceed
  - Score < 85 OR APIs mismatch: ❌ Block, return to @implementation-planner with defect list

Gate 3: Implementation → Completion

Trigger: @code-implementer reports completion
Action: Validate tests passed, build succeeded, no regressions
Decision:
  - All checks pass: ✅ Mark complete
  - Any check fails: ❌ Trigger self-correction loop (up to 3 attempts)
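
A sketch of the three gates as a single driver function. It is deliberately generic: the agent and validator callables are passed in, because this skill does not define how agents are invoked; only the blocking behavior at each gate is the point.

```python
from typing import Any, Callable, Dict

def run_gated_workflow(
    research_agent: Callable[[], str],
    validate_research: Callable[[str], Dict[str, Any]],
    planning_agent: Callable[[str], str],
    validate_plan: Callable[[str, str], Dict[str, Any]],
) -> Dict[str, Any]:
    research = research_agent()
    gate1 = validate_research(research)
    if gate1["grade"] != "PASS":          # Gate 1: block, return defects to the researcher
        return {"status": "BLOCKED", "gate": 1, "defects": gate1["defects"]}

    plan = planning_agent(research)
    gate2 = validate_plan(plan, research)
    if gate2["grade"] != "PASS":          # Gate 2: block, return defects to the planner
        return {"status": "BLOCKED", "gate": 2, "defects": gate2["defects"]}

    # Gate 3 (tests, build, regressions) runs after code-implementer reports completion,
    # with up to 3 self-correction attempts, as described above.
    return {"status": "APPROVED", "plan": plan}
```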

Validation Automation

These validations should be automated via hooks (see the hooks implementation below):

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "implementation-planner",
        "command": "validate-research-pack.sh",
        "action": "block_if_fails"
      },
      {
        "matcher": "code-implementer",
        "command": "validate-implementation-plan.sh",
        "action": "block_if_fails"
      }
    ]
  }
}

Validation scripts return:

  • Exit code 0: Validation passed, proceed
  • Exit code 1: Validation failed, defects printed to stdout, block
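
The same contract can be implemented in any language. Here is a minimal Python sketch with the scoring logic stubbed so the exit-code behavior itself is visible; the stub is an assumption, not the real rubric.

```python
#!/usr/bin/env python3
import sys
from pathlib import Path

def validate_research_pack(text: str) -> dict:
    # Placeholder scoring: the real rubric from section 1 would go here.
    defects = [] if "Source:" in text else ["CRITICAL: No source citations found"]
    return {"score": 100 if not defects else 0, "defects": defects}

def main() -> int:
    result = validate_research_pack(Path(sys.argv[1]).read_text())
    for defect in result["defects"]:
        print(defect)                 # defects are printed to stdout for the agent to read
    if result["score"] < 80:
        return 1                      # non-zero exit: the hook blocks the next phase
    print(f"✅ Validation passed ({result['score']}/100)")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```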

Common Validation Failures

ResearchPack Failures

Hallucinated APIs:

❌ CRITICAL: API `redis.client.fetch()` not found in official docs
   ResearchPack cites: redis.io/docs/clients/nodejs
   Actual API: `client.get()` (verified at redis.io/docs/clients/nodejs#get)
   FIX: Replace all instances of `fetch` with correct `get` API

Version mismatch:

❌ MAJOR: ResearchPack uses v3.x docs but project has v4.6.0
   Example: v3 uses callbacks, v4 uses promises
   FIX: Re-fetch docs for v4.6.0 specifically

Missing citations:

❌ MAJOR: 5 APIs listed without source URLs
   APIs: set(), del(), ttl(), exists(), keys()
   FIX: Add source URL for each (format: docs.com/path#section)
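
A sketch of how the hallucinated-API and missing-citation failures above could be caught automatically. The regexes and the `fetched_docs_text` input are assumptions; the official docs would have to be fetched separately before this check runs.

```python
import re

def find_research_defects(research_text: str, fetched_docs_text: str) -> list[str]:
    defects = []
    # Every backticked call cited in the research should appear in the official docs text.
    cited_apis = set(re.findall(r"`(\w+(?:\.\w+)*)\(\)`", research_text))
    for api in cited_apis:
        if api not in fetched_docs_text:
            defects.append(f"CRITICAL: API `{api}()` not found in official docs")
    # Every "### ... API" section should carry a "Source:" URL.
    for section in research_text.split("### "):
        header = section.splitlines()[0] if section.strip() else ""
        if "API" in header and "Source:" not in section:
            defects.append(f"MAJOR: No source URL for '{header[:40]}'")
    return defects
```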

Implementation Plan Failures

No rollback plan:

❌ CRITICAL: Rollback plan missing
   FIX: Add section "## 🔄 Rollback Plan" with:
   - Exact git commands to revert
   - Configuration restoration steps
   - Verification after rollback
   - Triggers for when to rollback

Ambiguous steps:

❌ MAJOR: Step 3 says "Update the service" (too vague)
   FIX: Specify:
   - Which service? (path/to/ServiceName.ts)
   - What update? (Add method X, modify method Y)
   - How to verify? (run `npm test path/to/test.ts`)

API misalignment:

❌ CRITICAL: Plan uses `client.fetch(key)` but ResearchPack shows `client.get(key)`
   FIX: Update plan to use correct API signature from ResearchPack

Performance Targets

  • Validation time: < 15 seconds per validation
  • Defect detection rate: 95%+ of major issues caught
  • False positive rate: < 5% (don't block good work)

Integration with Hooks

Hooks provide deterministic enforcement (always run, not LLM-dependent):

Research validation hook:

#!/bin/bash
# .claude/hooks/validate-research-pack.sh

RESEARCH_FILE="$1" # Path to ResearchPack file

# Check completeness
if ! grep -q "Target Library:" "$RESEARCH_FILE"; then
    echo "❌ CRITICAL: Library not identified"
    exit 1
fi

# Check API count
API_COUNT=$(grep -c "^###.*API" "$RESEARCH_FILE" || true)  # grep -c prints 0 on no match; || true only suppresses the non-zero exit status
if [ "$API_COUNT" -lt 3 ]; then
    echo "❌ MINOR: Only $API_COUNT APIs documented, need 3+"
    # Don't block for this, just warn
fi

# Check citations
if ! grep -q "Source:" "$RESEARCH_FILE"; then
    echo "❌ CRITICAL: No source citations found"
    exit 1
fi

echo "✅ ResearchPack validation passed (score: [calculated]/100)"
exit 0

The plan validation hook follows a similar structure.


This skill ensures that quality gates are objective and automated, and that they enforce the Research → Plan → Implement workflow deterministically.