quality-check

@peterkc/acf

SKILL.md

name: quality-check
description: Automated quality evaluation for code and documentation outputs. Use when reviewing code quality, running self-critique loops, evaluating substantial outputs, or when user mentions quality, review, critique, evaluate, check, validate, or score.

Quality Check Skill

Automated quality evaluation for code and documentation outputs

Overview

Provides structured quality evaluation workflows for code reviews and self-critique, ensuring outputs meet quality standards before delivery.

Core capabilities:

  • Code review with systematic evaluation flow
  • Self-critique loop for substantial outputs
  • Automated scoring against quality dimensions
  • Integration with PRAGMATIC quality framework
  • Template-driven review processes

When to Use This Skill

Auto-invoked when:

  • Before commit (for code quality gate)
  • After substantial output produced (>200 lines code/docs)
  • User mentions "review", "check quality", "validate"
  • Before PR creation

Manual invocation:

  • Mid-development quality checks
  • Documentation review
  • Architecture decision evaluation
  • Output refinement loops

Capabilities

1. Code Review Flow

Systematic evaluation of code changes before commit:

python scripts/code_review_flow.py --files <changed-files>

Evaluation steps:

  1. Context - Understand requirements and scope
  2. Critical Path - Inspect main execution path logic
  3. Edge Cases - Check error handling, null paths, config changes
  4. Tests - Verify test coverage for new behavior
  5. Quality - Check clarity, naming, architecture alignment

Output format:

Findings:
1. [Severity] file:line – issue summary + impact + fix suggestion

Questions:
- Clarifications needed

Next Steps:
- Actions before approval
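
For illustration, a finding could be represented and rendered like this; the Finding class and render_findings helper are hypothetical sketches, not part of the shipped scripts:

from dataclasses import dataclass

# Hypothetical sketch of the findings format above; not part of the shipped scripts.
@dataclass
class Finding:
    severity: str   # "HIGH", "MEDIUM", or "LOW"
    location: str   # "file:line"
    summary: str
    impact: str
    fix: str

def render_findings(findings: list[Finding]) -> str:
    lines = ["Findings:"]
    for i, f in enumerate(findings, start=1):
        lines.append(f"{i}. [{f.severity}] {f.location} – {f.summary}")
        lines.append(f"   Impact: {f.impact}")
        lines.append(f"   Fix: {f.fix}")
    return "\n".join(lines)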

2. Self-Critique Loop

Iterative quality improvement for substantial outputs:

python scripts/self_critique.py --input <output-file> --criteria clarity,completeness,actionability

Workflow:

  1. Set criteria (Clarity, Completeness, Actionability, Accuracy, Relevance)
  2. Score 1-10 (anything <8 triggers improvement)
  3. List concrete fixes
  4. Revise and rescore
  5. Stop when all scores ≥8
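
A minimal sketch of that loop, where score and revise stand in for the skill's internal scoring and revision steps (both are assumptions, passed in as callables):

# Sketch of the self-critique loop. `score` returns {criterion: 1-10};
# `revise` rewrites the text to address the listed weak criteria.
def critique_loop(text, criteria, score, revise, threshold=8, max_iterations=3):
    scores = {}
    for _ in range(max_iterations):
        scores = score(text, criteria)
        if all(s >= threshold for s in scores.values()):
            break                                   # all dimensions pass
        weak = [c for c, s in scores.items() if s < threshold]
        text = revise(text, weak)                   # apply concrete fixes, then rescore
    return text, scores                             # 2-3 cycles typical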

Benefits:

  • Catches issues before user sees them
  • Systematic improvement process
  • Prevents endless polishing (2-3 cycles typical)

3. Quality Scoring

Automated scoring against quality dimensions:

python scripts/score_output.py --input <file> --dimensions clarity,completeness,actionability,accuracy,relevance

Scoring dimensions:

  • Clarity (1-10): Easy to understand, well-structured, jargon-free
  • Completeness (1-10): Addresses all requirements, no gaps
  • Actionability (1-10): Clear next steps, specific guidance
  • Accuracy (1-10): Factually correct, properly researched
  • Relevance (1-10): Focused on user needs, no tangents

Threshold: ≥8/10 on all dimensions = ready for delivery

Workflows

Workflow 1: Pre-Commit Code Review

Pattern: Systematic quality gate before commits

trigger: before-commit
workflow:
  1. Detect uncommitted changes
  2. Run code_review_flow.py
  3. Evaluate:
       - Critical path logic
       - Edge case handling
       - Test coverage
       - Code quality
  4. If blockers found:
       - Present to user
       - Block commit
  5. If approved:
       - Proceed to commit
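
One way to wire this gate into git is a pre-commit hook that shells out to the review script. The sketch below assumes code_review_flow.py exits non-zero when blocking findings remain, which the skill does not explicitly guarantee:

#!/usr/bin/env python3
# .git/hooks/pre-commit (sketch) – assumes a non-zero exit code signals blockers.
import subprocess
import sys

result = subprocess.run(
    ["python", "scripts/code_review_flow.py", "--auto"],
    capture_output=True,
    text=True,
)
print(result.stdout)
if result.returncode != 0:
    print("Commit blocked: resolve HIGH findings first.", file=sys.stderr)
    sys.exit(1)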

Integration with /commit:

Quality gates delegation pattern:

  1. /commit (orchestration):

    • Classifies changes (docs/config/code/mixed)
    • Determines if quality gates needed
    • Delegates to language skill
  2. Language skill (python/resources/quality-gates.md):

    • Executes domain-specific checks
    • ruff → mypy → pytest (Python)
    • Captures evidence
    • Returns pass/fail
  3. This skill (quality-check):

    • Provides evidence format standards
    • Scoring rubrics for quality dimensions
    • Self-critique workflows for substantial outputs

Evidence format (consistent across all quality gates):

✓ Linting: All checks passed (ruff)
✓ Type checking: No issues found (mypy)
✓ Tests: 10 passed in 2.5s (pytest)
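
As a rough sketch, a language skill could capture that evidence by running each tool and formatting one line per check; the commands and wording here are illustrative assumptions, not the actual quality-gates implementation:

import subprocess

# Run one check and return an evidence line in the "✓ Check: Result (tool)" pattern.
def run_gate(label: str, tool: str, cmd: list[str]) -> str:
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode == 0:
        return f"✓ {label}: All checks passed ({tool})"
    return f"✗ {label}: Failed ({tool})"  # failure symbol is an assumption

evidence = [
    run_gate("Linting", "ruff", ["ruff", "check", "."]),
    run_gate("Type checking", "mypy", ["mypy", "."]),
    run_gate("Tests", "pytest", ["pytest", "-q"]),
]
print("\n".join(evidence))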

Separation of concerns:

  • commit.md: Workflow orchestration
  • python skill: Python-specific quality checks
  • quality-check skill: Evidence standards + scoring frameworks

Workflow 2: Self-Critique After Output

Pattern: Iterative improvement of substantial outputs

trigger: substantial-output-produced
workflow:
  1. Detect output size (>200 lines)
  2. Run self_critique.py with standard criteria
  3. Score output (1-10 per dimension)
  4. If any score <8:
       - List specific fixes
       - Apply improvements
       - Rescore
  5. Repeat until all scores ≥8
  6. Deliver refined output

Example criteria:

  • Documentation: Clarity, Completeness, Examples, Accuracy
  • Code: Correctness, Maintainability, Test Coverage, Performance
  • Architecture: Scalability, Reversibility, Orthogonality

Workflow 3: Documentation Quality Check

Pattern: Validate documentation before publishing

trigger: documentation-complete
workflow:
  1. Run score_output.py with doc criteria
  2. Check:
       - Clarity (well-structured, readable)
       - Completeness (all sections present)
       - Accuracy (verified facts, current info)
       - Examples (code samples work)
       - Links (no broken references)
  3. If score <8:
       - Fix identified issues
       - Re-run scoring
  4. If score ≥8:
       - Approve for publishing

Evidence Format Standards

Display format (consistent across commits/PRs):

✓ Linting: All checks passed (ruff/eslint)
✓ Type checking: No issues found (mypy)
✓ Tests: 10 passed in 2.5s (pytest/jest)
✓ Build: Success (0 errors)

Symbol: Use ✓ (text checkmark U+2713), NOT ✅ (emoji checkmark)

Pattern: ✓ Check: Result (tool)

Why:

  • Consistent with NO emoji rule (pragmatic/agent-output-standards.md)
  • Plain text (accessible, parseable by scripts)
  • Professional tone
  • Aligns with conventional commit standards

Include in:

  • Commit messages (after running quality checks)
  • PR descriptions (show validation passed)
  • Validation reports (consistent format)

Why this matters:

  • Verifiable outcomes (not "looks good")
  • Builds confidence through evidence
  • Consistent across all quality gates
  • Enables automation (parseable format)
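
Because the pattern is fixed, the lines are easy to consume from scripts; this sketch assumes only the ✓ Check: Result (tool) shape shown above:

import re

# Matches lines of the form "✓ Check: Result (tool)".
EVIDENCE_RE = re.compile(r"^✓ (?P<check>[^:]+): (?P<result>.+) \((?P<tool>[^)]+)\)$")

def parse_evidence(report: str) -> list[dict]:
    entries = []
    for line in report.splitlines():
        match = EVIDENCE_RE.match(line.strip())
        if match:
            entries.append(match.groupdict())
    return entries

print(parse_evidence("✓ Tests: 10 passed in 2.5s (pytest)"))
# -> [{'check': 'Tests', 'result': '10 passed in 2.5s', 'tool': 'pytest'}]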

Scripts Reference

scripts/code_review_flow.py

Purpose: Systematic code review with structured output

Usage:

# Review uncommitted files
python scripts/code_review_flow.py --auto

# Review specific files
python scripts/code_review_flow.py --files src/auth/*.ts

# With custom review template
python scripts/code_review_flow.py --template resources/review-template.yaml

Options:

  • --auto: Auto-detect changed files
  • --files: Specific file patterns
  • --template: Custom review template
  • --output: Save findings to file

Review checklist:

  • Requirements understood
  • Critical path logic correct
  • Edge cases handled
  • Tests cover new behavior
  • Code quality (naming, clarity, architecture)

scripts/self_critique.py

Purpose: Iterative quality improvement loop

Usage:

# Auto-critique last output
python scripts/self_critique.py --auto

# Critique specific file
python scripts/self_critique.py --input output.md

# Custom criteria
python scripts/self_critique.py --input doc.md --criteria clarity,examples,accuracy

# Max iterations
python scripts/self_critique.py --input doc.md --max-iterations 3

Options:

  • --auto: Critique last substantial output
  • --input: File to critique
  • --criteria: Quality dimensions (comma-separated)
  • --max-iterations: Prevent endless polishing (default: 3)
  • --threshold: Minimum score to pass (default: 8)

Output:

Self-Evaluation (Iteration 1):
- Clarity: 7/10 (needs better structure)
- Completeness: 9/10 (good)
- Actionability: 6/10 (missing next steps)
- Accuracy: 10/10 (verified)
- Relevance: 8/10 (focused)

Improvements needed:
1. Add section headings for better structure
2. Include explicit "Next Steps" section
3. Add code examples for clarity

[Apply fixes...]

Self-Evaluation (Iteration 2):
- Clarity: 9/10 (improved)
- Completeness: 9/10 (good)
- Actionability: 9/10 (fixed)
- Accuracy: 10/10 (verified)
- Relevance: 8/10 (focused)

✓ All scores ≥8. Ready for delivery.

scripts/score_output.py

Purpose: Score output against quality dimensions

Usage:

# Score with default dimensions
python scripts/score_output.py --input output.md

# Custom dimensions
python scripts/score_output.py --input code.py --dimensions correctness,maintainability,performance

# Output as JSON
python scripts/score_output.py --input doc.md --format json

# Compare before/after
python scripts/score_output.py --input doc-v1.md --compare doc-v2.md

Options:

  • --input: File to score
  • --dimensions: Quality dimensions to evaluate
  • --format: Output format (text, json, yaml)
  • --compare: Compare scores with another file
  • --rubric: Custom scoring rubric

Output:

{
  "file": "output.md",
  "overall_score": 8.4,
  "dimensions": {
    "clarity": {
      "score": 9,
      "notes": "Well-structured with clear headings"
    },
    "completeness": {
      "score": 8,
      "notes": "Covers all requirements"
    },
    "actionability": {
      "score": 8,
      "notes": "Clear next steps provided"
    },
    "accuracy": {
      "score": 9,
      "notes": "Facts verified"
    },
    "relevance": {
      "score": 8,
      "notes": "Focused on user needs"
    }
  },
  "pass": true,
  "threshold": 8
}

Resources

resources/review-template.yaml

Code review template with evaluation criteria:

review_template:
  context:
    - "Understand requirements and scope"
    - "Review PR description or ticket"
    - "Identify linked issues or dependencies"

  critical_path:
    - "Inspect main execution path first"
    - "Verify logic correctness"
    - "Check for obvious bugs"

  edge_cases:
    - "Error handling for failures"
    - "Null/undefined checks"
    - "Configuration changes impact"
    - "Boundary conditions"

  tests:
    - "Existing tests cover new behavior"
    - "Edge cases have tests"
    - "Tests are maintainable"

  quality:
    - "Clear variable/function names"
    - "Follows project conventions"
    - "Architecture alignment (PRAGMATIC principles)"
    - "No duplication or magic values"

resources/scoring-rubric.yaml

Quality scoring rubric for different output types:

# Code Quality Rubric
code:
  correctness:
    10: "Logic correct, all edge cases handled"
    8: "Core logic correct, minor edge cases missed"
    6: "Logic correct, major edge cases missed"
    4: "Logic flawed but fixable"
    2: "Fundamental logic errors"

  maintainability:
    10: "Clear, well-structured, follows conventions"
    8: "Generally clear, minor inconsistencies"
    6: "Understandable but needs improvement"
    4: "Confusing structure or naming"
    2: "Unmaintainable"

  test_coverage:
    10: "Comprehensive tests including edge cases"
    8: "Good coverage, minor gaps"
    6: "Basic coverage, significant gaps"
    4: "Minimal testing"
    2: "No tests"

# Documentation Quality Rubric
documentation:
  clarity:
    10: "Crystal clear, well-structured, scannable"
    8: "Clear with minor structure improvements possible"
    6: "Understandable but dense or poorly organized"
    4: "Confusing structure, hard to follow"
    2: "Incomprehensible"

  completeness:
    10: "All sections present, nothing missing"
    8: "Core content complete, minor gaps"
    6: "Significant sections missing"
    4: "Major gaps"
    2: "Skeleton only"

  examples:
    10: "Comprehensive, working examples for all use cases"
    8: "Good examples, minor gaps"
    6: "Basic examples, significant gaps"
    4: "Minimal examples"
    2: "No examples"

  accuracy:
    10: "All facts verified, up-to-date"
    8: "Generally accurate, minor errors"
    6: "Some inaccuracies"
    4: "Multiple errors"
    2: "Fundamentally incorrect"

# Architecture Decision Rubric
architecture:
  scalability:
    10: "Scales seamlessly, proven patterns"
    8: "Scales well, minor concerns"
    6: "Scalability concerns exist"
    4: "Poor scalability"
    2: "Will not scale"

  reversibility:
    10: "Fully reversible, clear exit path"
    8: "Mostly reversible"
    6: "Some irreversible aspects"
    4: "Difficult to reverse"
    2: "Irreversible"

  orthogonality:
    10: "Fully decoupled, independent concerns"
    8: "Well-separated with minor coupling"
    6: "Some coupling concerns"
    4: "Significant coupling"
    2: "Tightly coupled"

Integration with Optimal Workflow

This Skill integrates with the optimal workflow pattern:

Quality Gate Pattern (from optimal-workflow.md):

Session → Scope → Execute → **quality-check** → commit → Notes
                                    ↑
                              (Auto-invoked here)

Code Review Before Commit (MANDATORY):

Before /commit:
  1. quality-check Skill auto-invokes
  2. Run code_review_flow.py
  3. Evaluate quality dimensions
  4. Block if critical issues found
  5. Approve if standards met

Auto-Detection Triggers

This Skill loads automatically when:

  1. Before commit:

    • User runs /commit
    • Uncommitted changes detected
  2. After substantial output:

    • Output produced >200 lines
    • User completes major deliverable
  3. Quality keywords:

    • "review quality"
    • "check this"
    • "how does this look"
    • "critique my work"
    • "validate output"
  4. Documentation finalization:

    • User indicates docs are "done"
    • PR description written

Best Practices

Do's:

  • Always run code review before committing
  • Set meaningful criteria for self-critique
  • Stop after 2-3 iterations (avoid endless polishing)
  • Focus improvements on low-scoring dimensions
  • Document quality standards in project rubric

Don'ts:

  • Skip review for "quick fixes" (that's when bugs sneak in)
  • Polish endlessly (diminishing returns after 2-3 cycles)
  • Use generic criteria (customize for context)
  • Ignore scores <8 without addressing them
  • Apply fixes without understanding root cause

Examples

Example 1: Pre-Commit Code Review (Auto-Invoke)

Scenario: User runs /commit, quality-check auto-invokes

# Auto-invoked by /commit
python scripts/code_review_flow.py --auto

# Output:
Code Review Results:

Findings:
1. [HIGH] src/auth/login.ts:42 – Missing error handling for async token refresh
   Impact: Users see unhandled promise rejection
   Fix: Add try-catch around token refresh call

2. [MEDIUM] src/utils/cache.ts:78 – Magic number 3600 (cache TTL)
   Impact: Hard to configure, buried in code
   Fix: Extract to config constant CACHE_TTL_SECONDS

Questions:
- Should token refresh retry on failure?

Next Steps:
- Fix HIGH finding before commit
- Consider config extraction for MEDIUM finding
- Clarify retry behavior

BLOCKED: Resolve HIGH findings before committing.

Example 2: Self-Critique Loop

Scenario: Documentation written, needs quality check

python scripts/self_critique.py --input ARCHITECTURE.md --criteria clarity,completeness,examples,accuracy

# Iteration 1:
Self-Evaluation:
- Clarity: 7/10 (needs better structure)
- Completeness: 8/10 (good)
- Examples: 5/10 (missing code samples)
- Accuracy: 9/10 (facts verified)

Improvements:
1. Add section headings for better scan-ability
2. Include code examples for each API
3. Add diagram for architecture overview

[Apply fixes...]

# Iteration 2:
Self-Evaluation:
- Clarity: 9/10 (much improved)
- Completeness: 8/10 (good)
- Examples: 8/10 (added samples)
- Accuracy: 9/10 (verified)

✓ All scores ≥8. Ready for publishing.

Example 3: Quality Scoring Comparison

Scenario: Compare quality before/after refactoring

python scripts/score_output.py --input src/api/v1.ts --compare src/api/v2.ts

# Output:
Quality Score Comparison:

src/api/v1.ts:
  Correctness: 6/10
  Maintainability: 5/10
  Test Coverage: 7/10
  Performance: 6/10
  Overall: 6.0/10

src/api/v2.ts:
  Correctness: 9/10 (+3)
  Maintainability: 9/10 (+4)
  Test Coverage: 8/10 (+1)
  Performance: 8/10 (+2)
  Overall: 8.5/10 (+2.5)

✓ Refactoring improved quality significantly.

Troubleshooting

Issue: Skill not auto-invoking before commit

  • Check: /commit command includes quality-check integration
  • Verify: Quality-check listed in SKILL.md trigger-keywords
  • Debug: Manually run python scripts/code_review_flow.py --auto

Issue: Self-critique stuck in endless loop

  • Check: Using --max-iterations flag
  • Adjust: Lower the threshold or relax the criteria
  • Solution: Accept "good enough" after 2-3 iterations

Issue: Scores seem arbitrary

  • Check: Review resources/scoring-rubric.yaml for definitions
  • Customize: Adjust rubric for project-specific standards
  • Verify: Scoring aligns with team quality expectations

Related

Skills:

  • coderabbit - Expert code review automation
  • code-reviewer (agent) - Comprehensive review with project context
  • git - Commit message validation

Commands:

  • /commit - Auto-invokes this Skill for quality gate
  • /pr-review - PR review workflow

Framework:

  • quality-check/SKILL.md § Quality Gate Pattern - Quality gate patterns (Step 4)
  • PRAGMATIC.md - Quality framework principles

Related Skills

introspection: For reasoning validation and cognitive error detection. Quality-check validates output artifacts; introspection validates reasoning processes.

performance: For runtime performance profiling. Quality-check gates code quality before commit; performance debugs execution bottlenecks.


The Skill auto-loads via progressive disclosure - it only appears when quality checks are needed.