| name | quality-check |
| description | Automated quality evaluation for code and documentation outputs. Use when reviewing code quality, running self-critique loops, evaluating substantial outputs, or when user mentions quality, review, critique, evaluate, check, validate, or score. |
Quality Check Skill
Automated quality evaluation for code and documentation outputs
Overview
Provides structured quality evaluation workflows for code reviews and self-critique, ensuring outputs meet quality standards before delivery.
Core capabilities:
- Code review with systematic evaluation flow
- Self-critique loop for substantial outputs
- Automated scoring against quality dimensions
- Integration with PRAGMATIC quality framework
- Template-driven review processes
When to Use This Skill
Auto-invoked when:
- Before commit (for code quality gate)
- After substantial output is produced (>200 lines of code/docs)
- User mentions "review", "check quality", "validate"
- Before PR creation
Manual invocation:
- Mid-development quality checks
- Documentation review
- Architecture decision evaluation
- Output refinement loops
Capabilities
1. Code Review Flow
Systematic evaluation of code changes before commit:
python scripts/code_review_flow.py --files <changed-files>
Evaluation steps:
- Context - Understand requirements and scope
- Critical Path - Inspect main execution path logic
- Edge Cases - Check error handling, null paths, config changes
- Tests - Verify test coverage for new behavior
- Quality - Check clarity, naming, architecture alignment
Output format:
Findings:
1. [Severity] file:line – issue summary + impact + fix suggestion
Questions:
- Clarifications needed
Next Steps:
- Actions before approval
2. Self-Critique Loop
Iterative quality improvement for substantial outputs:
python scripts/self_critique.py --input <output-file> --criteria clarity,completeness,actionability
Workflow:
- Set criteria (Clarity, Completeness, Actionability, Accuracy, Relevance)
- Score 1-10 (anything <8 triggers improvement)
- List concrete fixes
- Revise and rescore
- Stop when all scores ≥8
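Conceptually, the loop is a small control structure: score, improve the weakest criteria, rescore. The sketch below is illustrative only and is not the implementation of `self_critique.py`; `score_fn` and `revise_fn` are hypothetical callables standing in for the scoring and revision steps.

```python
# Illustrative sketch of the self-critique loop (not the actual script).
CRITERIA = ["clarity", "completeness", "actionability", "accuracy", "relevance"]
THRESHOLD = 8
MAX_ITERATIONS = 3  # prevents endless polishing

def critique_loop(draft: str, score_fn, revise_fn) -> str:
    """Revise `draft` until every criterion scores >= THRESHOLD or iterations run out."""
    for _ in range(MAX_ITERATIONS):
        scores = {c: score_fn(draft, c) for c in CRITERIA}  # 1-10 per criterion
        weak = [c for c, s in scores.items() if s < THRESHOLD]
        if not weak:
            return draft  # all scores >= threshold: ready for delivery
        draft = revise_fn(draft, weak)  # apply fixes targeting the weak criteria
    return draft  # accept "good enough" after max iterations
```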
Benefits:
- Catches issues before user sees them
- Systematic improvement process
- Prevents endless polishing (2-3 cycles typical)
3. Quality Scoring
Automated scoring against quality dimensions:
python scripts/score_output.py --input <file> --dimensions clarity,completeness,actionability,accuracy,relevance
Scoring dimensions:
- Clarity (1-10): Easy to understand, well-structured, jargon-free
- Completeness (1-10): Addresses all requirements, no gaps
- Actionability (1-10): Clear next steps, specific guidance
- Accuracy (1-10): Factually correct, properly researched
- Relevance (1-10): Focused on user needs, no tangents
Threshold: ≥8/10 on all dimensions = ready for delivery
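The delivery decision itself is a simple threshold check over the per-dimension scores, roughly as in this illustrative sketch (the function name and dictionary shape are assumptions, not the script's API):

```python
def meets_threshold(scores: dict[str, int], threshold: int = 8) -> bool:
    """Return True when every quality dimension scores at or above the threshold."""
    return all(score >= threshold for score in scores.values())

# Example: one weak dimension blocks delivery.
scores = {"clarity": 9, "completeness": 8, "actionability": 7, "accuracy": 9, "relevance": 8}
meets_threshold(scores)  # False -> actionability needs improvement first
```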
Workflows
Workflow 1: Pre-Commit Code Review
Pattern: Systematic quality gate before commits
trigger: before-commit
workflow:
1. Detect uncommitted changes
2. Run code_review_flow.py
3. Evaluate:
- Critical path logic
- Edge case handling
- Test coverage
- Code quality
4. If blockers found:
- Present to user
- Block commit
5. If approved:
- Proceed to commit
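One way to wire this gate locally is a Git pre-commit hook that shells out to the review script and aborts on a non-zero exit. This is a sketch under the assumption that `code_review_flow.py` exits non-zero when blocking findings are present; that exit convention is an assumption, not documented behavior.

```python
#!/usr/bin/env python3
# Sketch of a .git/hooks/pre-commit hook (assumes code_review_flow.py
# exits non-zero when HIGH-severity findings should block the commit).
import subprocess
import sys

result = subprocess.run(
    ["python", "scripts/code_review_flow.py", "--auto"],
    capture_output=True,
    text=True,
)
print(result.stdout)
if result.returncode != 0:
    print("BLOCKED: resolve HIGH findings before committing.", file=sys.stderr)
    sys.exit(1)  # non-zero exit aborts the commit
```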
Integration with /commit:
Quality gates delegation pattern:
/commit (orchestration):
- Classifies changes (docs/config/code/mixed)
- Determines if quality gates needed
- Delegates to language skill
Language skill (python/resources/quality-gates.md):
- Executes domain-specific checks
- ruff → mypy → pytest (Python)
- Captures evidence
- Returns pass/fail
This skill (quality-check):
- Provides evidence format standards
- Scoring rubrics for quality dimensions
- Self-critique workflows for substantial outputs
Evidence format (consistent across all):
✓ Linting: All checks passed (ruff)
✓ Type checking: No issues found (mypy)
✓ Tests: 10 passed in 2.5s (pytest)
Separation of concerns:
- commit.md: Workflow orchestration
- python skill: Python-specific quality checks
- quality-check skill: Evidence standards + scoring frameworks
Workflow 2: Self-Critique After Output
Pattern: Iterative improvement of substantial outputs
trigger: substantial-output-produced
workflow:
1. Detect output size (>200 lines)
2. Run self_critique.py with standard criteria
3. Score output (1-10 per dimension)
4. If any score <8:
- List specific fixes
- Apply improvements
- Rescore
5. Repeat until all scores ≥8
6. Deliver refined output
Example criteria:
- Documentation: Clarity, Completeness, Examples, Accuracy
- Code: Correctness, Maintainability, Test Coverage, Performance
- Architecture: Scalability, Reversibility, Orthogonality
Workflow 3: Documentation Quality Check
Pattern: Validate documentation before publishing
trigger: documentation-complete
workflow:
1. Run score_output.py with doc criteria
2. Check:
- Clarity (well-structured, readable)
- Completeness (all sections present)
- Accuracy (verified facts, current info)
- Examples (code samples work)
- Links (no broken references)
3. If score <8:
- Fix identified issues
- Re-run scoring
4. If score ≥8:
- Approve for publishing
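For the link check in step 2, a minimal approach is to scan Markdown links and verify that relative targets exist on disk. The sketch below is illustrative only; the regex and the decision to skip external URLs are assumptions, not part of any script in this skill.

```python
import re
from pathlib import Path

LINK_RE = re.compile(r"\[[^\]]+\]\(([^)]+)\)")  # matches [text](target)

def find_broken_links(doc_path: str) -> list[str]:
    """Return relative link targets in a Markdown file that do not exist on disk."""
    doc = Path(doc_path)
    broken = []
    for target in LINK_RE.findall(doc.read_text()):
        if target.startswith(("http://", "https://", "#")):
            continue  # external URLs and in-page anchors are out of scope here
        if not (doc.parent / target.split("#")[0]).exists():
            broken.append(target)
    return broken
```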
Evidence Format Standards
Display format (consistent across commits/PRs):
✓ Linting: All checks passed (ruff/eslint)
✓ Type checking: No issues found (mypy)
✓ Tests: 10 passed in 2.5s (pytest/jest)
✓ Build: Success (0 errors)
Symbol: Use ✓ (text checkmark U+2713), NOT ✅ (emoji checkmark)
Pattern: ✓ Check: Result (tool)
Why:
- Consistent with NO emoji rule (pragmatic/agent-output-standards.md)
- Plain text (accessible, parseable by scripts)
- Professional tone
- Aligns with conventional commit standards
Include in:
- Commit messages (after running quality checks)
- PR descriptions (show validation passed)
- Validation reports (consistent format)
Why this matters:
- Verifiable outcomes (not "looks good")
- Builds confidence through evidence
- Consistent across all quality gates
- Enables automation (parseable format)
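Because the pattern is fixed (`✓ Check: Result (tool)`), evidence lines are straightforward to parse in automation. A minimal sketch of such a parser follows; the regex is derived from the format above and is an assumption, not a published API.

```python
import re

EVIDENCE_RE = re.compile(r"^✓ (?P<check>[^:]+): (?P<result>.+) \((?P<tool>[^)]+)\)$")

def parse_evidence(lines: list[str]) -> list[dict[str, str]]:
    """Extract check/result/tool triples from evidence lines in the standard format."""
    return [m.groupdict() for line in lines if (m := EVIDENCE_RE.match(line.strip()))]

parse_evidence(["✓ Tests: 10 passed in 2.5s (pytest)"])
# [{'check': 'Tests', 'result': '10 passed in 2.5s', 'tool': 'pytest'}]
```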
Scripts Reference
scripts/code_review_flow.py
Purpose: Systematic code review with structured output
Usage:
# Review uncommitted files
python scripts/code_review_flow.py --auto
# Review specific files
python scripts/code_review_flow.py --files src/auth/*.ts
# With custom review template
python scripts/code_review_flow.py --template resources/review-template.yaml
Options:
--auto: Auto-detect changed files
--files: Specific file patterns
--template: Custom review template
--output: Save findings to file
Review checklist:
- Requirements understood
- Critical path logic correct
- Edge cases handled
- Tests cover new behavior
- Code quality (naming, clarity, architecture)
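The checklist maps onto resources/review-template.yaml (shown under Resources below). A sketch of loading it, assuming that YAML structure and that PyYAML is installed:

```python
import yaml  # PyYAML; assumed available

def load_review_checklist(path: str = "resources/review-template.yaml") -> dict[str, list[str]]:
    """Load the review template: phase name -> list of checklist items."""
    with open(path) as fh:
        return yaml.safe_load(fh)["review_template"]

# Iterate phases in review order: context, critical_path, edge_cases, tests, quality.
for phase, items in load_review_checklist().items():
    print(f"{phase}: {len(items)} checks")
```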
scripts/self_critique.py
Purpose: Iterative quality improvement loop
Usage:
# Auto-critique last output
python scripts/self_critique.py --auto
# Critique specific file
python scripts/self_critique.py --input output.md
# Custom criteria
python scripts/self_critique.py --input doc.md --criteria clarity,examples,accuracy
# Max iterations
python scripts/self_critique.py --input doc.md --max-iterations 3
Options:
--auto: Critique last substantial output
--input: File to critique
--criteria: Quality dimensions (comma-separated)
--max-iterations: Prevent endless polishing (default: 3)
--threshold: Minimum score to pass (default: 8)
Output:
Self-Evaluation (Iteration 1):
- Clarity: 7/10 (needs better structure)
- Completeness: 9/10 (good)
- Actionability: 6/10 (missing next steps)
- Accuracy: 10/10 (verified)
- Relevance: 8/10 (focused)
Improvements needed:
1. Add section headings for better structure
2. Include explicit "Next Steps" section
3. Add code examples for clarity
[Apply fixes...]
Self-Evaluation (Iteration 2):
- Clarity: 9/10 (improved)
- Completeness: 9/10 (good)
- Actionability: 9/10 (fixed)
- Accuracy: 10/10 (verified)
- Relevance: 8/10 (focused)
✓ All scores ≥8. Ready for delivery.
scripts/score_output.py
Purpose: Score output against quality dimensions
Usage:
# Score with default dimensions
python scripts/score_output.py --input output.md
# Custom dimensions
python scripts/score_output.py --input code.py --dimensions correctness,maintainability,performance
# Output as JSON
python scripts/score_output.py --input doc.md --format json
# Compare before/after
python scripts/score_output.py --input doc-v1.md --compare doc-v2.md
Options:
--input: File to score
--dimensions: Quality dimensions to evaluate
--format: Output format (text, json, yaml)
--compare: Compare scores with another file
--rubric: Custom scoring rubric
Output:
{
"file": "output.md",
"overall_score": 8.4,
"dimensions": {
"clarity": {
"score": 9,
"notes": "Well-structured with clear headings"
},
"completeness": {
"score": 8,
"notes": "Covers all requirements"
},
"actionability": {
"score": 8,
"notes": "Clear next steps provided"
},
"accuracy": {
"score": 9,
"notes": "Facts verified"
},
"relevance": {
"score": 8,
"notes": "Focused on user needs"
}
},
"pass": true,
"threshold": 8
}
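The pass field makes the gate scriptable. As a sketch, a CI step could redirect the --format json output to a file and fail the build when pass is false; the file name score.json is just an example, not a convention of the script.

```python
import json
import sys

with open("score.json") as fh:  # e.g. score_output.py --format json > score.json
    report = json.load(fh)

if not report["pass"]:
    failing = [d for d, v in report["dimensions"].items() if v["score"] < report["threshold"]]
    print(f"Quality gate failed: {', '.join(failing)} below {report['threshold']}", file=sys.stderr)
    sys.exit(1)
```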
Resources
resources/review-template.yaml
Code review template with evaluation criteria:
review_template:
context:
- "Understand requirements and scope"
- "Review PR description or ticket"
- "Identify linked issues or dependencies"
critical_path:
- "Inspect main execution path first"
- "Verify logic correctness"
- "Check for obvious bugs"
edge_cases:
- "Error handling for failures"
- "Null/undefined checks"
- "Configuration changes impact"
- "Boundary conditions"
tests:
- "Existing tests cover new behavior"
- "Edge cases have tests"
- "Tests are maintainable"
quality:
- "Clear variable/function names"
- "Follows project conventions"
- "Architecture alignment (PRAGMATIC principles)"
- "No duplication or magic values"
resources/scoring-rubric.yaml
Quality scoring rubric for different output types:
# Code Quality Rubric
code:
correctness:
10: "Logic correct, all edge cases handled"
8: "Core logic correct, minor edge cases missed"
6: "Logic correct, major edge cases missed"
4: "Logic flawed but fixable"
2: "Fundamental logic errors"
maintainability:
10: "Clear, well-structured, follows conventions"
8: "Generally clear, minor inconsistencies"
6: "Understandable but needs improvement"
4: "Confusing structure or naming"
2: "Unmaintainable"
test_coverage:
10: "Comprehensive tests including edge cases"
8: "Good coverage, minor gaps"
6: "Basic coverage, significant gaps"
4: "Minimal testing"
2: "No tests"
# Documentation Quality Rubric
documentation:
clarity:
10: "Crystal clear, well-structured, scannable"
8: "Clear with minor structure improvements possible"
6: "Understandable but dense or poorly organized"
4: "Confusing structure, hard to follow"
2: "Incomprehensible"
completeness:
10: "All sections present, nothing missing"
8: "Core content complete, minor gaps"
6: "Significant sections missing"
4: "Major gaps"
2: "Skeleton only"
examples:
10: "Comprehensive, working examples for all use cases"
8: "Good examples, minor gaps"
6: "Basic examples, significant gaps"
4: "Minimal examples"
2: "No examples"
accuracy:
10: "All facts verified, up-to-date"
8: "Generally accurate, minor errors"
6: "Some inaccuracies"
4: "Multiple errors"
2: "Fundamentally incorrect"
# Architecture Decision Rubric
architecture:
scalability:
10: "Scales seamlessly, proven patterns"
8: "Scales well, minor concerns"
6: "Scalability concerns exist"
4: "Poor scalability"
2: "Will not scale"
reversibility:
10: "Fully reversible, clear exit path"
8: "Mostly reversible"
6: "Some irreversible aspects"
4: "Difficult to reverse"
2: "Irreversible"
orthogonality:
10: "Fully decoupled, independent concerns"
8: "Well-separated with minor coupling"
6: "Some coupling concerns"
4: "Significant coupling"
2: "Tightly coupled"
Integration with Optimal Workflow
This Skill integrates with the optimal workflow pattern:
Quality Gate Pattern (from optimal-workflow.md):
Session → Scope → Execute → **quality-check** → commit → Notes
(quality-check is auto-invoked at this step)
Code Review Before Commit (MANDATORY):
Before /commit:
1. quality-check Skill auto-invokes
2. Run code_review_flow.py
3. Evaluate quality dimensions
4. Block if critical issues found
5. Approve if standards met
Auto-Detection Triggers
This Skill loads automatically when:
Before commit:
- User runs /commit
- Uncommitted changes detected
After substantial output:
- Output produced >200 lines
- User completes major deliverable
Quality keywords:
- "review quality"
- "check this"
- "how does this look"
- "critique my work"
- "validate output"
Documentation finalization:
- User indicates docs are "done"
- PR description written
Best Practices
Do's:
- ✅ Always run code review before committing
- ✅ Set meaningful criteria for self-critique
- ✅ Stop after 2-3 iterations (avoid endless polishing)
- ✅ Focus improvements on low-scoring dimensions
- ✅ Document quality standards in project rubric
Don'ts:
- ❌ Skip review for "quick fixes" (that's when bugs sneak in)
- ❌ Polish endlessly (diminishing returns after 2-3 cycles)
- ❌ Use generic criteria (customize for context)
- ❌ Ignore scores <8 without addressing them
- ❌ Apply fixes without understanding root cause
Examples
Example 1: Pre-Commit Code Review (Auto-Invoke)
Scenario: User runs /commit, quality-check auto-invokes
# Auto-invoked by /commit
python scripts/code_review_flow.py --auto
# Output:
Code Review Results:
Findings:
1. [HIGH] src/auth/login.ts:42 – Missing error handling for async token refresh
Impact: Users see unhandled promise rejection
Fix: Add try-catch around token refresh call
2. [MEDIUM] src/utils/cache.ts:78 – Magic number 3600 (cache TTL)
Impact: Hard to configure, buried in code
Fix: Extract to config constant CACHE_TTL_SECONDS
Questions:
- Should token refresh retry on failure?
Next Steps:
- Fix HIGH finding before commit
- Consider config extraction for MEDIUM finding
- Clarify retry behavior
BLOCKED: Resolve HIGH findings before committing.
Example 2: Self-Critique Loop
Scenario: Documentation written, needs quality check
python scripts/self_critique.py --input ARCHITECTURE.md --criteria clarity,completeness,examples,accuracy
# Iteration 1:
Self-Evaluation:
- Clarity: 7/10 (needs better structure)
- Completeness: 8/10 (good)
- Examples: 5/10 (missing code samples)
- Accuracy: 9/10 (facts verified)
Improvements:
1. Add section headings for better scannability
2. Include code examples for each API
3. Add diagram for architecture overview
[Apply fixes...]
# Iteration 2:
Self-Evaluation:
- Clarity: 9/10 (much improved)
- Completeness: 8/10 (good)
- Examples: 8/10 (added samples)
- Accuracy: 9/10 (verified)
✓ All scores ≥8. Ready for publishing.
Example 3: Quality Scoring Comparison
Scenario: Compare quality before/after refactoring
python scripts/score_output.py --input src/api/v1.ts --compare src/api/v2.ts
# Output:
Quality Score Comparison:
src/api/v1.ts:
Correctness: 6/10
Maintainability: 5/10
Test Coverage: 7/10
Performance: 6/10
Overall: 6.0/10
src/api/v2.ts:
Correctness: 9/10 (+3)
Maintainability: 9/10 (+4)
Test Coverage: 8/10 (+1)
Performance: 8/10 (+2)
Overall: 8.5/10 (+2.5)
✓ Refactoring improved quality significantly.
Troubleshooting
Issue: Skill not auto-invoking before commit
- Check: /commit command includes quality-check integration
- Verify: quality-check listed in SKILL.md trigger-keywords
- Debug: Manually run python scripts/code_review_flow.py --auto
Issue: Self-critique stuck in endless loop
- Check: Using the --max-iterations flag
- Adjust: Lower the threshold or adjust criteria
- Solution: Accept "good enough" after 2-3 iterations
Issue: Scores seem arbitrary
- Check: Review resources/scoring-rubric.yaml for definitions
- Customize: Adjust the rubric for project-specific standards
- Verify: Scoring aligns with team quality expectations
Related
Skills:
- coderabbit - Expert code review automation
- code-reviewer (agent) - Comprehensive review with project context
- git - Commit message validation
Commands:
- /commit - Auto-invokes this Skill for quality gate
- /pr-review - PR review workflow
Framework:
- quality-check/SKILL.md § Quality Gate Pattern - Quality gate patterns (Step 4)
- PRAGMATIC.md - Quality framework principles
Related Skills
introspection: For reasoning validation and cognitive error detection. Quality-check validates output artifacts, introspection validates reasoning processes.
performance: For runtime performance profiling. Quality-check gates code quality before commit, performance debugs execution bottlenecks.
This Skill auto-loads via progressive disclosure; it only appears when quality checks are needed.