---
name: Orchestration QA
description: Quality assurance for orchestration workflows - validates Skills and Subagents follow documented patterns, tracks deviations, suggests improvements
---
# Orchestration QA Skill

## Overview
This skill provides quality assurance for Task Orchestrator workflows by validating that Skills and Subagents follow their documented patterns, detecting deviations, and suggesting continuous improvements.
**Key Capabilities:**
- Interactive configuration - User chooses which analyses to enable (token efficiency)
- Pre-execution validation - Context capture, checkpoint setting
- Post-execution review - Workflow adherence, output validation
- Specialized quality analysis - Execution graphs, tag coverage, information density
- Efficiency analysis - Token optimization, tool selection, parallelization
- Deviation reporting - Structured findings with severity (ALERT/WARN/INFO)
- Pattern tracking - Continuous improvement suggestions
**Philosophy:**
- ✅ User-driven configuration - Pay token costs only for analyses you want
- ✅ Observe and validate - Never blocks execution
- ✅ Report transparently - Clear severity levels (ALERT/WARN/INFO)
- ✅ Learn from patterns - Track issues, suggest improvements
- ✅ Progressive loading - Load only analysis needed for context
- ❌ Not a blocker - Warns about issues, doesn't stop workflows
- ❌ Not auto-fix - Asks user for decisions on deviations
## When to Use This Skill

### Interactive Configuration (FIRST TIME)

**Trigger:** First time using orchestration-qa in a session, or when the user wants to change settings

**Action:** Ask the user which analysis categories to enable (multiselect interface)

**Output:** Configuration stored in the session, used for all subsequent reviews

**User Value:** Only pay token costs for analyses you actually want
### Session Initialization

**Trigger:** After configuration, at the start of an orchestration session

**Action:** Load knowledge bases (Skills, Subagents, routing config) based on enabled categories

**Output:** Initialization status with active configuration, ready signal
### Pre-Execution Validation

**Triggers:**
- "Create feature for X" (before Feature Orchestration Skill or Feature Architect)
- "Execute tasks" (before Task Orchestration Skill)
- "Mark complete" (before Status Progression Skill)
- Before launching any Skill or Subagent
**Action:** Capture context, set validation checkpoints

**Output:** Stored context for post-execution comparison
### Post-Execution Review

**Triggers:**
- After any Skill completes
- After any Subagent returns
- User asks: "Review quality", "Show QA results", "Any issues?"
**Action:** Validate workflow adherence, analyze quality, detect deviations

**Output:** Structured quality report with findings and recommendations
## Parameters

```typescript
{
  phase: "init" | "pre" | "post" | "configure",

  // For pre/post phases
  entityType?: "feature-orchestration" | "task-orchestration" |
               "status-progression" | "dependency-analysis" |
               "feature-architect" | "planning-specialist" |
               "backend-engineer" | "frontend-developer" |
               "database-engineer" | "test-engineer" |
               "technical-writer" | "bug-triage-specialist",

  // For pre phase
  userInput?: string,         // Original user request

  // For post phase
  entityOutput?: string,      // Output from Skill/Subagent
  entityId?: string,          // Feature/Task/Project ID (if applicable)

  // Optional
  verboseReporting?: boolean  // Default: false (brief reports)
}
```
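The schema above can be encoded as a TypeScript type with a small guard that checks phase-specific required fields before dispatching. This is a minimal sketch: the `QaParams` type and the `validateParams` helper are illustrative names, not part of the skill, and the full `entityType` union is collapsed to `string` for brevity.

```typescript
// Illustrative encoding of the parameter schema above.
type QaPhase = "init" | "pre" | "post" | "configure";

interface QaParams {
  phase: QaPhase;
  entityType?: string;        // e.g. "feature-architect" (full union above)
  userInput?: string;         // pre phase: original user request
  entityOutput?: string;      // post phase: output from Skill/Subagent
  entityId?: string;          // post phase: Feature/Task/Project ID
  verboseReporting?: boolean; // default: false (brief reports)
}

// Report which phase-specific fields are missing, per the comments above.
function validateParams(p: QaParams): string[] {
  const missing: string[] = [];
  if (p.phase === "pre" && !p.userInput) missing.push("userInput");
  if (p.phase === "post" && !p.entityOutput) missing.push("entityOutput");
  if ((p.phase === "pre" || p.phase === "post") && !p.entityType) missing.push("entityType");
  return missing;
}
```

For example, `validateParams({ phase: "post" })` reports `entityOutput` and `entityType` as missing.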
## Workflow

### Phase: configure (Interactive Configuration) - ALWAYS RUN FIRST

**Purpose:** Let user choose which analysis categories to enable for the session

**When:** Before init phase, or when user wants to change settings mid-session

**Interactive Prompts:**

Use AskUserQuestion to present configuration options:
```javascript
AskUserQuestion({
  questions: [
    {
      question: "Which quality analysis categories would you like to enable for this session?",
      header: "QA Categories",
      multiSelect: true,
      options: [
        {
          label: "Information Density",
          description: "Analyze task content quality, detect wasteful patterns, measure information-to-token ratio (Specialists only)"
        },
        {
          label: "Execution Graphs",
          description: "Validate dependency graphs and parallel execution opportunities (Planning Specialist only)"
        },
        {
          label: "Tag Coverage",
          description: "Check tag consistency and agent-mapping coverage (Planning Specialist & Feature Architect)"
        },
        {
          label: "Token Optimization",
          description: "Identify token waste patterns (verbose output, unnecessary loading, redundant operations)"
        },
        {
          label: "Tool Selection",
          description: "Verify optimal tool usage (overview vs get, search vs filtered query, bulk operations)"
        },
        {
          label: "Routing Validation",
          description: "Detect Skills bypass violations (CRITICAL - status changes, feature creation, task execution)"
        },
        {
          label: "Parallel Detection",
          description: "Find missed parallelization opportunities (independent tasks, batch operations)"
        }
      ]
    },
    {
      question: "How detailed should QA reports be?",
      header: "Report Style",
      multiSelect: false,
      options: [
        {
          label: "Brief",
          description: "Only show critical issues (ALERT level) - minimal token usage"
        },
        {
          label: "Standard",
          description: "Show ALERT and WARN level issues with brief explanations"
        },
        {
          label: "Detailed",
          description: "Show all issues (ALERT/WARN/INFO) with full analysis and recommendations"
        }
      ]
    }
  ]
})
```
**Default Configuration** (if user skips configuration):
- ✅ Routing Validation (CRITICAL - always enabled)
- ✅ Information Density (for specialists)
- ❌ All other categories disabled
- Report style: Standard
**Configuration Storage:** Store user preferences in session state:

```javascript
session.qaConfig = {
  enabled: {
    informationDensity: true/false,
    executionGraphs: true/false,
    tagCoverage: true/false,
    tokenOptimization: true/false,
    toolSelection: true/false,
    routingValidation: true, // Always true (CRITICAL)
    parallelDetection: true/false
  },
  reportStyle: "brief" | "standard" | "detailed"
}
```
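The defaults listed above can be expressed as a small factory. This is a minimal sketch assuming the `session.qaConfig` shape shown; the `QaConfig` type and `defaultQaConfig` name are hypothetical.

```typescript
// Shape mirrors session.qaConfig above.
interface QaConfig {
  enabled: {
    informationDensity: boolean;
    executionGraphs: boolean;
    tagCoverage: boolean;
    tokenOptimization: boolean;
    toolSelection: boolean;
    routingValidation: boolean; // always true (CRITICAL)
    parallelDetection: boolean;
  };
  reportStyle: "brief" | "standard" | "detailed";
}

// Configuration applied when the user skips the configure phase.
function defaultQaConfig(): QaConfig {
  return {
    enabled: {
      informationDensity: true,  // on by default (specialists)
      executionGraphs: false,
      tagCoverage: false,
      tokenOptimization: false,
      toolSelection: false,
      routingValidation: true,   // never disabled
      parallelDetection: false,
    },
    reportStyle: "standard",
  };
}
```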
Token Cost: ~200-300 tokens (one-time configuration)
### Phase: init (Session Initialization)

**Purpose:** Load knowledge bases for validation throughout session

**Steps:**
- If not configured: Run configure phase first (interactive)
- Read `initialization.md` for setup workflow
- Glob `.claude/skills/*/SKILL.md` → extract Skills knowledge
  - Parse skill name, triggers, workflows, tools, token ranges
- Glob `.claude/agents/task-orchestrator/*.md` → extract Subagents knowledge
  - Parse agent name, steps, critical patterns, output validation
- Read `agent-mapping.yaml` → extract routing configuration
- Initialize tracking state (deviations, patterns, improvements)
- Report initialization status with active configuration
**Output:**

```javascript
{
  initialized: true,
  knowledgeBase: {
    skillsCount: 5,
    subagentsCount: 8,
    routingLoaded: true
  },
  configuration: {
    enabled: ["Information Density", "Routing Validation"],
    disabled: ["Execution Graphs", "Tag Coverage", "Token Optimization", "Tool Selection", "Parallel Detection"],
    reportStyle: "standard"
  },
  tracking: {
    sessionStart: timestamp,
    deviations: [],
    patterns: []
  }
}
```
Token Cost: ~800-1000 tokens (loads once per session)
### Phase: pre (Pre-Execution Validation)

**Purpose:** Capture context and set validation checkpoints before launching

**Steps:**
- Read `pre-execution.md` for validation checklist
- Identify entity type (Skill vs Subagent)
- Capture original user input context
- Set entity-specific validation checkpoints based on type:
  - Skills: Expected workflow steps, tool usage, token range
  - Subagents: Expected steps (8-9 steps), critical patterns, output format
- Store context for post-execution comparison
- Return ready signal
**Context Captured:**
- User's original request (full text)
- Expected mode (PRD/Interactive/Quick for Feature Architect)
- Entity type and anticipated complexity
- Validation checkpoints to verify after execution
**Output:**

```javascript
{
  ready: true,
  contextCaptured: true,
  checkpoints: [
    "Verify Skill assessed complexity correctly",
    "Verify templates discovered and applied",
    // ... entity-specific checkpoints
  ]
}
```
Token Cost: ~400-600 tokens
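One way to picture the entity-specific checkpoint selection is a lookup keyed on whether the entity is a Subagent. The sketch below is illustrative: entity names come from the Parameters section, the Skill/Subagent split follows the Steps list above, and the `checkpointsFor` helper is a hypothetical name.

```typescript
// Subagent entity types, taken from the Parameters section.
const SUBAGENTS = new Set([
  "feature-architect", "planning-specialist", "backend-engineer",
  "frontend-developer", "database-engineer", "test-engineer",
  "technical-writer", "bug-triage-specialist",
]);

// Pick checkpoints per the pre-phase rules: Subagents are checked on steps,
// critical patterns, and output format; Skills on workflow, tools, tokens.
function checkpointsFor(entityType: string): string[] {
  if (SUBAGENTS.has(entityType)) {
    return [
      "Verify expected steps (8-9) were followed",
      "Verify critical patterns were applied",
      "Verify output format matches the definition",
    ];
  }
  return [
    "Verify expected workflow steps followed",
    "Verify tool usage matches expected patterns",
    "Verify token usage within documented range",
  ];
}
```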
### Phase: post (Post-Execution Review)

**Purpose:** Validate workflow adherence, analyze quality, detect deviations

**Steps:**

**1. Load Post-Execution Workflow**

Read `post-execution.md` for the review process.

**2. Determine Required Analyses**

Based on entity type AND user configuration:
**Planning Specialist:**
- Always: `post-execution.md` → core workflow validation
- If `routingValidation` enabled: `routing-validation.md` → Skills usage check
- If `executionGraphs` enabled: `graph-quality.md` → execution graph validation
- If `tagCoverage` enabled: `tag-quality.md` → tag coverage analysis

**Feature Architect:**
- Always: `post-execution.md` → PRD extraction validation
- Always: Compare output vs original user input
- If `routingValidation` enabled: `routing-validation.md` → agent-mapping check
- If `tagCoverage` enabled: `tag-quality.md` → tag consistency check

**Implementation Specialists (Backend, Frontend, Database, Test, Technical Writer):**
- Always: `post-execution.md` → lifecycle steps verification
- If `routingValidation` enabled: `routing-validation.md` → Status Progression Skill usage
- If `informationDensity` enabled: `task-content-quality.md` → content quality analysis
- Always: Verify summary (300-500 chars), Files Changed section, test results

**All Skills:**
- Always: Read skill definition from knowledge base
- Always: Verify expected workflow steps followed
- Always: Check tool usage matches expected patterns
- Always: Validate token range
**3. Conditional Efficiency Analysis**

Based on user configuration:
- If `tokenOptimization` enabled: Read `token-optimization.md` → identify token waste
- If `toolSelection` enabled: Read `tool-selection.md` → verify optimal tool usage
- If `parallelDetection` enabled: Read `parallel-detection.md` → find missed parallelization

**4. Deviation Detection**

Compare actual execution against expected patterns:
- ALERT: Critical violations (status bypass, cross-domain tasks, missing requirements)
- WARN: Process issues (verbose output, skipped steps, suboptimal dependencies)
- INFO: Observations (efficiency opportunities, quality patterns)

**5. Reporting**

If deviations found:
- Read `deviation-templates.md` → format report
- Add to TodoWrite with appropriate severity
- If ALERT: Report immediately to user with decision prompt
- If WARN: Log for end-of-session summary
- If INFO: Track for pattern analysis

**6. Pattern Tracking**

Read `pattern-tracking.md` → continuous improvement:
- Check for recurring issues (count >= 2 in session)
- Suggest definition improvements if patterns detected
- Track for session summary
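Steps 5-6 can be sketched as a single routing function: ALERTs surface immediately, WARNs queue for the end-of-session summary, INFOs feed pattern analysis, and any issue seen twice or more in the session is flagged as recurring. The `Deviation` shape and `routeDeviations` name are assumptions for illustration.

```typescript
interface Deviation { severity: "ALERT" | "WARN" | "INFO"; issue: string; }

// Route new findings by severity and flag issues recurring >= 2 times
// in the session (threshold from the pattern-tracking step above).
function routeDeviations(found: Deviation[], session: Deviation[]) {
  session.push(...found);
  const counts = new Map<string, number>();
  for (const d of session) counts.set(d.issue, (counts.get(d.issue) ?? 0) + 1);
  return {
    reportNow: found.filter(d => d.severity === "ALERT"), // immediate, with decision prompt
    summarize: found.filter(d => d.severity === "WARN"),  // end-of-session summary
    track: found.filter(d => d.severity === "INFO"),      // pattern analysis
    recurring: Array.from(counts.entries())
      .filter(([, n]) => n >= 2)
      .map(([issue]) => issue),
  };
}
```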
**Output:**

```javascript
{
  workflowAdherence: "8/8 steps followed (100%)",
  expectedOutputs: "7/7 present",
  deviations: [
    {
      severity: "ALERT",
      issue: "Cross-domain task detected",
      details: "Task mixes backend + frontend",
      recommendation: "Split into domain-isolated tasks"
    }
  ],
  analyses: {
    graphQuality: "95%",
    tagCoverage: "100%",
    tokenEfficiency: "85%"
  },
  recommendations: [
    "Update planning-specialist.md to enforce domain isolation",
    "Add validation checklist for cross-domain detection"
  ]
}
```
Token Cost:
- Basic validation: ~600-800 tokens
- With specialized analysis (Planning Specialist): ~1500-2000 tokens
- With efficiency analysis: +800-1200 tokens
## Progressive Loading Strategy

**Optimization:** Load only the analysis docs needed based on entity type AND user configuration

### Configuration-Driven Loading

**Core Loading** (always loaded regardless of config):
- `post-execution.md` → base workflow validation
- Skill/Subagent definition from knowledge base
- Entity-specific mandatory checks (summary, files changed, etc.)

**Conditional Loading** (based on user configuration):
```javascript
// Planning Specialist
if (config.routingValidation) → Read routing-validation.md
if (config.executionGraphs) → Read graph-quality.md
if (config.tagCoverage) → Read tag-quality.md

// Feature Architect
if (config.routingValidation) → Read routing-validation.md
if (config.tagCoverage) → Read tag-quality.md

// Implementation Specialists (Backend, Frontend, Database, Test, Technical Writer)
if (config.routingValidation) → Read routing-validation.md
if (config.informationDensity) → Read task-content-quality.md

// All Entities
if (config.tokenOptimization) → Read token-optimization.md
if (config.toolSelection) → Read tool-selection.md
if (config.parallelDetection) → Read parallel-detection.md

// Reporting
if (deviations.length > 0) → Read deviation-templates.md
if (session.deviations.count >= 2) → Read pattern-tracking.md
```
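The loading rules above can be written as a runnable sketch. The `docsToLoad` helper is a hypothetical name, and the implementation specialists are collapsed into a single `"implementation"` bucket for brevity; the doc filenames are the ones listed in this skill.

```typescript
type Entity = "planning-specialist" | "feature-architect" | "implementation" | "other";

// Mirror the conditional-loading rules: core doc always, then per-entity
// and per-config analysis docs, then deviation reporting docs.
function docsToLoad(cfg: Record<string, boolean>, entity: Entity, deviations = 0): string[] {
  const docs = ["post-execution.md"]; // core: always loaded
  if (cfg.routingValidation) docs.push("routing-validation.md");
  if (entity === "planning-specialist") {
    if (cfg.executionGraphs) docs.push("graph-quality.md");
    if (cfg.tagCoverage) docs.push("tag-quality.md");
  }
  if (entity === "feature-architect" && cfg.tagCoverage) docs.push("tag-quality.md");
  if (entity === "implementation" && cfg.informationDensity) docs.push("task-content-quality.md");
  if (cfg.tokenOptimization) docs.push("token-optimization.md");
  if (cfg.toolSelection) docs.push("tool-selection.md");
  if (cfg.parallelDetection) docs.push("parallel-detection.md");
  if (deviations > 0) docs.push("deviation-templates.md");
  return docs;
}
```

With only Routing Validation enabled, an implementation specialist loads just `post-execution.md` and `routing-validation.md`.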
### Token Savings Examples

**Example 1: User only wants Information Density feedback**
- Configuration: Only "Information Density" enabled
- Loaded for Backend Engineer: `post-execution.md` + `task-content-quality.md` = ~1,200 tokens
- Skipped: `routing-validation.md`, `token-optimization.md`, `tool-selection.md`, `parallel-detection.md` = ~2,400 tokens saved
- Savings: 67% reduction

**Example 2: User wants minimal CRITICAL validation only**
- Configuration: Only "Routing Validation" enabled
- Loaded: `post-execution.md` + `routing-validation.md` = ~1,000 tokens
- Skipped: All other analysis docs = ~3,500 tokens saved
- Savings: 78% reduction

**Example 3: User wants comprehensive Planning Specialist review**
- Configuration: All categories enabled
- Loaded: `post-execution.md` + `graph-quality.md` + `tag-quality.md` + `routing-validation.md` + efficiency docs = ~3,500 tokens
- Skipped: None (comprehensive mode)
- Savings: 0% (full analysis)
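The savings percentages above follow from `savings = skipped / (loaded + skipped)`; a quick sketch for checking them (the helper name is illustrative):

```typescript
// Percentage of tokens saved by skipping analysis docs, rounded to a whole percent.
function savingsPercent(loadedTokens: number, skippedTokens: number): number {
  return Math.round((skippedTokens / (loadedTokens + skippedTokens)) * 100);
}
```

Example 1 gives `savingsPercent(1200, 2400)` → 67 and Example 2 gives `savingsPercent(1000, 3500)` → 78, matching the figures above.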
### Special Cases

**Task Orchestration Skill:** `parallel-detection.md` always loaded if enabled in config (core to this skill's purpose)

**Status Progression Skill:** `routing-validation.md` always loaded if enabled in config (CRITICAL - status bypass detection)
## Output Format

### Success (No Deviations)
```markdown
✅ **QA Review**: [Entity Name]

Workflow adherence: 100%
All quality checks passed.

[If efficiency analysis enabled:]
Token efficiency: 85% (identified 2 optimization opportunities)
```
### Issues Found

```markdown
## QA Review: [Entity Name]

**Workflow Adherence:** X/Y steps (Z%)

### ✅ Successes
- [Success 1]
- [Success 2]

### ⚠️ Issues Detected

**🚨 ALERT**: [Critical issue]
- Impact: [What this affects]
- Found: [What was observed]
- Expected: [What should have happened]
- Recommendation: [How to fix]

**⚠️ WARN**: [Process issue]
- Found: [What was observed]
- Expected: [What should have happened]

### 📋 Added to TodoWrite
- Review [Entity]: [Issue description]
- Improvement: [Suggestion]

### 🎯 Recommendations
1. [Most critical action]
2. [Secondary action]

### 💭 Decision Required
[If user decision needed, present options]
```
## Integration with Orchestrator

**Recommended Pattern:**

```javascript
// 1. FIRST TIME: Interactive configuration
Use orchestration-qa skill (phase="configure")
// Agent asks user which analysis categories to enable
// User selects: "Information Density" + "Routing Validation"
// Configuration stored in session

// 2. Session initialization
Use orchestration-qa skill (phase="init")
// Returns: Initialized with [2] analysis categories enabled

// 3. Before launching Feature Architect
Use orchestration-qa skill (
  phase="pre",
  entityType="feature-architect",
  userInput="[user's original request]"
)

// 4. Launch Feature Architect
Task(subagent_type="Feature Architect", prompt="...")

// 5. After Feature Architect returns
Use orchestration-qa skill (
  phase="post",
  entityType="feature-architect",
  entityOutput="[subagent's response]",
  entityId="feature-uuid"
)
// Only loads: post-execution.md + routing-validation.md (user config)
// Skips: graph-quality.md, tag-quality.md, token-optimization.md (not enabled)

// 6. Review QA findings, take action if needed
```
**Mid-Session Reconfiguration:**

```javascript
// User: "I want to also track token optimization now"
Use orchestration-qa skill (phase="configure")
// Agent asks again, pre-selects current config
// User adds "Token Optimization" to enabled categories
// New config stored, affects all subsequent post-execution reviews
```
## Supporting Documentation
This skill uses progressive loading to minimize token usage. Supporting docs are read as needed:
- `initialization.md` - Session setup workflow
- `pre-execution.md` - Context capture and checkpoint setting
- `post-execution.md` - Core review workflow for all entities
- `graph-quality.md` - Planning Specialist: execution graph analysis
- `tag-quality.md` - Planning Specialist: tag coverage validation
- `task-content-quality.md` - Implementation Specialists: information density and wasteful pattern detection
- `token-optimization.md` - Efficiency: identify token waste patterns
- `tool-selection.md` - Efficiency: verify optimal tool usage
- `parallel-detection.md` - Efficiency: find missed parallelization
- `routing-validation.md` - Critical: Skills vs Direct tool violations
- `deviation-templates.md` - User report formatting by severity
- `pattern-tracking.md` - Continuous improvement tracking
## Token Efficiency

**Current Trainer (monolithic):** ~20k-30k tokens always loaded

**Orchestration QA Skill (configuration-driven progressive loading):**
- Configure phase: ~200-300 tokens (one-time, interactive)
- Init phase: ~1000 tokens (one-time per session)
- Pre-execution: ~600 tokens (per entity)
- Post-execution (varies by configuration):
  - Minimal (routing only): ~800-1000 tokens
  - Standard (info density + routing): ~1200-1500 tokens
  - Planning Specialist (graphs + tags + routing): ~2000-2500 tokens
  - Comprehensive (all categories): ~3500-4000 tokens
**Configuration Impact Examples:**
| User Configuration | Token Cost | vs Monolithic | vs Default |
|---|---|---|---|
| Information Density only | ~1,200 tokens | 94% savings | 67% savings |
| Routing Validation only | ~1,000 tokens | 95% savings | 78% savings |
| Default (Info + Routing) | ~1,500 tokens | 93% savings | baseline |
| Comprehensive (all enabled) | ~4,000 tokens | 80% savings | -167% |
Smart Defaults: Most users only need Information Density + Routing Validation, achieving 93% token reduction while catching critical issues and wasteful content.
## Quality Metrics
Track these metrics across sessions:
- Workflow adherence percentage
- Deviation count by severity (ALERT/WARN/INFO)
- Pattern recurrence (same issue multiple times)
- Definition improvement suggestions generated
- Token efficiency of analyzed workflows
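A minimal sketch of how these metrics might be aggregated per session; the `SessionStats` shape and `qualityMetrics` helper are hypothetical, and the adherence string follows the format used in the post-phase output (`"8/8 steps followed (100%)"` style).

```typescript
interface SessionStats {
  steps: [done: number, total: number];
  deviations: { severity: string }[];
}

// Aggregate adherence percentage and deviation counts by severity.
function qualityMetrics(s: SessionStats) {
  const [done, total] = s.steps;
  const bySeverity: Record<string, number> = {};
  for (const d of s.deviations) bySeverity[d.severity] = (bySeverity[d.severity] ?? 0) + 1;
  return {
    adherence: `${done}/${total} steps (${Math.round((done / total) * 100)}%)`,
    bySeverity,
  };
}
```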
## Examples

See `examples.md` for detailed usage scenarios including:
- Interactive configuration - Choosing analysis categories
- Session initialization - Loading knowledge bases with config
- Feature Architect validation - PRD mode with selective analysis
- Planning Specialist review - Graph + tag analysis (when enabled)
- Implementation Specialist review - Information density tracking
- Status Progression enforcement - Critical routing violations
- Mid-session reconfiguration - Changing enabled categories
- Token efficiency comparisons - Different configuration impacts