---
name: quality-gates
description: Use when assessing task complexity, before starting complex tasks, or when stuck after multiple attempts. Provides complexity scoring (1-5), blocking thresholds, and escalation workflows to prevent wasted work.
version: 1.1.0
author: SkillForge AI Agent Hub
tags: quality, complexity, planning, escalation, blocking
---
# Quality Gates Skill

**Version:** 1.1.0
**Type:** Quality Assurance & Risk Management
**Auto-activate:** Task planning, complexity assessment, requirement gathering, before task execution

## Overview

This skill teaches agents how to assess task complexity, enforce quality gates, and prevent wasted work on incomplete or poorly defined tasks. Inspired by production-grade development practices, quality gates ensure agents have sufficient context before proceeding and automatically escalate when stuck or blocked.

**Key Principle:** Stop and clarify before proceeding with incomplete information. Better to ask questions than to waste cycles on the wrong solution.
## When to Use This Skill

### Auto-Activate Triggers
- Receiving a new task assignment
- Starting a complex feature implementation
- Before allocating work in Squad mode
- When requirements seem unclear or incomplete
- After 3 failed attempts at the same task
- When blocked by dependencies
### Manual Activation
- User asks for complexity assessment
- Planning a multi-step project
- Before committing to a timeline
- When uncertain about requirements
## Core Concepts

### 1. Complexity Scoring (1-5 Scale)

Assess every task on a 1-5 complexity scale (a heuristic scoring sketch follows the level definitions below):

#### Level 1: Trivial
- Single file change
- Simple variable rename
- Documentation update
- CSS styling tweak
- < 50 lines of code
- < 30 minutes estimated
- No dependencies
- No unknowns
#### Level 2: Simple
- 1-3 file changes
- Basic function implementation
- Simple API endpoint (CRUD)
- Straightforward component
- 50-200 lines of code
- 30 minutes - 2 hours estimated
- 0-1 dependencies
- Minimal unknowns
#### Level 3: Moderate
- 3-10 file changes
- Multiple component coordination
- API with validation and error handling
- State management integration
- Database schema changes
- 200-500 lines of code
- 2-8 hours estimated
- 2-3 dependencies
- Some unknowns that need research
#### Level 4: Complex
- 10-25 file changes
- Cross-cutting concerns
- Authentication/authorization
- Real-time features (WebSockets)
- Payment integration
- Database migrations with data
- 500-1500 lines of code
- 8-24 hours (1-3 days) estimated
- 4-6 dependencies
- Significant unknowns
- Multiple decision points
#### Level 5: Very Complex
- 25+ file changes
- Architectural changes
- New service/microservice
- Complete feature subsystem
- Third-party API integration
- Performance optimization
- 1500+ lines of code
- 24+ hours (3+ days) estimated
- 7+ dependencies
- Many unknowns
- Requires research and prototyping
- High risk of scope creep
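These level definitions can be approximated in code for quick triage. Below is a minimal heuristic sketch, assuming the raw metrics have already been estimated; the function name and the numeric bands for unknowns are illustrative, not a fixed API:

```python
def estimate_complexity(files: int, lines: int, hours: float,
                        dependencies: int, unknowns: int) -> int:
    """Heuristic 1-5 complexity score based on the level definitions above."""
    def level_for(value, thresholds):
        # thresholds are the upper bounds for Levels 1-4; beyond them is Level 5
        for level, limit in enumerate(thresholds, start=1):
            if value <= limit:
                return level
        return 5

    scores = [
        level_for(files, [1, 3, 10, 25]),
        level_for(lines, [50, 200, 500, 1500]),
        level_for(hours, [0.5, 2, 8, 24]),
        level_for(dependencies, [0, 1, 3, 6]),
        level_for(unknowns, [0, 1, 3, 5]),  # assumed bands for "unknowns"
    ]
    # Take the worst-case dimension so one high-risk factor is not averaged away
    return max(scores)

# Example: 12 files, ~800 lines, ~16 hours, 4 dependencies, 3 unknowns → Level 4
assert estimate_complexity(12, 800, 16, 4, 3) == 4
```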
### 2. Quality Gate Thresholds

**BLOCKING Conditions (MUST resolve before proceeding):**

#### Incomplete Requirements (>3 critical questions)

- If you have more than 3 unanswered critical questions, STOP
- Examples of critical questions:
  - "What should happen when X fails?"
  - "What data structure should I use?"
  - "What's the expected behavior for edge case Y?"
  - "Which API should I call?"
  - "What authentication method?"
#### Missing Dependencies (blocked by another task)
- Task depends on incomplete work
- Required API endpoint doesn't exist
- Database schema not ready
- External service not configured
#### Stuck Detection (3 attempts at same task)
- Tried 3 different approaches, all failed
- Keep encountering the same error
- Can't find necessary information
- Solution keeps breaking other things
#### Evidence Failure (tests/builds failing)
- Tests fail after 2 fix attempts
- Build breaks after changes
- Type errors persist
- Integration tests failing
#### Complexity Overflow (Level 4-5 tasks without breakdown)
- Complex task not broken into subtasks
- No clear implementation plan
- Too many unknowns
- Scope unclear
**WARNING Conditions (Can proceed with caution):**

#### Moderate Complexity (Level 3)
- Can proceed but should verify approach first
- Document assumptions
- Plan for checkpoints
#### 1-2 Unanswered Questions
- Document assumptions
- Proceed with best guess
- Note for review later
#### 1-2 Failed Attempts
- Try alternative approach
- Document what didn't work
- Consider asking for help
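Taken together, the blocking and warning thresholds above reduce to a small decision function. A sketch, assuming the counts have already been gathered (the type and field names are illustrative):

```python
from dataclasses import dataclass

@dataclass
class GateInput:
    complexity: int            # 1-5 score from the scale above
    unanswered_questions: int  # critical questions still open
    blocked_dependencies: int  # dependencies not ready
    failed_attempts: int       # attempts at this task so far
    has_breakdown: bool        # Level 4-5 task split into subtasks?

def gate_decision(g: GateInput) -> str:
    """Apply the BLOCKING and WARNING thresholds; returns pass/warning/blocked."""
    if g.unanswered_questions > 3:
        return "blocked"   # incomplete requirements
    if g.blocked_dependencies > 0:
        return "blocked"   # missing dependencies
    if g.failed_attempts >= 3:
        return "blocked"   # stuck - escalate
    if g.complexity >= 4 and not g.has_breakdown:
        return "blocked"   # complexity overflow without a plan
    if g.complexity == 3 or g.unanswered_questions > 0 or g.failed_attempts > 0:
        return "warning"   # proceed with caution, document assumptions
    return "pass"

# Example: a Level 3 task with one open question → warning
assert gate_decision(GateInput(3, 1, 0, 0, True)) == "warning"
```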
### 3. Gate Validation Process

```markdown
## Quality Gate Check
**Task:** [Task description]
**Complexity:** [1-5 scale]
**Dependencies:** [List dependencies]
### Critical Questions (Must answer before proceeding)
1. [Question 1] - ✅ Answered / ❌ Unknown
2. [Question 2] - ✅ Answered / ❌ Unknown
3. [Question 3] - ✅ Answered / ❌ Unknown
**Unanswered Critical Questions:** [Count]
### Dependency Check
- [ ] All required APIs exist
- [ ] Database schema ready
- [ ] Required services running
- [ ] External APIs accessible
- [ ] Authentication configured
**Blocked Dependencies:** [List]
### Attempt History
- Attempt 1: [What was tried, outcome]
- Attempt 2: [What was tried, outcome]
- Attempt 3: [What was tried, outcome]
**Failed Attempts:** [Count]
### Gate Status
- ✅ **PASS** - Can proceed
- ⚠️ **WARNING** - Proceed with caution
- ❌ **BLOCKED** - Must resolve before proceeding
### Blocking Reasons (if blocked)
- [ ] >3 critical questions unanswered
- [ ] Missing dependencies
- [ ] 3+ failed attempts (stuck)
- [ ] Evidence shows failures
- [ ] Complexity too high without plan
### Actions Required
[List actions needed to unblock]
```

## Quality Gate Workflows

### Workflow 1: Pre-Task Gate Validation

**When:** Before starting any task (especially Level 3-5)

**Steps:**
1. **Assess Complexity**
   - Read task description
   - Count file changes needed
   - Estimate lines of code
   - Identify dependencies
   - Count unknowns
   - → Assign complexity score (1-5)

2. **Identify Critical Questions**
   - What must I know to complete this?
     - Data structures?
     - Expected behaviors?
     - Edge cases?
     - Error handling?
     - API contracts?
   - → List all critical questions
   - → Count unanswered questions

3. **Check Dependencies**
   - What does this task depend on?
     - Other tasks?
     - External services?
     - Database changes?
     - Configuration?
   - → Verify dependencies ready
   - → List blockers

4. **Gate Decision**

   ```text
   if (unansweredQuestions > 3) → BLOCK
   if (missingDependencies > 0) → BLOCK
   if (complexity >= 4 && !hasPlan) → BLOCK
   if (complexity == 3) → WARN
   else → PASS
   ```

5. **Document in Context**

   ```javascript
   context.tasks_pending.push({
     id: 'task-' + Date.now(),
     task: "Task description",
     complexity_score: 3,
     gate_status: 'pass',
     critical_questions: [...],
     dependencies: [...],
     timestamp: new Date().toISOString()
   });
   ```
### Workflow 2: Stuck Detection & Escalation

**When:** After multiple failed attempts at the same task

**Steps:**

1. **Track Attempts**

   ```javascript
   // In context, track attempts
   if (!context.attempt_tracking) {
     context.attempt_tracking = {};
   }
   if (!context.attempt_tracking[taskId]) {
     context.attempt_tracking[taskId] = {
       attempts: [],
       first_attempt: new Date().toISOString()
     };
   }
   context.attempt_tracking[taskId].attempts.push({
     timestamp: new Date().toISOString(),
     approach: "Describe what was tried",
     outcome: "Failed because X",
     error_message: "Error details"
   });
   ```

2. **Check Threshold**

   ```javascript
   const attemptCount = context.attempt_tracking[taskId].attempts.length;
   if (attemptCount >= 3) {
     // ESCALATE - stuck
     return {
       status: 'blocked',
       reason: 'stuck_after_3_attempts',
       escalate_to: 'user',
       attempts_history: context.attempt_tracking[taskId].attempts
     };
   }
   ```

3. **Escalation Message**

   ```markdown
   ## 🚨 Escalation: Task Stuck

   **Task:** [Task description]
   **Attempts:** 3
   **Status:** BLOCKED - Need human guidance

   ### What Was Tried
   1. **Attempt 1:** [Approach] → Failed: [Reason]
   2. **Attempt 2:** [Approach] → Failed: [Reason]
   3. **Attempt 3:** [Approach] → Failed: [Reason]

   ### Current Blocker
   [Describe the persistent problem]

   ### Need Guidance On
   - [Specific question 1]
   - [Specific question 2]

   **Recommendation:** Human review needed to unblock
   ```
### Workflow 3: Complexity Breakdown (Level 4-5)

**When:** Assigned a Level 4 or 5 complexity task

**Steps:**

1. **Break Down into Subtasks**

   ```markdown
   ## Task Breakdown: [Main Task]

   **Overall Complexity:** Level 4

   ### Subtasks
   1. **Subtask 1:** [Description]
      - Complexity: Level 2
      - Dependencies: None
      - Estimated: 2 hours
   2. **Subtask 2:** [Description]
      - Complexity: Level 3
      - Dependencies: Subtask 1
      - Estimated: 4 hours
   3. **Subtask 3:** [Description]
      - Complexity: Level 2
      - Dependencies: Subtask 2
      - Estimated: 2 hours

   **Total Estimated:** 8 hours
   **Complexity Check:** All subtasks ≤ Level 3 ✅
   ```

2. **Validate Breakdown** (a programmatic sketch of these checks follows this workflow)

   Check:
   - [ ] All subtasks are Level 1-3
   - [ ] Dependencies clearly mapped
   - [ ] Each subtask has clear acceptance criteria
   - [ ] Sum of estimates reasonable
   - [ ] No overlapping work

3. **Create Execution Plan**

   ```markdown
   ## Execution Plan

   **Phase 1:** Subtask 1
   - Start: After requirements confirmed
   - Gate check: Pass
   - Evidence: Tests pass, build succeeds

   **Phase 2:** Subtask 2
   - Start: After Subtask 1 complete
   - Gate check: Verify Subtask 1 evidence
   - Evidence: Integration tests pass

   **Phase 3:** Subtask 3
   - Start: After Subtask 2 complete
   - Gate check: End-to-end verification
   - Evidence: Full feature tests pass
   ```
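The checks in step 2 can be partially automated. A minimal sketch (names illustrative) that verifies subtask levels and catches missing or circular dependencies:

```python
def validate_breakdown(subtasks: dict[str, dict]) -> list[str]:
    """Check a breakdown: all subtasks ≤ Level 3, dependencies resolvable, no cycles.

    subtasks maps a name to {"complexity": int, "deps": [names]}.
    Returns a list of problems; an empty list means the breakdown is valid.
    """
    problems = []
    for name, task in subtasks.items():
        if task["complexity"] > 3:
            problems.append(f"{name} is Level {task['complexity']} (must be ≤ 3)")
        for dep in task["deps"]:
            if dep not in subtasks:
                problems.append(f"{name} depends on unknown task {dep}")

    # Cycle detection via depth-first search
    visiting, done = set(), set()
    def has_cycle(name: str) -> bool:
        if name in done:
            return False
        if name in visiting:
            return True
        visiting.add(name)
        cyclic = any(has_cycle(d) for d in subtasks[name]["deps"] if d in subtasks)
        visiting.discard(name)
        done.add(name)
        return cyclic

    if any(has_cycle(name) for name in subtasks):
        problems.append("circular dependency detected")
    return problems

# Example matching the breakdown above: a valid three-subtask chain
chain = {
    "subtask-1": {"complexity": 2, "deps": []},
    "subtask-2": {"complexity": 3, "deps": ["subtask-1"]},
    "subtask-3": {"complexity": 2, "deps": ["subtask-2"]},
}
assert validate_breakdown(chain) == []
```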
### Workflow 4: Requirements Completeness Check

**When:** Starting a new feature or significant task

**Steps:**

1. **Functional Requirements Check**

   ```markdown
   ## Functional Requirements
   - [ ] **Happy path defined:** What should happen when everything works?
   - [ ] **Error cases defined:** What should happen when things fail?
   - [ ] **Edge cases identified:** What are the boundary conditions?
   - [ ] **Input validation:** What inputs are valid/invalid?
   - [ ] **Output format:** What should the output look like?
   - [ ] **Success criteria:** How do we know it works?
   ```

2. **Technical Requirements Check**

   ```markdown
   ## Technical Requirements
   - [ ] **API contracts:** Endpoints, methods, schemas defined?
   - [ ] **Data structures:** Models, types, interfaces specified?
   - [ ] **Database changes:** Schema migrations needed?
   - [ ] **Authentication:** Who can access this?
   - [ ] **Performance:** Any latency/throughput requirements?
   - [ ] **Security:** Any special security considerations?
   ```

3. **Count Critical Unknowns**

   ```javascript
   const criticalUnknowns = [
     !functionalRequirements.happyPath,
     !functionalRequirements.errorCases,
     !technicalRequirements.apiContracts,
     !technicalRequirements.dataStructures
   ].filter(unknown => unknown).length;

   if (criticalUnknowns > 3) {
     return {
       gate_status: 'blocked',
       reason: 'incomplete_requirements',
       critical_unknowns: criticalUnknowns,
       action: 'clarify_requirements'
     };
   }
   ```
## Quality Gate Templates

### Template 1: Pre-Task Gate Check

```markdown
# Quality Gate: [Task Name]
**Date:** [YYYY-MM-DD]
**Agent:** [Agent name]
## Complexity Assessment
**Estimated Lines of Code:** [X]
**Estimated Duration:** [X hours]
**File Changes:** [X files]
**Dependencies:** [X dependencies]
**Unknowns:** [X unknowns]
**Complexity Score:** Level [1-5]
## Critical Questions
1. [Question 1] - ✅ Answered / ❌ Unknown
2. [Question 2] - ✅ Answered / ❌ Unknown
3. [Question 3] - ✅ Answered / ❌ Unknown
**Unanswered:** [Count]
## Dependency Check
**Required:**
- [ ] [Dependency 1] - Ready / Blocked
- [ ] [Dependency 2] - Ready / Blocked
**Blockers:** [List]
## Gate Decision
**Status:** ✅ PASS / ⚠️ WARNING / ❌ BLOCKED
**Reasoning:** [Why this decision]
**Actions Required:** [If blocked or warning]
**Can Proceed:** Yes / No
```

### Template 2: Stuck Escalation

```markdown
# Escalation: Task Stuck
**Task:** [Task description]
**Agent:** [Agent name]
**Date:** [YYYY-MM-DD]
## Attempt History
**Attempt 1** ([Timestamp])
- **Approach:** [What was tried]
- **Outcome:** Failed
- **Error:** [Error message or issue]
**Attempt 2** ([Timestamp])
- **Approach:** [What was tried]
- **Outcome:** Failed
- **Error:** [Error message or issue]
**Attempt 3** ([Timestamp])
- **Approach:** [What was tried]
- **Outcome:** Failed
- **Error:** [Error message or issue]
## Current Blocker
[Detailed description of persistent problem]
## Need Guidance
1. [Specific question requiring human input]
2. [Specific question requiring human input]
## Recommendation
**Escalate to:** User / Studio Coach / Specific Agent
**Suggested Actions:** [What might unblock this]
```

### Template 3: Complexity Breakdown

```markdown
# Task Breakdown: [Main Task]
**Original Complexity:** Level [4-5]
**Goal:** Break down to Level 1-3 subtasks
## Subtasks
### Subtask 1: [Name]
- **Complexity:** Level [X]
- **Estimated Duration:** [X hours]
- **Dependencies:** [None / List]
- **Acceptance Criteria:**
- [ ] [Criterion 1]
- [ ] [Criterion 2]
### Subtask 2: [Name]
- **Complexity:** Level [X]
- **Estimated Duration:** [X hours]
- **Dependencies:** [List]
- **Acceptance Criteria:**
- [ ] [Criterion 1]
- [ ] [Criterion 2]
### Subtask 3: [Name]
- **Complexity:** Level [X]
- **Estimated Duration:** [X hours]
- **Dependencies:** [List]
- **Acceptance Criteria:**
- [ ] [Criterion 1]
- [ ] [Criterion 2]
## Validation
- [ ] All subtasks ≤ Level 3
- [ ] Dependencies clearly mapped
- [ ] No circular dependencies
- [ ] Acceptance criteria clear
- [ ] Total estimate reasonable
**Can Proceed:** Yes / No
```

## Integration with Context System
Quality gates integrate with the context system for tracking:
```javascript
// Add gate check to context
context.quality_gates = context.quality_gates || [];
context.quality_gates.push({
  task_id: taskId,
  timestamp: new Date().toISOString(),
  complexity_score: 3,
  gate_status: 'pass', // pass, warning, blocked
  critical_questions_count: 1,
  unanswered_questions: 1,
  dependencies_blocked: 0,
  attempt_count: 0,
  can_proceed: true
});
```
## Integration with Evidence System

Quality gates check for evidence before allowing completion:

```javascript
// Before marking a task complete, require at least one piece of passing evidence
const evidence = context.quality_evidence;
const hasPassingEvidence = (
  evidence?.tests?.exit_code === 0 ||
  evidence?.build?.exit_code === 0
);

if (!hasPassingEvidence) {
  return {
    gate_status: 'blocked',
    reason: 'no_passing_evidence',
    action: 'collect_evidence_first'
  };
}
```
## Best Practices

### 1. Always Run Gate Check Before Starting

```javascript
// ❌ BAD: Start immediately
function startTask(task) {
  implementTask(task);
}

// ✅ GOOD: Gate check first
function startTask(task) {
  const gateCheck = runQualityGate(task);
  if (gateCheck.status === 'blocked') {
    escalate(gateCheck.reason);
    return;
  }
  if (gateCheck.status === 'warning') {
    documentAssumptions(gateCheck.warnings);
  }
  implementTask(task);
}
```
### 2. Document All Assumptions

When proceeding with warnings, document assumptions:

```markdown
## Assumptions Made

1. **Assumption:** API will return JSON format
   **Risk:** Low - standard REST practice
   **Mitigation:** Add try-catch for parsing

2. **Assumption:** User authentication already implemented
   **Risk:** Medium - might not exist
   **Mitigation:** Check early, escalate if missing
```
### 3. Track Attempts for Stuck Detection

```javascript
// Track every attempt
function attemptTask(taskId, approach) {
  trackAttempt(taskId, approach);
  const attemptCount = getAttemptCount(taskId);
  if (attemptCount >= 3) {
    escalateToUser(taskId);
    return 'blocked';
  }
  return executeApproach(approach);
}
```
### 4. Break Down Complex Tasks Proactively

```javascript
// ❌ BAD: Tackle a Level 5 task directly
implementComplexFeature();

// ✅ GOOD: Break down first
function handleComplexTask(task) {
  if (task.complexity >= 4) {
    const subtasks = breakDownIntoSubtasks(task);
    subtasks.forEach(subtask => {
      runQualityGate(subtask);
      implementSubtask(subtask);
    });
  } else {
    implementTask(task);
  }
}
```
## Common Pitfalls

### ❌ Pitfall 1: Skipping Gate Checks for "Simple" Tasks

**Problem:** Assume the task is simple, skip the gate check, get stuck later
**Solution:** Always run a gate check, even for Level 1-2 tasks (quick check)

### ❌ Pitfall 2: Ignoring Warning Status

**Problem:** Proceed with warnings without documenting assumptions
**Solution:** Document every assumption when proceeding with warnings

### ❌ Pitfall 3: Not Tracking Attempts

**Problem:** Keep trying the same approach repeatedly, wasting cycles
**Solution:** Track every attempt, escalate after 3

### ❌ Pitfall 4: Proceeding When Blocked

**Problem:** Gate says BLOCKED but proceed anyway "to make progress"
**Solution:** NEVER bypass BLOCKED gates - resolve blockers first
## Quick Reference

### Complexity Quick Check

- 1-3 files, < 200 lines, < 2 hours → Level 1-2
- 3-10 files, 200-500 lines, 2-8 hours → Level 3
- 10-25 files, 500-1500 lines, 8-24 hours → Level 4
- 25+ files, 1500+ lines, 24+ hours → Level 5

### Blocking Threshold Quick Check

- >3 critical questions unanswered → BLOCK
- Missing dependencies → BLOCK
- 3+ failed attempts → BLOCK & ESCALATE
- Level 4-5 without breakdown → BLOCK

### Gate Decision Quick Flow

1. Assess complexity (1-5)
2. Count unanswered critical questions
3. Check for blocked dependencies
4. Check attempt count

```text
if (questions > 3 || dependencies blocked || attempts >= 3) → BLOCK
else if (complexity >= 4 && no plan) → BLOCK
else if (complexity == 3 || questions 1-2) → WARNING
else → PASS
```
## LLM-as-Judge Quality Validation (v1.1.0)
Modern AI workflows benefit from automated quality assessment using LLM-as-judge patterns.
### Quality Aspects to Evaluate

When validating LLM-generated content, evaluate these dimensions:

```python
QUALITY_ASPECTS = [
    "relevance",    # How relevant is the output to the input?
    "depth",        # How thorough and detailed is the analysis?
    "coherence",    # How well-structured and clear is the output?
    "accuracy",     # Are facts and code snippets correct?
    "completeness"  # Are all required sections present?
]
```
### Quality Gate Implementation Pattern

```python
import asyncio  # asyncio.timeout requires Python 3.11+

# Assumes WorkflowState (a dict-like state type) and evaluate_aspect
# (an async LLM-as-judge call; see the sketch below) are defined elsewhere.

async def quality_gate_node(state: WorkflowState) -> dict:
    """Validate output quality using LLM-as-judge."""
    THRESHOLD = 0.7  # Minimum score to pass (0.0-1.0)
    MAX_RETRIES = 2

    # Skip if no content to validate
    if not state.get("output"):
        return {"quality_gate_passed": True}

    # Evaluate each quality aspect
    scores = {}
    for aspect in QUALITY_ASPECTS:
        try:
            async with asyncio.timeout(30):  # Timeout protection
                score = await evaluate_aspect(
                    input_content=state["input"],
                    output_content=state["output"],
                    aspect=aspect
                )
                scores[aspect] = score
        except TimeoutError:
            scores[aspect] = 0.7  # Fail open with a passing score

    # Calculate average (guard against division by zero)
    avg_score = sum(scores.values()) / len(scores) if scores else 0.0

    # Determine gate result
    retry_count = state.get("retry_count", 0)
    gate_passed = avg_score >= THRESHOLD or retry_count >= MAX_RETRIES

    return {
        "quality_scores": scores,
        "quality_gate_avg_score": avg_score,
        "quality_gate_passed": gate_passed,
        "quality_gate_retry_count": retry_count
    }
```
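The pattern above leaves `evaluate_aspect` undefined. A minimal sketch, assuming some async judge-model client `judge_llm(prompt: str) -> str` exists (the prompt wording and parsing are illustrative):

```python
async def evaluate_aspect(input_content: str, output_content: str, aspect: str) -> float:
    """Ask a judge LLM to score one quality aspect on a 0.0-1.0 scale."""
    prompt = (
        f"Rate the following output for {aspect} on a scale from 0.0 to 1.0.\n"
        f"Reply with only the number.\n\n"
        f"INPUT:\n{input_content}\n\nOUTPUT:\n{output_content}"
    )
    reply = await judge_llm(prompt)  # hypothetical async LLM client
    try:
        return max(0.0, min(1.0, float(reply.strip())))  # clamp to [0, 1]
    except ValueError:
        return 0.7  # Unparseable reply: fail open with a passing score
```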
### Retry Logic

```python
MAX_RETRIES = 2  # Module-level constant shared with the gate node

def should_retry_synthesis(state: WorkflowState) -> str:
    """Conditional edge function for quality gate routing."""
    if state.get("quality_gate_passed", True):
        return "continue"  # Proceed to next node

    retry_count = state.get("quality_gate_retry_count", 0)
    if retry_count < MAX_RETRIES:
        return "retry_synthesis"  # Re-run synthesis
    return "continue"  # Max retries reached, fail open
```
### Fail-Open vs Fail-Closed

**Fail-Open (Recommended for most cases):**
- If quality validation fails/errors, allow workflow to continue
- Log the failure for monitoring
- Prevents workflow from getting stuck
- Use when partial output is better than no output

**Fail-Closed (Use for critical paths):**
- If validation fails, block the workflow
- Use for payment processing, security operations
- Requires explicit error handling and user notification
### Graceful Degradation Pattern

```python
import asyncio

import structlog  # structured logging, matching the key-value log calls below

logger = structlog.get_logger()

async def safe_quality_evaluation(state: dict) -> dict:
    """Quality gate with full graceful degradation."""
    try:
        async with asyncio.timeout(60):  # Total timeout
            return await quality_gate_node(state)
    except TimeoutError:
        logger.warning("quality_gate_timeout", analysis_id=state["id"])
        return {
            "quality_gate_passed": True,  # Fail open
            "quality_gate_error": "Evaluation timed out"
        }
    except Exception as e:
        logger.error("quality_gate_error", error=str(e))
        return {
            "quality_gate_passed": True,  # Fail open
            "quality_gate_error": str(e)
        }
```
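For critical paths, the same wrapper can be inverted to fail closed. A minimal sketch, reusing `quality_gate_node` and the logger above (the `action` field name is illustrative):

```python
async def strict_quality_evaluation(state: dict) -> dict:
    """Fail-closed variant: any timeout or error blocks the workflow."""
    try:
        async with asyncio.timeout(60):
            return await quality_gate_node(state)
    except Exception as e:  # includes TimeoutError raised by asyncio.timeout
        logger.error("quality_gate_failed_closed", error=str(e))
        return {
            "quality_gate_passed": False,  # Fail closed: block the workflow
            "gate_status": "blocked",
            "quality_gate_error": str(e) or "Evaluation timed out",
            "action": "notify_user"  # explicit user notification required
        }
```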
## Triple-Consumer Artifact Design (v1.1.0)

Modern artifacts should serve three distinct audiences from the same content:

1. **AI Coding Assistants** (Claude Code, Cursor, Copilot)
   - Need: Structured context, implementation steps, code snippets
   - Format: Pre-formatted prompts enabling accurate code generation
   - Quality check: Are code snippets runnable? Are steps actionable?

2. **Tutor Systems** (Socratic learning)
   - Need: Core concepts, exercises, quiz questions, mastery checklists
   - Format: Pedagogical structure for progressive skill building
   - Quality check: Do exercises have hints and solutions? Are quiz answers valid?

3. **Human Readers** (Developers, learners)
   - Need: TL;DR, visual diagrams, glossary, clear explanations
   - Format: Scannable in 10-30 seconds with deep-dive capability
   - Quality check: Is the summary under 500 chars? Do diagrams render correctly?
### Schema Validation for Multi-Consumer Output

```python
from pydantic import BaseModel, Field, model_validator

class QuizQuestion(BaseModel):
    """Quiz question with validated answer."""
    question: str = Field(min_length=10)
    options: list[str] = Field(min_length=2, max_length=6)
    correct_answer: str
    explanation: str = Field(min_length=20)

    @model_validator(mode='after')
    def validate_correct_answer(self) -> 'QuizQuestion':
        """Ensure correct_answer is one of the options."""
        if self.correct_answer not in self.options:
            raise ValueError(
                f"correct_answer '{self.correct_answer}' "
                f"must be one of {self.options}"
            )
        return self
```
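A quick usage sketch (values illustrative): invalid artifacts fail fast at the schema boundary instead of reaching a consumer:

```python
from pydantic import ValidationError

try:
    QuizQuestion(
        question="What must happen before work proceeds past a BLOCKED gate?",
        options=["Proceed anyway", "Resolve the blockers first"],
        correct_answer="Resolve the blockers first",
        explanation="BLOCKED gates must be resolved before any further work proceeds."
    )
except ValidationError as exc:
    print(exc)  # e.g., correct_answer not among options, explanation too short
```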
## Version History

### v1.1.0 - Artifact Quality Initiative Update
- Added LLM-as-judge quality validation patterns
- Added retry logic with fail-open behavior
- Added graceful degradation patterns
- Added triple-consumer artifact design guidance
- Added Pydantic v2 validation examples
### v1.0.0 - Initial release
- Complexity scoring (1-5 scale)
- Blocking thresholds
- Stuck detection and escalation
- Requirements completeness checks
- Context integration
- Templates and workflows
**Remember:** Quality gates prevent wasted work. Better to ask questions upfront than to build the wrong solution. When in doubt, BLOCK and escalate.