| name | framework-quality-gates |
| description | Implements TDD mandate, 3-strike failure protocol, and programmatic verification gates (linting, testing, coverage, quality metrics, security scanning) for autonomous code quality and reliability. Use when ensuring code meets quality standards, running verification workflows, implementing test-driven development, or handling verification failures systematically. Essential for software-developer and refactor-assistant agents. |
Framework Quality Gates
Autonomous quality assurance framework implementing Test-Driven Development (TDD), 3-strike failure protocol, and five programmatic verification gates.
Purpose
This skill provides structured workflows and tools for ensuring code quality and reliability through automated verification. It enforces TDD practices, handles failures systematically using a 3-strike protocol, and runs five comprehensive quality gates: linting, testing, coverage, quality metrics, and security scanning.
When to Use This Skill
Use this skill when:
- Implementing new features or bug fixes (TDD workflow required)
- Running quality verification on code changes
- Ensuring code meets defined quality standards before completion
- Handling verification failures systematically
- Enforcing test coverage thresholds
- Scanning for security vulnerabilities
- Maintaining code quality metrics
- Working on autonomous software development tasks
Core Components
1. Five Verification Gates
Run verification gates in this order:
Gate 1: LINTING
Purpose: Enforce code style and catch syntax errors
# Example: Python with ruff
ruff check src/
# Example: JavaScript with ESLint
eslint src/
# Example: Go
golint ./...
Checks: Code style, unused imports, syntax errors, formatting
Gate 2: TESTING
Purpose: Verify functionality through automated tests
# Example: Python with pytest
pytest tests/ --cov=src
# Example: JavaScript with Jest
npm test
# Example: Go
go test ./...
Checks: Test execution, assertion validation, no unhandled exceptions
Gate 3: COVERAGE
Purpose: Ensure adequate test coverage
# Example: Python with coverage threshold
pytest --cov=src --cov-fail-under=80
# Example: JavaScript with coverage
jest --coverage --coverageThreshold='{"global":{"lines":80}}'
Checks: Line coverage, branch coverage, meets threshold (typically 80%)
Gate 4: QUALITY_GATE
Purpose: Measure code complexity and maintainability
# Example: Python with radon
radon cc src/ --min B # Cyclomatic complexity
radon mi src/ --min B # Maintainability index
# Example: Python with pylint score
pylint src/ --fail-under=7.0
Checks: Cyclomatic complexity, maintainability index, code duplication
Gate 5: SECURITY
Purpose: Identify security vulnerabilities
# Example: Python with bandit
bandit -r src/ -ll  # Medium severity and above
# Example: Python dependencies
safety check
# Example: JavaScript
npm audit --audit-level=high
Checks: OWASP Top 10, hardcoded secrets, vulnerable dependencies
For detailed information on each gate, see verification-gates.md.
2. TDD Mandate
Test-Driven Development is mandatory for all implementation tasks.
TDD Workflow (Red-Green-Refactor)
- RED: Write failing test that defines desired functionality
- GREEN: Write minimal code to make test pass
- REFACTOR: Clean up code while maintaining passing tests
Implementation Order
1. Create test plan task (REQUIRED FIRST STEP)
↓
2. Write comprehensive test cases
↓
3. Verify tests fail appropriately (RED phase)
↓
4. Implement functionality
↓
5. Run tests until they pass (GREEN phase)
↓
6. Refactor if needed (maintain passing tests)
↓
7. Run full verification suite (all 5 gates)
CRITICAL: A test plan task must be created and completed BEFORE any implementation task. This ensures tests are designed before code is written.
For comprehensive TDD guidance, see tdd-mandate.md.
Integration with Agentic Patterns
Quality gates implement the Verify Work phase of the agentic design framework. Before running programmatic checks, invoke Skill(agentic-patterns) to ensure alignment with the complete workflow:
Pattern:
1. Gather Context: Understand what needs verification
2. Take Action: Run programmatic quality gates
3. Verify Work: Confirm gates passed and provide feedback
This integration ensures verification follows structured agentic principles. See Skill(agentic-patterns) for the complete Gather Context → Take Action → Verify Work framework.
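As an illustration of how the three phases can wrap the gates, here is a minimal Python sketch; collect_changed_files is a hypothetical helper, and the exact phase boundaries depend on your agent framework:
import subprocess

def collect_changed_files():
    """1. Gather Context: find what actually changed (hypothetical helper)."""
    out = subprocess.run(
        ["git", "diff", "--name-only", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return [f for f in out.stdout.splitlines() if f.endswith(".py")]

def verify_changes(target="src/"):
    changed = collect_changed_files()                                 # 1. Gather Context
    if not changed:
        return True
    result = subprocess.run(["./scripts/run_all_gates.sh", target])   # 2. Take Action
    passed = result.returncode == 0                                   # 3. Verify Work
    print("All gates passed" if passed else "Gates failed - review output before proceeding")
    return passed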
3. 3-Strike Failure Protocol
When verification gates fail, follow this structured recovery protocol:
Strike 1: RETRY (Same Approach)
Mindset: "Did I make a simple mistake?"
- Review error output carefully
- Check for typos, missing imports, syntax errors
- Apply obvious fixes
- Re-run same verification gate
Example: Linter fails due to missing import → Add import → Retry
Strike 2: REMEDY (Different Approach)
Mindset: "What's the fundamental issue? How can I solve this differently?"
- Analyze root cause deeply
- Design alternative solution strategy
- Implement different approach
- Re-run verification gate
Example: Tests still fail → Rewrite test logic with different assertions → Retry
Strike 3: HALT (Stop and Report)
Mindset: "I've exhausted autonomous options. Human expertise needed."
- Create failure marker file (.quality-gate-failure)
- Stop all new tasks immediately
- Document thoroughly:
  - What task was attempted
  - What gate failed
  - What was tried in Strike 1 and Strike 2
  - Error messages and analysis
- Report to user and request human intervention
CRITICAL: Once a failure marker exists, ALL new tasks are blocked until human resolves the issue and explicitly removes the marker file.
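The bundled scripts create and check this marker for you; the following is only a minimal sketch of what Strike 3 might write, assuming a JSON marker format (the actual format used by run_all_gates.sh may differ):
import json
from datetime import datetime, timezone
from pathlib import Path

MARKER = Path(".quality-gate-failure")

def halt_with_marker(task, gate, strikes):
    """Strike 3: record what was attempted, then block all further work."""
    report = {
        "task": task,
        "failed_gate": gate,
        "strikes": strikes,  # e.g. ["Strike 1: fixed import, still failing", "Strike 2: ..."]
        "halted_at": datetime.now(timezone.utc).isoformat(),
    }
    MARKER.write_text(json.dumps(report, indent=2))
    raise SystemExit(f"HALT: {gate} failed 3 times on '{task}'. Human intervention required.")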
For comprehensive protocol details, see three-strike-protocol.md.
4. Feedback Loop Pattern
The continuous verification workflow:
1. Make Changes
↓
2. Run Verification (execute relevant gate)
↓
3. Pass? ──YES──> Continue to Next Task
↓
NO
↓
4. Review Error (identify cause)
↓
5. Fix Issues (apply correction, track strikes)
↓
6. Repeat Step 2 (re-run verification)
Key Principles:
- Run verification frequently (after every meaningful change)
- Fix one issue at a time
- Track strike count per task/gate combination
- Never skip verification
- Small changes = faster feedback
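The loop and strike tracking fit in a few lines of Python. This is an illustrative sketch, not the bundled runner; apply_fix stands in for whatever correction step (manual or automated) follows a failed gate:
import subprocess

MAX_STRIKES = 3

def run_gate(command):
    """Run one verification gate; returns (passed, combined output)."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

def feedback_loop(gate_command, apply_fix):
    """Repeat verify -> review -> fix until the gate passes or strikes run out."""
    for strike in range(1, MAX_STRIKES + 1):
        passed, output = run_gate(gate_command)
        if passed:
            return True                       # continue to next task
        print(f"Strike {strike}: gate failed\n{output}")
        if strike < MAX_STRIKES:
            apply_fix(output)                 # Strike 1: simple fix; Strike 2: new approach
    return False                              # Strike 3: create the failure marker and halt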
For detailed feedback loop patterns, see feedback-loop-pattern.md.
Using the Verification Scripts
Comprehensive Gate Runner (Bash)
Run all verification gates with 3-strike protocol:
# Run all gates on target
./scripts/run_all_gates.sh src/
# Custom configuration
./scripts/run_all_gates.sh \
    --coverage 90 \
    --quality-threshold 70 \
    src/
# Skip specific gates
./scripts/run_all_gates.sh --skip SECURITY src/
# Run only specific gate
./scripts/run_all_gates.sh --gate LINTING src/
# Verbose output
./scripts/run_all_gates.sh --verbose src/
Configuration via environment variables:
export LINTER_COMMAND="ruff check"
export TEST_COMMAND="pytest"
export COVERAGE_THRESHOLD=80
export QUALITY_GATE_THRESHOLD=60
export SECURITY_SCANNER_COMMAND="bandit"
./scripts/run_all_gates.sh src/
Python Gate Runner
Programmatic API and command-line tool:
# Run all gates
python scripts/gate_runner.py src/
# With options
python scripts/gate_runner.py \
    --coverage 85 \
    --verbose \
    --skip SECURITY \
    src/
# JSON output for integration
python scripts/gate_runner.py --json src/ > results.json
Python API usage:
from scripts.gate_runner import GateRunner, VerificationConfig, GateType

# Configure gates
config = VerificationConfig(
    linter_command="ruff check",
    test_command="pytest",
    coverage_threshold=80,
    verbose=True,
)

# Run verification
runner = GateRunner(config)
results = runner.run_all_gates("src/")

# Check results
for gate, result in results.items():
    if result.passed:
        print(f"{gate.value}: PASSED")
    else:
        print(f"{gate.value}: FAILED after {len(result.strikes)} strikes")
Failure Marker Check
Check for existing failures before starting new tasks:
# Check if failure marker exists
./scripts/check_failure_marker.sh
# Exit code 0: No failure, proceed
# Exit code 1: Failure exists, halt all new tasks
Integration in workflows:
#!/bin/bash
# task_runner.sh
# Always check for failure marker first
if ! ./scripts/check_failure_marker.sh; then
    echo "Blocked by existing failure. Resolve before continuing."
    exit 1
fi
# Proceed with task
echo "No failures detected, proceeding with task..."
# ... task implementation ...
Complete TDD + Verification Workflow
Scenario: Implementing New Feature
# Step 1: Check for existing failures
./scripts/check_failure_marker.sh || exit 1
# Step 2: Create test plan task
echo "Test Plan: User Authentication Feature" > test_plan.md
# ... write comprehensive test scenarios ...
# Step 3: Write tests (TDD RED phase)
# Create test file with failing tests
cat > tests/test_auth.py << 'EOF'
from src.auth import login

def test_login_valid_credentials():
    result = login("user@example.com", "password123")
    assert result.success is True
    assert result.user_id is not None

def test_login_invalid_credentials():
    result = login("user@example.com", "wrongpassword")
    assert result.success is False
    assert result.error == "Invalid credentials"
EOF
# Step 4: Verify tests fail appropriately (RED)
pytest tests/test_auth.py # Should fail - function doesn't exist yet
# Step 5: Implement functionality (TDD GREEN phase)
cat > src/auth.py << 'EOF'
def login(email, password):
    # Implementation here
    pass
EOF
# Step 6: Run tests until they pass
pytest tests/test_auth.py # Iteratively fix until GREEN
# Step 7: Run all verification gates
./scripts/run_all_gates.sh src/
# If any gate fails:
# - Strike 1: Review error, simple fix, retry
# - Strike 2: Different approach, retry
# - Strike 3: Failure marker created, halt
# Step 8: All gates pass - feature complete!
Integration with Pre-commit Hooks
Automate verification before commits:
#!/bin/bash
# .git/hooks/pre-commit

echo "Running quality gates before commit..."

# Check for existing failures
if [ -f .quality-gate-failure ]; then
    echo "ERROR: Existing quality gate failure. Resolve before committing."
    exit 1
fi

# Get staged Python files
STAGED_FILES=$(git diff --cached --name-only --diff-filter=ACM | grep '\.py$')

if [ -z "$STAGED_FILES" ]; then
    exit 0
fi

# Run verification gates
for file in $STAGED_FILES; do
    # Linting
    ruff check "$file" || exit 1
done

# Testing (run full suite)
pytest tests/ --cov=src --cov-fail-under=80 || exit 1

# Security scan
bandit -r src/ -ll || exit 1

echo "✓ All quality gates passed"
exit 0
Integration with CI/CD
GitHub Actions Example
name: Quality Gates

on: [push, pull_request]

jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install ruff pytest pytest-cov radon bandit

      - name: Check for failure marker
        run: |
          if [ -f .quality-gate-failure ]; then
            echo "Failure marker exists"
            exit 1
          fi

      - name: Run Quality Gates
        run: |
          ./scripts/run_all_gates.sh src/

      - name: Upload failure marker if exists
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: failure-marker
          path: .quality-gate-failure
Configuration Templates
Project Configuration File
Create .quality-gates.yml in project root:
# .quality-gates.yml
version: 1.0

linting:
  command: "ruff check"
  config_file: ".ruff.toml"
  auto_fix: false

testing:
  command: "pytest"
  args: "--verbose --cov=src --cov-report=term-missing"
  coverage_threshold: 80

quality:
  command: "radon cc"
  complexity_threshold: 10
  maintainability_threshold: 60

security:
  command: "bandit"
  args: "-r -ll"
  severity_level: "high"
  exclude:
    - "tests/**"
    - "docs/**"

failure_handling:
  marker_file: ".quality-gate-failure"
  max_strikes: 3
  auto_retry: false

gates_order:
  - LINTING
  - TESTING
  - COVERAGE
  - QUALITY_GATE
  - SECURITY
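If you extend the runner to read this file rather than environment variables, the loader can be small. A sketch assuming PyYAML is available; the bundled scripts may not parse this file themselves:
from pathlib import Path

import yaml  # pip install pyyaml

def load_quality_gate_config(path=".quality-gates.yml"):
    """Read gate settings, falling back to defaults when the file is absent."""
    defaults = {
        "testing": {"coverage_threshold": 80},
        "failure_handling": {"marker_file": ".quality-gate-failure", "max_strikes": 3},
    }
    config_file = Path(path)
    if not config_file.exists():
        return defaults
    loaded = yaml.safe_load(config_file.read_text()) or {}
    return {**defaults, **loaded}  # shallow merge; loaded sections take precedence

config = load_quality_gate_config()
print(config.get("gates_order", "using default gate order"))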
Troubleshooting
Failure Marker Exists
Problem: Script reports failure marker exists and blocks execution
Solution:
# 1. Review failure marker
cat .quality-gate-failure
# 2. Address the root cause issue
# 3. Remove marker after fixing
rm .quality-gate-failure
# 4. Re-run verification
./scripts/run_all_gates.sh src/
Tests Pass Locally but Fail in Gate
Problem: Tests pass when run manually but fail in verification gate
Solution:
- Check environment variables and configuration
- Verify test isolation (no dependencies between tests; see the sketch below)
- Check for timing issues or race conditions
- Review test fixtures and setup/teardown
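On the test-isolation point, shared mutable state at module level is the usual culprit: tests pass when run alone but fail in the full gate run because execution order changes. A sketch of the fix using a pytest fixture (hypothetical names):
import pytest

# Fragile pattern: a module-level dict leaks state between tests
# registry = {}

@pytest.fixture
def registry():
    """Each test gets a fresh, isolated registry."""
    return {}

def test_register_user(registry):
    registry["user@example.com"] = {"active": True}
    assert registry["user@example.com"]["active"]

def test_registry_starts_empty(registry):
    assert registry == {}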
Coverage Below Threshold
Problem: Coverage gate fails due to insufficient test coverage
Solution (Strike 1):
# Generate coverage report to identify gaps
pytest --cov=src --cov-report=html
# Open htmlcov/index.html to see uncovered lines
# Add tests for uncovered code paths
Solution (Strike 2):
- Refactor code to be more testable
- Extract complex functions
- Use dependency injection for mocking (see the sketch below)
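For example, injecting an external dependency lets the test substitute a fake and reach branches that are otherwise hard to cover. A sketch with hypothetical names, not code from this skill:
class AuthService:
    def __init__(self, mailer):
        # Dependency is injected instead of constructing an SMTP client here
        self.mailer = mailer

    def reset_password(self, email):
        token = "generated-token"  # placeholder for real token generation
        self.mailer.send(email, f"Reset token: {token}")
        return token

class FakeMailer:
    def __init__(self):
        self.sent = []

    def send(self, to, body):
        self.sent.append((to, body))

def test_reset_password_sends_mail():
    mailer = FakeMailer()
    service = AuthService(mailer)
    token = service.reset_password("user@example.com")
    assert mailer.sent and token in mailer.sent[0][1]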
Quality Gate Complexity Too High
Problem: Code complexity exceeds threshold
Solution (Strike 1):
- Extract complex functions into smaller ones
- Simplify conditional logic
- Remove code duplication
Solution (Strike 2):
- Significant refactoring
- Apply design patterns
- Split large classes/modules
Security Vulnerabilities Detected
Problem: Security gate finds vulnerabilities
Solution (Strike 1):
# Update vulnerable dependencies
pip install --upgrade package-name
# Or for npm
npm audit fix
Solution (Strike 2):
- Replace vulnerable library with alternative
- Implement security controls around vulnerability
- Refactor to eliminate vulnerable code pattern
Best Practices
- Always run verification frequently - After every meaningful change
- Create test plan tasks first - Before any implementation (TDD mandate)
- Track strike count - Know when to escalate vs. retry
- Check failure marker - Before starting any new task
- Use incremental changes - Small changes = easier verification
- Automate with hooks - Pre-commit hooks prevent bad commits
- Document strike attempts - Record what was tried and why it failed
- Trust the process - Don't skip gates or work around failures
Quick Reference
Essential Commands
# Check for failures
./scripts/check_failure_marker.sh
# Run all gates
./scripts/run_all_gates.sh src/
# Run specific gate
./scripts/run_all_gates.sh --gate LINTING src/
# Python API
python scripts/gate_runner.py src/
# View results as JSON
python scripts/gate_runner.py --json src/
Gate Configuration Variables
LINTER_COMMAND="ruff check"
TEST_COMMAND="pytest"
QG_CALCULATOR_COMMAND="radon cc"
SECURITY_SCANNER_COMMAND="bandit"
COVERAGE_THRESHOLD=80
QUALITY_GATE_THRESHOLD=60
FAILURE_MARKER_FILE=".quality-gate-failure"
Strike Protocol Summary
- Strike 1 (RETRY): Simple fixes, same approach
- Strike 2 (REMEDY): Different approach, alternative solution
- Strike 3 (HALT): Create marker, stop all tasks, request human help
Reference Documentation
For detailed information on specific topics:
- TDD Mandate - Comprehensive TDD workflow, test plan requirements, Red-Green-Refactor cycle
- 3-Strike Protocol - Detailed failure handling, strike tracking, recovery workflows
- Verification Gates - In-depth gate specifications, tool configurations, success criteria
- Feedback Loop Pattern - Continuous verification workflow, iteration patterns, best practices
Support and Extension
This skill can be extended with:
- Custom verification gates for domain-specific requirements
- Project-specific linter configurations
- Alternative testing frameworks
- Additional security scanning tools
- Custom quality metrics
Modify the scripts in scripts/ or create new ones following the same 3-strike protocol pattern.