| name | run-tests |
| description | Comprehensive pytest testing and debugging framework. Use when running tests, debugging failures, fixing broken tests, or investigating test errors. Includes systematic investigation workflow with external AI tool consultation and verification strategies. |
Pytest Testing and Debugging Skill
Overview
This skill provides a systematic approach to running tests and debugging failures using pytest. The core workflow integrates investigation, external tool consultation, and verification to efficiently resolve test failures.
Key capabilities:
- Run tests with presets for common scenarios (debug, quick, coverage)
- Systematic investigation and hypothesis formation
- External AI tool consultation (gemini, codex, cursor-agent) when tests fail
- Multi-agent analysis for complex issues
- Test discovery and structure analysis
Core Workflow
5-Phase Process:
1. Run Tests - Execute tests with appropriate flags
2. Investigate - Analyze failures, form a hypothesis
3. Gather Context - Optionally use code documentation for faster understanding
4. Consult - Get external tool insights (mandatory for failures if tools are available)
5. Fix & Verify - Implement changes and confirm no regressions
Key principles:
- Investigation-first - Always analyze before consulting
- Hypothesis-driven - Form theories, then validate
- Mandatory consultation for failures - If tests fail and tools exist, consult them
- Skip when passing - Tests pass? Done. No consultation needed.
Quick decision guide:
- ✅ Tests pass? → Done
- ❌ Simple fix (typo/obvious)? → Fix → Verify
- ❌ Complex/unclear? → Investigate → Consult → Fix → Verify
Phase 1: Run Tests
Discover Test Structure (Optional)
If unfamiliar with test organization:
# Quick summary
sdd test discover --summary
# Directory tree
sdd test discover --tree
Run Tests
# Quick run (stop on first failure)
sdd test run --quick
# Debug mode (verbose with locals and prints)
sdd test run --debug
# Run specific test
sdd test run tests/test_module.py::test_function
# Coverage report
sdd test run --coverage
# List all presets
sdd test run --list
Or use pytest directly:
pytest -v # Verbose
pytest -vv -l -s # Very verbose, show locals, show prints
pytest -x # Stop on first failure
pytest -k "test_user" # Run tests matching pattern
Capture Output
For large test suites with many failures:
# Save output to timestamped file
sdd test run --debug | tee /tmp/test-run-$(date +%Y%m%d-%H%M%S).log
Phase 2: Investigate Failures
Categorize the Failure
- Assertion - Expected vs actual mismatch
- Exception - Runtime errors (AttributeError, KeyError, etc.)
- Import - Missing dependencies or module issues
- Fixture - Fixture or configuration issues
- Timeout - Performance or hanging issues
- Flaky - Non-deterministic failures
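To make the first two categories concrete, here is a minimal, hypothetical illustration (normalize and both tests are invented): the first test fails with an assertion mismatch, the second raises an exception.
# test_example.py -- hypothetical, self-contained illustration
def normalize(name):
    return name.strip()   # deliberate bugs: no lowercasing, no None handling

def test_normalize_case():
    # Assertion failure: expected vs actual mismatch ('Alice' != 'alice')
    assert normalize("  Alice ") == "alice"

def test_normalize_none():
    # Exception failure: AttributeError ('NoneType' object has no attribute 'strip')
    assert normalize(None) == ""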
Extract Key Information
For each failure:
- Test file and function name
- Line number where failure occurred
- Error type and message
- Full stack trace
- Relevant code context
Examine the Code
- Read the failing test
- Read the implementation being tested
- Understand what the test verifies
- Identify expected vs actual behavior
- Form your hypothesis - What's causing the failure?
Phase 3: Gather Code Context (Optional)
When available: If codebase documentation exists (generated by Skill(sdd-toolkit:code-doc)), use it for faster investigation.
Check availability:
sdd doc stats
Useful commands when debugging:
# Search for functions or concepts
sdd doc search "authentication"
# Show function definition
sdd doc show-function AuthService.login
# Find dependencies
sdd doc list-dependencies src/services/authService.ts
# Find what depends on a file (impact analysis)
sdd doc dependencies --reverse src/auth.py
Benefits:
- Faster context gathering
- Better root cause analysis
- Discover similar patterns
- Impact analysis
If not available: Continue with standard file exploration. See code-doc skill documentation for generating docs.
Phase 4: Consult External Tools
CRITICAL: This is mandatory for test failures when external tools exist.
Check Tool Availability
sdd test check-tools
Decision:
- Any tool available AND tests failed → Consult (mandatory)
- No tools available → Skip to Phase 5
- Tests passed → Skip to Phase 5 (no consultation needed)
Consult Tools
All external tools operate in read-only mode. They analyze and suggest; YOU implement all fixes.
# Auto-route based on failure type
sdd test consult assertion --error "Full error message" --hypothesis "Your theory about the cause"
# Include code for context
sdd test consult exception --error "AttributeError: ..." --hypothesis "Missing return" --test-code tests/test_file.py --impl-code src/module.py
# Show routing matrix
sdd test consult --list-routing
# Manual tool selection
sdd test consult --tool gemini --prompt "Custom question..."
Tool Selection Guide
| Tool | Best For | Example Use |
|---|---|---|
| Gemini | Hypothesis validation, framework explanations, strategic guidance | "Why is this fixture not found?" |
| Codex | Code-level review, specific fix suggestions | "Review this code and suggest fixes" |
| Cursor | Repo-wide discovery, finding patterns | "Find all call sites" |
When to Use Multiple Tools
Use multi-agent consultation for:
- High-stakes fixes affecting critical functionality
- Complex issues with unclear root cause
- Need validation from multiple perspectives
- Uncertain between multiple approaches
# Auto-selects best two agents
sdd test consult assertion --error "..." --hypothesis "..." --multi-agent
# Select specific agent pair
sdd test consult exception --error "..." --hypothesis "..." --multi-agent --pair code-focus
Effective Prompting
- Share your hypothesis - Ask "is my theory correct?" not "what's wrong?"
- Provide complete context - Error messages, code, stack traces
- Include what you've tried - Show your investigation work
- Ask for explanations - Understand "why", not just "how to fix"
- Be specific - State exactly what you need
Phase 5: Fix & Verify
Synthesize Findings
Combine insights from:
- Your investigation and hypothesis
- External tool recommendations
- Any additional research
Implement Fix
# Make targeted changes using Edit tool
# Example: Add missing return statement
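A hypothetical illustration of that kind of fix, with a comment recording the root cause (the function and field names are invented):
# The function computed the total but never returned it, so callers received None
# and the assertion `assert order_total(items) == 42` failed.
def order_total(items):
    total = sum(item.price for item in items)
    return total   # fix: return was missing; document the root cause where you fix it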
Verify
# Run the specific fixed test
sdd test run tests/test_module.py::test_function
# If passing, run full suite
sdd test run
# Verify no regressions
pytest tests/ -v
Document
Add comments explaining:
- What was wrong
- Why the fix works
- Any assumptions or limitations
CLI Reference
sdd test check-tools
Check availability of external tools and get routing suggestions.
# Basic check
sdd test check-tools
# Get routing for specific failure type
sdd test check-tools --route assertion
sdd test check-tools --route fixture
sdd test run
Smart pytest runner with presets for common scenarios.
# List all presets
sdd test run --list
# Presets
sdd test run --quick # Stop on first failure
sdd test run --debug # Verbose + locals + prints
sdd test run --coverage # Coverage report
sdd test run --fast # Skip slow tests
sdd test run --parallel # Run in parallel
# Run specific test
sdd test run tests/test_file.py::test_name
sdd test consult
External tool consultation with auto-routing.
# Auto-route based on failure type
sdd test consult {assertion|exception|fixture|import|timeout|flaky} --error "..." --hypothesis "..."
# Include code
sdd test consult exception --error "..." --hypothesis "..." --test-code tests/test.py --impl-code src/module.py
# Multi-agent mode
sdd test consult assertion --error "..." --hypothesis "..." --multi-agent
# Manual tool selection
sdd test consult --tool {gemini|codex|cursor} --prompt "..."
# Show routing matrix
sdd test consult --list-routing
# Dry run
sdd test consult fixture --error "..." --hypothesis "..." --dry-run
sdd test discover
Test structure analyzer and discovery.
# Quick summary
sdd test discover --summary
# Directory tree
sdd test discover --tree
# All fixtures
sdd test discover --fixtures
# All markers
sdd test discover --markers
# Detailed analysis
sdd test discover --detailed
# Analyze specific directory
sdd test discover tests/unit --summary
Global Options
Available on all commands:
- --no-color - Disable colored output
- --verbose, -v - Show detailed output
- --quiet, -q - Minimal output (errors only)
Common Patterns
Multiple Failing Tests
- Group by error type
- Fix one group at a time
- Look for common root causes
- Consider whether tests need updating vs code needs fixing
Flaky Tests
# Run a test multiple times (requires the pytest-repeat plugin)
pytest tests/test_flaky.py --count=10
# Run in random order (requires the pytest-random-order plugin)
pytest --random-order
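If the pytest-rerunfailures plugin is installed, a known-flaky test can also be retried automatically while the root cause is investigated; a minimal sketch (fetch_latest_status is a hypothetical flaky call):
import pytest

@pytest.mark.flaky(reruns=3)   # pytest-rerunfailures: retry up to 3 times before reporting failure
def test_eventually_consistent_status():
    assert fetch_latest_status() == "ok"   # hypothetical call that intermittently lags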
Fixture Issues
# Show fixture setup and teardown
pytest --setup-show tests/test_module.py
# List available fixtures
pytest --fixtures
Common fixture problems:
- Fixture not in conftest.py or test file
- Fixture name doesn't match exactly
- conftest.py in wrong directory
- Incorrect fixture scope
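A minimal sketch of a correctly wired fixture (names are illustrative), showing where it lives, how the parameter name must match, and an explicit scope:
# tests/conftest.py -- visible to every test at or below this directory
import pytest

@pytest.fixture(scope="function")   # default scope; use "module"/"session" for expensive setup
def sample_user():
    return {"id": 1, "name": "alice"}

# tests/test_users.py
def test_user_name(sample_user):    # parameter name must match the fixture name exactly
    assert sample_user["name"] == "alice"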
Integration Test Failures
Check in order:
- External dependencies
- Test environment setup
- Database state
- Configuration
- Network connectivity
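One way to keep environment problems from masquerading as test failures is to skip integration tests when their external dependency is not configured; a sketch assuming the connection string arrives via an environment variable:
import os
import pytest

requires_db = pytest.mark.skipif(
    not os.environ.get("DATABASE_URL"),   # hypothetical env var for the test database
    reason="DATABASE_URL not set; skipping integration tests",
)

@requires_db
def test_database_roundtrip():
    ...   # a real test body would connect, write, and read back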
Tool Routing Matrix
Quick reference for which tool to use based on failure type:
| Failure Type | Primary Tool | Secondary (if needed) | Why |
|---|---|---|---|
| Assertion mismatch | Codex | Gemini | Code-level bug analysis |
| Exceptions | Codex | Gemini | Precise code review |
| Import/packaging | Gemini | Cursor | Framework expertise |
| Fixture issues | Gemini | Cursor | Pytest scoping knowledge |
| Timeout/performance | Gemini + Cursor | - | Strategy + pattern discovery |
| Flaky tests | Gemini + Cursor | - | Diagnosis + state dependencies |
| Multi-file issues | Cursor | Gemini | Discovery + synthesis |
| Unclear errors | Gemini | Web search | Explanation first |
Query type routing:
- "Why is this happening?" → Gemini
- "Is this code wrong?" → Codex
- "Where else does this occur?" → Cursor
- "What should I do?" → Gemini + Codex
Special Scenarios
Verification Runs (Confirming Refactors)
When running tests to verify refactoring:
# Run full suite
sdd test run
# If all pass: Done! No consultation needed.
# If tests fail: Follow standard debugging workflow
Key point: Passing verification runs require no consultation. Only investigate failures.
When Tools Disagree
If two tools give different recommendations:
- Compare reasoning - Which explanation is more thorough?
- Check scope - Which considers broader impact?
- Apply critical thinking - Which aligns with your investigation?
- Try simplest first - Implement less invasive fix first
- Document uncertainty - Note in code comments
When to Escalate to Additional Tools
Use additional tools when:
- Answer is unclear or vague
- Answer contradicts your analysis
- Answer raises new questions
- Partial answer (addresses some aspects only)
- High-stakes scenario (critical functionality)
Timeout and Retry Behavior
Consultation timeouts:
- Default: 90 seconds
- Configurable via config.yaml (consultation.timeout_seconds)
When tools time out:
- Simplify prompt (remove large code blocks)
- Try different tool from routing matrix
- Check if the tool process is hung: ps aux | grep <tool>
- Increase the timeout in config if needed
Tool Availability Fallbacks
| Recommended | If Unavailable | How to Compensate |
|---|---|---|
| Gemini | Codex or Cursor | Ask "why" with extra context; use web search |
| Codex | Gemini | Ask for very specific code examples |
| Cursor | Manual Grep + Gemini | Use Grep to find patterns, Gemini to analyze |
Advanced Topics
Multi-Agent Analysis
Multi-agent mode consults two agents in parallel and synthesizes their insights:
sdd test consult fixture --error "..." --hypothesis "..." --multi-agent
Output includes:
- Consensus points (where agents agree)
- Unique insights from each agent
- Synthesis combining both analyses
- High-confidence recommendations
Benefits:
- Higher confidence through multiple perspectives
- Better coverage (each agent contributes unique insights)
- Risk reduction (divergent views expose alternatives)
Using pdb for Debugging
# Drop into debugger on failure
pytest --pdb
# Drop into debugger on first failure
pytest -x --pdb
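The built-in breakpoint() also works inside a test body; pytest drops into pdb at that line (compute_price is a hypothetical function under test):
def test_pricing_edge_case():
    result = compute_price(0)   # hypothetical function under test
    breakpoint()                # pauses here in pdb so locals can be inspected
    assert result == 0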
Custom Markers for Test Organization
# conftest.py
def pytest_configure(config):
    config.addinivalue_line("markers", "slow: marks tests as slow")
    config.addinivalue_line("markers", "integration: marks integration tests")
    config.addinivalue_line("markers", "unit: marks unit tests")

# Usage
import pytest

@pytest.mark.slow
def test_complex_calculation():
    pass
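With markers registered, subsets can be selected at run time, e.g. pytest -m "not slow" skips everything marked slow and pytest -m integration runs only integration tests.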
Mocking External Services
from unittest.mock import patch

# fetch_data is the function under test; import it from the module being tested.
def test_api_call():
    with patch('requests.get') as mock_get:
        mock_get.return_value.json.return_value = {"status": "ok"}
        result = fetch_data()
        assert result["status"] == "ok"
        mock_get.assert_called_once()
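Note that patch() replaces the name where it is looked up: patching 'requests.get' works when the module under test does import requests, but code written as from requests import get must be patched in its own namespace (e.g. 'mymodule.get').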
Troubleshooting
"Fixture not found"
- Check fixture is defined in conftest.py or same file
- Verify fixture name matches exactly
- Check fixture scope is appropriate
- Ensure conftest.py is in correct directory
"Import error"
- Check PYTHONPATH includes src directory
- Verify __init__.py files exist
- Check for circular imports
- Verify the package is installed in development mode (pip install -e .)
"Tests pass locally but fail in CI"
- Check for hardcoded paths (a tmp_path sketch follows this list)
- Verify all dependencies in requirements
- Check for timezone issues
- Look for race conditions
- Check file system differences
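For the hardcoded-path case, pytest's built-in tmp_path fixture gives each test its own temporary directory that behaves identically on local machines and in CI:
def test_writes_report(tmp_path):
    report = tmp_path / "report.txt"   # per-test temporary directory, no hardcoded paths
    report.write_text("ok")
    assert report.read_text() == "ok"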
"Test is too slow"
- Use fixtures with appropriate scope (a session-scope sketch follows this list)
- Mock external services
- Use in-memory databases
- Parallelize: sdd test run --parallel
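For expensive setup, widening fixture scope can remove most of the cost; a sketch assuming the resource is safe to share across tests (no per-test mutation):
import pytest

@pytest.fixture(scope="session")
def expensive_resource():
    resource = {"connection": "initialized"}   # stand-in for slow setup (DB, service client, ...)
    yield resource                              # built once per session instead of once per test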
Best Practices
Running Tests
- Start with verbose mode (-v) for better visibility
- Use -x to stop on first failure when debugging
- Run specific tests to iterate faster
- Use markers to organize test runs
Debugging Strategy
- Read error messages carefully
- Check last line of stack trace first
- Use the -l flag to see local variables
- Add temporary print statements for quick debugging
Consultation Workflow
For test failures:
- Do initial investigation first
- Check tool availability: sdd test check-tools
- Consult available tools (mandatory if tests failed)
- Share your hypothesis - don't ask blind questions
- Synthesize insights from tools + your analysis
- YOU implement using Edit/Write tools
- Test thoroughly
Skip consultation when:
- Tests all passed
- Verification/smoke tests succeeded
- Post-fix confirmation (tests already passed once)
- No tools available
Success Criteria
A test debugging session is successful when:
- ✓ All tests pass
- ✓ No new tests are broken
- ✓ Root cause is understood
- ✓ Fix is documented
- ✓ Code is cleaner/clearer than before (when appropriate)