| name | run-tests |
| description | Comprehensive pytest testing and debugging framework. Use when running tests, debugging failures, fixing broken tests, or investigating test errors. Includes systematic investigation workflow with external AI tool consultation and verification strategies. |
Pytest Testing and Debugging Skill
Overview
This skill provides a systematic approach to running tests and debugging failures using pytest. The core workflow integrates investigation, external tool consultation, and verification to efficiently resolve test failures.
Key capabilities:
- Run tests with presets for common scenarios (debug, quick, coverage)
- Systematic investigation and hypothesis formation
- External AI tool consultation (gemini, codex, cursor-agent) when tests fail
- Multi-agent analysis for complex issues
- Test discovery and structure analysis
Core Workflow
5-Phase Process:
1. Run Tests - Execute tests with appropriate flags
2. Investigate - Analyze failures, form a hypothesis
3. Gather Context - Optionally use code documentation for faster understanding
4. Consult - Get external tool insights (mandatory for failures if tools are available)
5. Fix & Verify - Implement changes and confirm no regressions
Key principles:
- Investigation-first - Always analyze before consulting
- Hypothesis-driven - Form theories, then validate
- Mandatory consultation for failures - If tests fail and tools exist, consult them
- Skip when passing - Tests pass? Done. No consultation needed.
Quick decision guide:
- ✅ Tests pass? → Done
- ❌ Simple fix (typo/obvious)? → Fix → Verify
- ❌ Complex/unclear? → Investigate → Consult → Fix → Verify
Phase 1: Run Tests
Discover Test Structure (Optional)
If unfamiliar with test organization:
# Quick summary
sdd test discover --summary
# Directory tree
sdd test discover --tree
Run Tests
# Quick run (stop on first failure)
sdd test run --quick
# Debug mode (verbose with locals and prints)
sdd test run --debug
# Run specific test
sdd test run tests/test_module.py::test_function
# Coverage report
sdd test run --coverage
# List all presets
sdd test run --list
Or use pytest directly:
pytest -v # Verbose
pytest -vv -l -s # Very verbose, show locals, show prints
pytest -x # Stop on first failure
pytest -k "test_user" # Run tests matching pattern
Capture Output
For large test suites with many failures:
# Save output to timestamped file
sdd test run --debug | tee /tmp/test-run-$(date +%Y%m%d-%H%M%S).log
Phase 2: Investigate Failures
Categorize the Failure
- Assertion - Expected vs actual mismatch
- Exception - Runtime errors (AttributeError, KeyError, etc.)
- Import - Missing dependencies or module issues
- Fixture - Fixture or configuration issues
- Timeout - Performance or hanging issues
- Flaky - Non-deterministic failures
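To make the first two categories concrete, here is a minimal, hypothetical illustration (normalize and both tests are invented): the first test fails with an assertion mismatch, the second raises an exception.
# test_example.py -- hypothetical, self-contained illustration
def normalize(name):
    return name.strip()   # deliberate bugs: no lowercasing, no None handling

def test_normalize_case():
    # Assertion failure: expected vs actual mismatch ('Alice' != 'alice')
    assert normalize("  Alice ") == "alice"

def test_normalize_none():
    # Exception failure: AttributeError ('NoneType' object has no attribute 'strip')
    assert normalize(None) == ""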
Extract Key Information
For each failure:
- Test file and function name
- Line number where failure occurred
- Error type and message
- Full stack trace
- Relevant code context
Examine the Code
- Read the failing test
- Read the implementation being tested
- Understand what the test verifies
- Identify expected vs actual behavior
- Form your hypothesis - What's causing the failure?
Phase 3: Gather Code Context (Optional)
When available: If codebase documentation exists (generated by Skill(sdd-toolkit:code-doc)), use it for faster investigation.
Check availability:
sdd doc stats
Useful commands when debugging:
# Search for functions or concepts
sdd doc search "authentication"
# Show function definition
sdd doc show-function AuthService.login
# Find dependencies
sdd doc list-dependencies src/services/authService.ts
# Find what depends on a file (impact analysis)
sdd doc dependencies --reverse src/auth.py
Benefits:
- Faster context gathering
- Better root cause analysis
- Discover similar patterns
- Impact analysis
If not available: Continue with standard file exploration. See code-doc skill documentation for generating docs.
Phase 4: Consult External Tools
CRITICAL: This is mandatory for test failures when external tools exist.
Check Tool Availability
sdd test check-tools
Decision:
- Any tool available AND tests failed → Consult (mandatory)
- No tools available → Skip to Phase 5
- Tests passed → Skip to Phase 5 (no consultation needed)
Consult Tools
All external tools operate in read-only mode. They analyze and suggest; YOU implement all fixes.
# Auto-route based on failure type
sdd test consult assertion --error "Full error message" --hypothesis "Your theory about the cause"
# Include code for context
sdd test consult exception --error "AttributeError: ..." --hypothesis "Missing return" --test-code tests/test_file.py --impl-code src/module.py
# Show routing matrix
sdd test consult --list-routing
# Manual tool selection
sdd test consult --tool gemini --prompt "Custom question..."
Tool Selection Guide
| Tool | Best For | Example Use |
|---|---|---|
| Gemini | Hypothesis validation, framework explanations, strategic guidance | "Why is this fixture not found?" |
| Codex | Code-level review, specific fix suggestions | "Review this code and suggest fixes" |
| Cursor | Repo-wide discovery, finding patterns | "Find all call sites" |
When to Use Multiple Tools
Use multi-agent consultation for:
- High-stakes fixes affecting critical functionality
- Complex issues with unclear root cause
- Need validation from multiple perspectives
- Uncertain between multiple approaches
# Auto-selects best two agents
sdd test consult assertion --error "..." --hypothesis "..." --multi-agent
# Select specific agent pair
sdd test consult exception --error "..." --hypothesis "..." --multi-agent --pair code-focus
Effective Prompting
- Share your hypothesis - Ask "is my theory correct?" not "what's wrong?"
- Provide complete context - Error messages, code, stack traces
- Include what you've tried - Show your investigation work
- Ask for explanations - Understand "why", not just "how to fix"
- Be specific - State exactly what you need
Phase 5: Fix & Verify
Synthesize Findings
Combine insights from:
- Your investigation and hypothesis
- External tool recommendations
- Any additional research
Implement Fix
# Make targeted changes using Edit tool
# Example: Add missing return statement
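A hypothetical illustration of that kind of fix, with a comment recording the root cause (the function and field names are invented):
# The function computed the total but never returned it, so callers received None
# and the assertion `assert order_total(items) == 42` failed.
def order_total(items):
    total = sum(item.price for item in items)
    return total   # fix: return was missing; document the root cause where you fix it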
Verify
# Run the specific fixed test
sdd test run tests/test_module.py::test_function
# If passing, run full suite
sdd test run
# Verify no regressions
pytest tests/ -v
Document
Add comments explaining:
- What was wrong
- Why the fix works
- Any assumptions or limitations
CLI Reference
sdd test check-tools
Check availability of external tools and get routing suggestions.
# Basic check
sdd test check-tools
# Get routing for specific failure type
sdd test check-tools --route assertion
sdd test check-tools --route fixture
sdd test run
Smart pytest runner with presets for common scenarios.
# List all presets
sdd test run --list
# Presets
sdd test run --quick # Stop on first failure
sdd test run --debug # Verbose + locals + prints
sdd test run --coverage # Coverage report
sdd test run --fast # Skip slow tests
sdd test run --parallel # Run in parallel
# Run specific test
sdd test run tests/test_file.py::test_name
sdd test consult
External tool consultation with auto-routing.
# Auto-route based on failure type
sdd test consult {assertion|exception|fixture|import|timeout|flaky} --error "..." --hypothesis "..."
# Include code
sdd test consult exception --error "..." --hypothesis "..." --test-code tests/test.py --impl-code src/module.py
# Multi-agent mode
sdd test consult assertion --error "..." --hypothesis "..." --multi-agent
# Manual tool selection
sdd test consult --tool {gemini|codex|cursor} --prompt "..."
# Show routing matrix
sdd test consult --list-routing
# Dry run
sdd test consult fixture --error "..." --hypothesis "..." --dry-run
sdd test discover
Test structure analyzer and discovery.
# Quick summary
sdd test discover --summary
# Directory tree
sdd test discover --tree
# All fixtures
sdd test discover --fixtures
# All markers
sdd test discover --markers
# Detailed analysis
sdd test discover --detailed
# Analyze specific directory
sdd test discover tests/unit --summary
Global Options
Available on all commands:
- --no-color - Disable colored output
- --verbose, -v - Show detailed output
- --quiet, -q - Minimal output (errors only)
Common Patterns
Multiple Failing Tests
- Group by error type
- Fix one group at a time
- Look for common root causes
- Consider whether tests need updating vs code needs fixing
Flaky Tests
# Run a test multiple times (requires the pytest-repeat plugin)
pytest tests/test_flaky.py --count=10
# Run in random order (requires the pytest-random-order plugin)
pytest --random-order
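If the pytest-rerunfailures plugin is installed, a known-flaky test can also be retried automatically while the root cause is investigated; a minimal sketch (fetch_latest_status is a hypothetical flaky call):
import pytest

@pytest.mark.flaky(reruns=3)   # pytest-rerunfailures: retry up to 3 times before reporting failure
def test_eventually_consistent_status():
    assert fetch_latest_status() == "ok"   # hypothetical call that intermittently lags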
Fixture Issues
# Show fixture setup and teardown
pytest --setup-show tests/test_module.py
# List available fixtures
pytest --fixtures
Common fixture problems:
- Fixture not in conftest.py or test file
- Fixture name doesn't match exactly
- conftest.py in wrong directory
- Incorrect fixture scope
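A minimal sketch of a correctly wired fixture (names are illustrative), showing where it lives, how the parameter name must match, and an explicit scope:
# tests/conftest.py -- visible to every test at or below this directory
import pytest

@pytest.fixture(scope="function")   # default scope; use "module"/"session" for expensive setup
def sample_user():
    return {"id": 1, "name": "alice"}

# tests/test_users.py
def test_user_name(sample_user):    # parameter name must match the fixture name exactly
    assert sample_user["name"] == "alice"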
Integration Test Failures
Check in order:
- External dependencies
- Test environment setup
- Database state
- Configuration
- Network connectivity
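One way to keep environment problems from masquerading as test failures is to skip integration tests when their external dependency is not configured; a sketch assuming the connection string arrives via an environment variable:
import os
import pytest

requires_db = pytest.mark.skipif(
    not os.environ.get("DATABASE_URL"),   # hypothetical env var for the test database
    reason="DATABASE_URL not set; skipping integration tests",
)

@requires_db
def test_database_roundtrip():
    ...   # a real test body would connect, write, and read back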
Tool Routing Matrix
Quick reference for which tool to use based on failure type:
| Failure Type | Primary Tool | Secondary (if needed) | Why |
|---|---|---|---|
| Assertion mismatch | Codex | Gemini | Code-level bug analysis |
| Exceptions | Codex | Gemini | Precise code review |
| Import/packaging | Gemini | Cursor | Framework expertise |
| Fixture issues | Gemini | Cursor | Pytest scoping knowledge |
| Timeout/performance | Gemini + Cursor | - | Strategy + pattern discovery |
| Flaky tests | Gemini + Cursor | - | Diagnosis + state dependencies |
| Multi-file issues | Cursor | Gemini | Discovery + synthesis |
| Unclear errors | Gemini | Web search | Explanation first |
Query type routing:
- "Why is this happening?" → Gemini
- "Is this code wrong?" → Codex
- "Where else does this occur?" → Cursor
- "What should I do?" → Gemini + Codex
Special Scenarios
Verification Runs (Confirming Refactors)
When running tests to verify refactoring:
# Run full suite
sdd test run
# If all pass: Done! No consultation needed.
# If tests fail: Follow standard debugging workflow
Key point: Passing verification runs require no consultation. Only investigate failures.
When Tools Disagree
If two tools give different recommendations:
- Compare reasoning - Which explanation is more thorough?
- Check scope - Which considers broader impact?
- Apply critical thinking - Which aligns with your investigation?
- Try simplest first - Implement less invasive fix first
- Document uncertainty - Note in code comments
When to Escalate to Additional Tools
Use additional tools when:
- Answer is unclear or vague
- Answer contradicts your analysis
- Answer raises new questions
- Partial answer (addresses some aspects only)
- High-stakes scenario (critical functionality)
Timeout and Retry Behavior
Consultation timeouts:
- Default: 90 seconds
- Configurable via config.yaml (consultation.timeout_seconds)
When tools time out:
- Simplify prompt (remove large code blocks)
- Try different tool from routing matrix
- Check if the tool process is hung: ps aux | grep <tool>
- Increase the timeout in config if needed
Tool Availability Fallbacks
| Recommended | If Unavailable | How to Compensate |
|---|---|---|
| Gemini | Codex or Cursor | Ask "why" with extra context; use web search |
| Codex | Gemini | Ask for very specific code examples |
| Cursor | Manual Grep + Gemini | Use Grep to find patterns, Gemini to analyze |
Advanced Topics
Multi-Agent Analysis
Multi-agent mode consults two agents in parallel and synthesizes their insights:
sdd test consult fixture --error "..." --hypothesis "..." --multi-agent
Output includes:
- Consensus points (where agents agree)
- Unique insights from each agent
- Synthesis combining both analyses
- High-confidence recommendations
Benefits:
- Higher confidence through multiple perspectives
- Better coverage (each agent contributes unique insights)
- Risk reduction (divergent views expose alternatives)
Using pdb for Debugging
# Drop into debugger on failure
pytest --pdb
# Drop into debugger on first failure
pytest -x --pdb
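The built-in breakpoint() also works inside a test body; pytest drops into pdb at that line (compute_price is a hypothetical function under test):
def test_pricing_edge_case():
    result = compute_price(0)   # hypothetical function under test
    breakpoint()                # pauses here in pdb so locals can be inspected
    assert result == 0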
Custom Markers for Test Organization
# conftest.py
def pytest_configure(config):
    config.addinivalue_line("markers", "slow: marks tests as slow")
    config.addinivalue_line("markers", "integration: marks integration tests")
    config.addinivalue_line("markers", "unit: marks unit tests")

# Usage
import pytest

@pytest.mark.slow
def test_complex_calculation():
    pass
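With markers registered, subsets can be selected at run time, e.g. pytest -m "not slow" skips everything marked slow and pytest -m integration runs only integration tests.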
Mocking External Services
from unittest.mock import patch

# fetch_data is the function under test; import it from the module being tested.
def test_api_call():
    with patch('requests.get') as mock_get:
        mock_get.return_value.json.return_value = {"status": "ok"}
        result = fetch_data()
        assert result["status"] == "ok"
        mock_get.assert_called_once()
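Note that patch() replaces the name where it is looked up: patching 'requests.get' works when the module under test does import requests, but code written as from requests import get must be patched in its own namespace (e.g. 'mymodule.get').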
Troubleshooting
"Fixture not found"
- Check fixture is defined in conftest.py or same file
- Verify fixture name matches exactly
- Check fixture scope is appropriate
- Ensure conftest.py is in correct directory
"Import error"
- Check PYTHONPATH includes src directory
- Verify __init__.py files exist
- Check for circular imports
- Verify the package is installed in development mode (pip install -e .)
"Tests pass locally but fail in CI"
- Check for hardcoded paths (a tmp_path sketch follows this list)
- Verify all dependencies in requirements
- Check for timezone issues
- Look for race conditions
- Check file system differences
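For the hardcoded-path case, pytest's built-in tmp_path fixture gives each test its own temporary directory that behaves identically on local machines and in CI:
def test_writes_report(tmp_path):
    report = tmp_path / "report.txt"   # per-test temporary directory, no hardcoded paths
    report.write_text("ok")
    assert report.read_text() == "ok"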
"Test is too slow"
- Use fixtures with appropriate scope (a session-scope sketch follows this list)
- Mock external services
- Use in-memory databases
- Parallelize: sdd test run --parallel
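For expensive setup, widening fixture scope can remove most of the cost; a sketch assuming the resource is safe to share across tests (no per-test mutation):
import pytest

@pytest.fixture(scope="session")
def expensive_resource():
    resource = {"connection": "initialized"}   # stand-in for slow setup (DB, service client, ...)
    yield resource                              # built once per session instead of once per test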
Best Practices
Running Tests
- Start with verbose mode (-v) for better visibility
- Use -x to stop on first failure when debugging
- Run specific tests to iterate faster
- Use markers to organize test runs
Debugging Strategy
- Read error messages carefully
- Check last line of stack trace first
- Use the -l flag to see local variables
- Add temporary print statements for quick debugging
Consultation Workflow
For test failures:
- Do initial investigation first
- Check tool availability: sdd test check-tools
- Consult available tools (mandatory if tests failed)
- Share your hypothesis - don't ask blind questions
- Synthesize insights from tools + your analysis
- YOU implement using Edit/Write tools
- Test thoroughly
Skip consultation when:
- Tests all passed
- Verification/smoke tests succeeded
- Post-fix confirmation (tests already passed once)
- No tools available
Success Criteria
A test debugging session is successful when:
- ✓ All tests pass
- ✓ No new tests are broken
- ✓ Root cause is understood
- ✓ Fix is documented
- ✓ Code is cleaner/clearer than before (when appropriate)