SKILL.md

name: run-tests
description: Comprehensive pytest testing and debugging framework. Use when running tests, debugging failures, fixing broken tests, or investigating test errors. Includes systematic investigation workflow with external AI tool consultation and verification strategies.

Pytest Testing and Debugging Skill

Overview

This skill provides a systematic approach to running tests and debugging failures using pytest. The core workflow integrates investigation, external tool consultation, and verification to efficiently resolve test failures.

Key capabilities:

  • Run tests with presets for common scenarios (debug, quick, coverage)
  • Systematic investigation and hypothesis formation
  • External AI tool consultation (gemini, codex, cursor-agent) when tests fail
  • Multi-agent analysis for complex issues
  • Test discovery and structure analysis

Core Workflow

5-Phase Process:

  1. Run Tests - Execute tests with appropriate flags
  2. Investigate - Analyze failures, form hypothesis
  3. Gather Context - Optionally use code documentation for faster understanding
  4. Consult - Get external tool insights (mandatory for failures if tools available)
  5. Fix & Verify - Implement changes and confirm no regressions

Key principles:

  • Investigation-first - Always analyze before consulting
  • Hypothesis-driven - Form theories, then validate
  • Mandatory consultation for failures - If tests fail and tools exist, consult them
  • Skip when passing - Tests pass? Done. No consultation needed.

Quick decision guide:

  • ✅ Tests pass? → Done
  • ❌ Simple fix (typo/obvious)? → Fix → Verify
  • ❌ Complex/unclear? → Investigate → Consult → Fix → Verify

Phase 1: Run Tests

Discover Test Structure (Optional)

If unfamiliar with test organization:

# Quick summary
sdd test discover --summary

# Directory tree
sdd test discover --tree

Run Tests

# Quick run (stop on first failure)
sdd test run --quick

# Debug mode (verbose with locals and prints)
sdd test run --debug

# Run specific test
sdd test run tests/test_module.py::test_function

# Coverage report
sdd test run --coverage

# List all presets
sdd test run --list

Or use pytest directly:

pytest -v                # Verbose
pytest -vv -l -s        # Very verbose, show locals, show prints
pytest -x                # Stop on first failure
pytest -k "test_user"   # Run tests matching pattern

Capture Output

For large test suites with many failures:

# Save output to timestamped file
sdd test run --debug | tee /tmp/test-run-$(date +%Y%m%d-%H%M%S).log

Phase 2: Investigate Failures

Categorize the Failure

  • Assertion - Expected vs actual mismatch
  • Exception - Runtime errors (AttributeError, KeyError, etc.)
  • Import - Missing dependencies or module issues
  • Fixture - Fixture or configuration issues
  • Timeout - Performance or hanging issues
  • Flaky - Non-deterministic failures

Extract Key Information

For each failure:

  • Test file and function name
  • Line number where failure occurred
  • Error type and message
  • Full stack trace
  • Relevant code context
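
As a concrete illustration (hypothetical file and function names), a single assertion failure carries every item on this list:

def add_tax(subtotal, tax):
    return subtotal + tax

def test_add_tax():
    # Failure output contains, in order: the test file and function
    # (test_pricing.py::test_add_tax), the line of the failing assert, the error
    # type and message (AssertionError: assert 0.30000000000000004 == 0.3), the
    # stack trace, and, with -l / -vv, local variables and surrounding code context.
    assert add_tax(0.1, 0.2) == 0.3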

Examine the Code

  • Read the failing test
  • Read the implementation being tested
  • Understand what the test verifies
  • Identify expected vs actual behavior
  • Form your hypothesis - What's causing the failure?

Phase 3: Gather Code Context (Optional)

When available: If codebase documentation exists (generated by Skill(sdd-toolkit:code-doc)), use it for faster investigation.

Check availability:

sdd doc stats

Useful commands when debugging:

# Search for functions or concepts
sdd doc search "authentication"

# Show function definition
sdd doc show-function AuthService.login

# Find dependencies
sdd doc list-dependencies src/services/authService.ts

# Find what depends on a file (impact analysis)
sdd doc dependencies --reverse src/auth.py

Benefits:

  • Faster context gathering
  • Better root cause analysis
  • Discover similar patterns
  • Impact analysis

If not available: Continue with standard file exploration. See code-doc skill documentation for generating docs.

Phase 4: Consult External Tools

CRITICAL: This is mandatory for test failures when external tools exist.

Check Tool Availability

sdd test check-tools

Decision:

  • Any tool available AND tests failed → Consult (mandatory)
  • No tools available → Skip to Phase 5
  • Tests passed → Skip to Phase 5 (no consultation needed)

Consult Tools

All external tools operate in read-only mode. They analyze and suggest; YOU implement all fixes.

# Auto-route based on failure type
sdd test consult assertion --error "Full error message" --hypothesis "Your theory about the cause"

# Include code for context
sdd test consult exception --error "AttributeError: ..." --hypothesis "Missing return" --test-code tests/test_file.py --impl-code src/module.py

# Show routing matrix
sdd test consult --list-routing

# Manual tool selection
sdd test consult --tool gemini --prompt "Custom question..."

Tool Selection Guide

  • Gemini - Hypothesis validation, framework explanations, strategic guidance. Example: "Why is this fixture not found?"
  • Codex - Code-level review, specific fix suggestions. Example: "Review this code and suggest fixes"
  • Cursor - Repo-wide discovery, finding patterns. Example: "Find all call sites"

When to Use Multiple Tools

Use multi-agent consultation for:

  • High-stakes fixes affecting critical functionality
  • Complex issues with unclear root cause
  • Need validation from multiple perspectives
  • Uncertain between multiple approaches

# Auto-selects best two agents
sdd test consult assertion --error "..." --hypothesis "..." --multi-agent

# Select specific agent pair
sdd test consult exception --error "..." --hypothesis "..." --multi-agent --pair code-focus

Effective Prompting

  1. Share your hypothesis - Ask "is my theory correct?" not "what's wrong?"
  2. Provide complete context - Error messages, code, stack traces
  3. Include what you've tried - Show your investigation work
  4. Ask for explanations - Understand "why", not just "how to fix"
  5. Be specific - State exactly what you need

Phase 5: Fix & Verify

Synthesize Findings

Combine insights from:

  • Your investigation and hypothesis
  • External tool recommendations
  • Any additional research

Implement Fix

# Make targeted changes using Edit tool
# Example: Add missing return statement
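
As a hedged sketch of what such a targeted change often looks like (hypothetical helper, continuing the missing-return example above):

def build_user(payload):
    user = {"name": payload["name"], "active": True}
    return user  # the missing line: without it the function returned None,
                 # and every caller that used the result failed downstream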

Verify

# Run the specific fixed test
sdd test run tests/test_module.py::test_function

# If passing, run full suite
sdd test run

# Verify no regressions
pytest tests/ -v

Document

Add comments explaining:

  • What was wrong
  • Why the fix works
  • Any assumptions or limitations
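
A short sketch of what such a comment can look like, using a hypothetical date-parsing fix:

from datetime import datetime

def parse_timestamp(raw):
    # Fix: strip the trailing "Z" before parsing.
    # What was wrong: datetime.fromisoformat() rejects the "Z" suffix on Python < 3.11,
    # so the test failed on CI's 3.10 interpreter while passing locally on 3.11.
    # Assumption: inputs are UTC, so dropping the "Z" loses no information here.
    return datetime.fromisoformat(raw.rstrip("Z"))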

CLI Reference

sdd test check-tools

Check availability of external tools and get routing suggestions.

# Basic check
sdd test check-tools

# Get routing for specific failure type
sdd test check-tools --route assertion
sdd test check-tools --route fixture

sdd test run

Smart pytest runner with presets for common scenarios.

# List all presets
sdd test run --list

# Presets
sdd test run --quick      # Stop on first failure
sdd test run --debug      # Verbose + locals + prints
sdd test run --coverage   # Coverage report
sdd test run --fast       # Skip slow tests
sdd test run --parallel   # Run in parallel

# Run specific test
sdd test run tests/test_file.py::test_name

sdd test consult

External tool consultation with auto-routing.

# Auto-route based on failure type
sdd test consult {assertion|exception|fixture|import|timeout|flaky} --error "..." --hypothesis "..."

# Include code
sdd test consult exception --error "..." --hypothesis "..." --test-code tests/test.py --impl-code src/module.py

# Multi-agent mode
sdd test consult assertion --error "..." --hypothesis "..." --multi-agent

# Manual tool selection
sdd test consult --tool {gemini|codex|cursor} --prompt "..."

# Show routing matrix
sdd test consult --list-routing

# Dry run
sdd test consult fixture --error "..." --hypothesis "..." --dry-run

sdd test discover

Test structure analyzer and discovery.

# Quick summary
sdd test discover --summary

# Directory tree
sdd test discover --tree

# All fixtures
sdd test discover --fixtures

# All markers
sdd test discover --markers

# Detailed analysis
sdd test discover --detailed

# Analyze specific directory
sdd test discover tests/unit --summary

Global Options

Available on all commands:

  • --no-color - Disable colored output
  • --verbose, -v - Show detailed output
  • --quiet, -q - Minimal output (errors only)

Common Patterns

Multiple Failing Tests

  1. Group by error type
  2. Fix one group at a time
  3. Look for common root causes
  4. Consider whether tests need updating vs code needs fixing

Flaky Tests

# Run the test multiple times (requires the pytest-repeat plugin)
pytest tests/test_flaky.py --count=10

# Run in random order (requires the pytest-random-order plugin)
pytest --random-order
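
Shared mutable state is a common root cause of flakiness. A minimal sketch (hypothetical settings cache) shows two tests whose outcome depends on execution order, exactly the coupling that random ordering exposes:

_cache = {}

def get_setting(key, default=None):
    # hypothetical module-level cache shared by all tests in the session
    return _cache.get(key, default)

def set_setting(key, value):
    _cache[key] = value

def test_default_mode():
    # flaky: passes when run alone, fails if test_override_mode ran first
    assert get_setting("mode", "fast") == "fast"

def test_override_mode():
    set_setting("mode", "slow")
    assert get_setting("mode") == "slow"

An autouse fixture that clears the cache between tests removes the order dependence.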

Fixture Issues

# Show fixture setup and teardown
pytest --setup-show tests/test_module.py

# List available fixtures
pytest --fixtures

Common fixture problems:

  • Fixture not in conftest.py or test file
  • Fixture name doesn't match exactly
  • conftest.py in wrong directory
  • Incorrect fixture scope
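
A minimal layout that avoids all four problems (fixture name, scope, and paths here are hypothetical):

# tests/conftest.py -- discovered automatically for everything under tests/
import pytest

@pytest.fixture(scope="function")  # match the scope your tests actually need
def db_session():
    session = {"connected": True}  # stand-in for real setup
    yield session                  # the test runs here
    session["connected"] = False   # teardown runs after the yield

# tests/test_queries.py -- requests the fixture by its exact name
def test_session_is_connected(db_session):
    assert db_session["connected"]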

Integration Test Failures

Check in order:

  1. External dependencies
  2. Test environment setup
  3. Database state
  4. Configuration
  5. Network connectivity

Tool Routing Matrix

Quick reference for which tool to use based on failure type:

  • Assertion mismatch → Codex (secondary: Gemini) - code-level bug analysis
  • Exceptions → Codex (secondary: Gemini) - precise code review
  • Import/packaging → Gemini (secondary: Cursor) - framework expertise
  • Fixture issues → Gemini (secondary: Cursor) - pytest scoping knowledge
  • Timeout/performance → Gemini + Cursor - strategy plus pattern discovery
  • Flaky tests → Gemini + Cursor - diagnosis plus state dependencies
  • Multi-file issues → Cursor (secondary: Gemini) - discovery and synthesis
  • Unclear errors → Gemini (secondary: web search) - explanation first

Query type routing:

  • "Why is this happening?" → Gemini
  • "Is this code wrong?" → Codex
  • "Where else does this occur?" → Cursor
  • "What should I do?" → Gemini + Codex

Special Scenarios

Verification Runs (Confirming Refactors)

When running tests to verify refactoring:

# Run full suite
sdd test run

# If all pass: Done! No consultation needed.
# If tests fail: Follow standard debugging workflow

Key point: Passing verification runs require no consultation. Only investigate failures.

When Tools Disagree

If two tools give different recommendations:

  1. Compare reasoning - Which explanation is more thorough?
  2. Check scope - Which considers broader impact?
  3. Apply critical thinking - Which aligns with your investigation?
  4. Try simplest first - Implement less invasive fix first
  5. Document uncertainty - Note in code comments

When to Escalate to Additional Tools

Use additional tools when:

  • Answer is unclear or vague
  • Answer contradicts your analysis
  • Answer raises new questions
  • Partial answer (addresses some aspects only)
  • High-stakes scenario (critical functionality)

Timeout and Retry Behavior

Consultation timeouts:

  • Default: 90 seconds
  • Configurable via config.yaml (consultation.timeout_seconds)

When tools time out:

  1. Simplify prompt (remove large code blocks)
  2. Try different tool from routing matrix
  3. Check if tool process is hung: ps aux | grep <tool>
  4. Increase timeout in config if needed

Tool Availability Fallbacks

  • Gemini unavailable → use Codex or Cursor; ask "why" questions with extra context and supplement with web search
  • Codex unavailable → use Gemini; ask for very specific code examples
  • Cursor unavailable → use manual Grep + Gemini; Grep finds the patterns, Gemini analyzes them

Advanced Topics

Multi-Agent Analysis

Multi-agent mode consults two agents in parallel and synthesizes their insights:

sdd test consult fixture --error "..." --hypothesis "..." --multi-agent

Output includes:

  • Consensus points (where agents agree)
  • Unique insights from each agent
  • Synthesis combining both analyses
  • High-confidence recommendations

Benefits:

  • Higher confidence through multiple perspectives
  • Better coverage (each agent contributes unique insights)
  • Risk reduction (divergent views expose alternatives)

Using pytest --pdb for Debugging

# Drop into debugger on failure
pytest --pdb

# Drop into debugger on first failure
pytest -x --pdb

Custom Markers for Test Organization

# conftest.py
def pytest_configure(config):
    config.addinivalue_line("markers", "slow: marks tests as slow")
    config.addinivalue_line("markers", "integration: marks integration tests")
    config.addinivalue_line("markers", "unit: marks unit tests")

# Usage in a test module
import pytest

@pytest.mark.slow
def test_complex_calculation():
    pass

# Select or skip by marker, e.g. pytest -m slow or pytest -m "not slow"

Mocking External Services

from unittest.mock import patch
import requests

def fetch_data():
    # stand-in for the real code under test; it calls requests.get directly,
    # so patching 'requests.get' intercepts the call
    return requests.get("https://example.com/api").json()

def test_api_call():
    with patch('requests.get') as mock_get:
        mock_get.return_value.json.return_value = {"status": "ok"}
        result = fetch_data()
        assert result["status"] == "ok"
        mock_get.assert_called_once()
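
If you prefer pytest's built-in monkeypatch fixture to unittest.mock, an equivalent sketch (same hypothetical fetch_data as above) looks like this:

import requests

class FakeResponse:
    def json(self):
        return {"status": "ok"}

def fetch_data():
    # same hypothetical function under test as in the patch() example
    return requests.get("https://example.com/api").json()

def test_api_call_with_monkeypatch(monkeypatch):
    # monkeypatch restores the real requests.get automatically after the test
    monkeypatch.setattr(requests, "get", lambda url, **kwargs: FakeResponse())
    assert fetch_data()["status"] == "ok"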

Troubleshooting

"Fixture not found"

  1. Check fixture is defined in conftest.py or same file
  2. Verify fixture name matches exactly
  3. Check fixture scope is appropriate
  4. Ensure conftest.py is in correct directory

"Import error"

  1. Check PYTHONPATH includes src directory
  2. Verify __init__.py files exist
  3. Check for circular imports
  4. Verify the package is installed in development mode (e.g., pip install -e .)
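
If the package cannot be installed into the test environment, one common stopgap, assuming a src/ directory next to the tests at the repository root, is to extend sys.path from a root-level conftest.py:

# conftest.py at the repository root (hypothetical src/ layout)
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent / "src"))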

"Tests pass locally but fail in CI"

  1. Check for hardcoded paths
  2. Verify all dependencies in requirements
  3. Check for timezone issues
  4. Look for race conditions
  5. Check file system differences

"Test is too slow"

  1. Use fixtures with appropriate scope
  2. Mock external services
  3. Use in-memory databases
  4. Parallelize: sdd test run --parallel
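
As a sketch of the in-memory database point (standard library sqlite3, hypothetical schema), a fixture like this keeps database tests off the disk entirely:

import sqlite3
import pytest

@pytest.fixture
def db_conn():
    conn = sqlite3.connect(":memory:")  # exists only for this test
    yield conn
    conn.close()

def test_insert_and_count(db_conn):
    db_conn.execute("CREATE TABLE items (name TEXT)")
    db_conn.execute("INSERT INTO items VALUES ('widget')")
    assert db_conn.execute("SELECT COUNT(*) FROM items").fetchone()[0] == 1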

Best Practices

Running Tests

  1. Start with verbose mode (-v) for better visibility
  2. Use -x to stop on first failure when debugging
  3. Run specific tests to iterate faster
  4. Use markers to organize test runs

Debugging Strategy

  1. Read error messages carefully
  2. Check last line of stack trace first
  3. Use -l flag to see local variables
  4. Add temporary print statements for quick debugging
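
When prints are not enough, a temporary breakpoint() inside the failing test pauses execution so locals can be inspected interactively (hypothetical code; add -s if your setup keeps output capturing enabled):

def compute_totals(items):
    # hypothetical function under test
    return {"grand": sum(items)}

def test_report_totals():
    totals = compute_totals([100, 10])
    breakpoint()  # pauses here; inspect `totals`, press c to continue
    assert totals["grand"] == 110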

Consultation Workflow

For test failures:

  1. Do initial investigation first
  2. Check tool availability: sdd test check-tools
  3. Consult available tools (mandatory if tests failed)
  4. Share your hypothesis - don't ask blind questions
  5. Synthesize insights from tools + your analysis
  6. YOU implement using Edit/Write tools
  7. Test thoroughly

Skip consultation when:

  • Tests all passed
  • Verification/smoke tests succeeded
  • Post-fix confirmation (tests already passed once)
  • No tools available

Success Criteria

A test debugging session is successful when:

  • ✓ All tests pass
  • ✓ No new tests are broken
  • ✓ Root cause is understood
  • ✓ Fix is documented
  • ✓ Code is cleaner/clearer than before (when appropriate)