| name | testing-strategy |
| description | Comprehensive guide for implementing AIDB tests following E2E-first philosophy, DebugInterface abstraction, and MCP response health standards |
| version | 1.0.0 |
| tags | testing, e2e, integration, unit, mcp, framework, debugging |
AIDB Testing Strategy
Priority: E2E → Integration → Unit (Highest ROI First)
CRITICAL: Test Execution Command
All tests MUST be run via `./dev-cli test run`:

```bash
./dev-cli test run -s {suite} [-k 'pattern'] [-l {lang}]
```

- Multiple `-k` and `-l` flags supported
- NEVER use `--local`: suites know their natural execution environment; forcing local causes unexpected behavior
- Direct `pytest` invocation is NOT supported
This skill guides you through creating and modifying tests for the AIDB project. The test infrastructure is complete - your job is to implement tests using proven patterns.
Related Skills
When implementing tests, you may also need:
- adapter-development - Tests validate adapter behavior across languages
- mcp-tools-development - MCP tools tested with DebugInterface abstraction
- dap-protocol-guide - Tests exercise DAP protocol flows end-to-end
- ci-cd-workflows - For testing CI/CD workflows themselves (not application code)
Core Philosophy
For complete test architecture, see test infrastructure in src/tests/.
1. E2E First, Unit Last
Why E2E First?
- Validates real user workflows
- Exercises the full stack (catches integration bugs early)
- Most bugs discovered when testing actual components
- Higher ROI than unit tests initially
Testing Priority:
- E2E Tests - Complete workflows, real programs, full integration
- Integration Tests - Component interactions, adapter lifecycle, state management
- Unit Tests - Edge cases, specific logic, error handling
2. Framework Tests: Keep It Simple
Goal: Verify we can launch and debug programs using various frameworks
Pattern:
- Connect to framework app (Django, Express, Spring Boot, etc.)
- Quick smoke test: set breakpoint → inspect → step
- That's it. Don't test framework internals.
We're testing that AIDB works WITH frameworks, not testing the frameworks themselves.
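A hedged sketch of such a smoke test, following the `debug_interface` pattern shown later in this skill; the app fixture, the line number, and the wait/inspect/step helper names are assumptions, not the actual test API:

```python
import pytest

from tests._helpers.parametrization import parametrize_interfaces


class TestFlaskSmoke(BaseE2ETest):  # base class named in this skill
    @parametrize_interfaces
    @pytest.mark.asyncio
    async def test_breakpoint_in_view(self, debug_interface, flask_app_entrypoint):
        # flask_app_entrypoint: hypothetical fixture pointing at the app's entry file
        await debug_interface.start_session(
            program=flask_app_entrypoint,
            breakpoints=[{"line": 12}],  # hypothetical line inside a view function
        )

        # Smoke test only: confirm we stop, peek at locals, take one step
        stopped = await debug_interface.wait_for_stop()    # hypothetical helper name
        assert stopped is not None
        variables = await debug_interface.get_variables()  # hypothetical helper name
        assert variables

        await debug_interface.step_over()                  # hypothetical helper name
        await debug_interface.stop_session()
```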
3. MCP Responses: Structure + Content + Efficiency
Don't just validate structure - validate content accuracy and payload efficiency.
Bad test:

```python
assert "locals" in response["data"]  # Structure only
```

Good test:

```python
# Structure
assert "locals" in response["data"]

# Content accuracy
assert response["data"]["locals"]["x"]["value"] == 10
assert response["data"]["locals"]["x"]["line"] == 5

# Efficiency (no junk)
assert len(response["data"]) <= 3      # No bloated payloads
assert len(response["summary"]) <= 200  # Concise summaries
```
4. VS Launch: Critical for Agent Workflows
Why critical:
- Primary entry point for agents using framework debugging
- Enables use of existing workspace launch configs
- Complex variable substitution must work (`${workspaceFolder}`, etc.)
Test thoroughly:
- Core launch.json parsing (language-independent)
- Per-language config translation (Python/JavaScript/Java)
- Variable substitution and resolution
- Framework-specific launch configs
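A hedged sketch of a variable-substitution check for the list above; the launch.json shape mirrors a typical VS Code Python config, and `resolve_launch_config` is a hypothetical stand-in for AIDB's actual parsing/translation entry point:

```python
import json

import pytest


@pytest.mark.asyncio
async def test_workspace_folder_substitution(tmp_path):
    # Minimal workspace with a launch.json that uses ${workspaceFolder}
    launch_json = {
        "version": "0.2.0",
        "configurations": [
            {
                "name": "Run app",
                "type": "python",
                "request": "launch",
                "program": "${workspaceFolder}/app/main.py",
            }
        ],
    }
    (tmp_path / ".vscode").mkdir()
    (tmp_path / ".vscode" / "launch.json").write_text(json.dumps(launch_json))

    # Hypothetical helper standing in for the real launch-config resolution API
    config = await resolve_launch_config(workspace=tmp_path, name="Run app")

    # ${workspaceFolder} should resolve to an absolute path rooted at the workspace
    assert config["program"] == str(tmp_path / "app" / "main.py")
```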
Critical Breakpoint Timing:
Set breakpoints when STARTING sessions (not after) to avoid race conditions with fast-executing programs:

```python
# ✅ CORRECT: breakpoints passed at session start
await debug_interface.start_session(program=prog, breakpoints=[{"line": 10}])

# ❌ WRONG: race condition
await debug_interface.start_session(program=prog)
await debug_interface.set_breakpoint(file=prog, line=10)  # May be too late!
```

Exception: Long-running processes (servers) where you attach. Reference: `src/tests/aidb_shared/e2e/test_complex_workflows.py`
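In the attach case the target keeps running, so setting breakpoints after the session starts does not race; a minimal sketch (the attach-style argument is an assumption, see the referenced test file for the real flow):

```python
# Server is already running and will not exit, so there is no race here
await debug_interface.start_session(attach_port=5678)               # hypothetical attach argument
await debug_interface.set_breakpoint(file="app/views.py", line=42)  # safe after start in attach mode
```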
DebugInterface Abstraction
The cornerstone of our test strategy is the DebugInterface abstraction - a unified API that works with both MCP tools and the direct API.
For implementation details, see:
- DebugInterface - Skill resource file
- `src/tests/_interfaces/` - Debug interface source and docstrings
Why? One test validates both entry points.
Hypothetical Example (illustrates pattern, not a real test file):

```python
import pytest

from tests._helpers.parametrization import parametrize_interfaces


class TestBreakpoints(BaseE2ETest):
    @parametrize_interfaces  # Runs twice: MCP and API
    @pytest.mark.asyncio
    async def test_set_breakpoint(self, debug_interface, simple_program):
        """Test works with BOTH MCP and API."""
        await debug_interface.start_session(program=simple_program)

        bp = await debug_interface.set_breakpoint(
            file=simple_program,
            line=5,
        )
        self.verify_bp.verify_breakpoint_verified(bp)

        await debug_interface.stop_session()
```
Key Points:
- `@parametrize_interfaces` runs the test with both MCP and API
- Same test logic validates both entry points
- No duplication, no drift
- See DebugInterface for details
The Shared Suite: Testing Debug Fundamentals
The shared suite is AIDB's language-agnostic test foundation that validates core debugging capabilities across all supported languages using normalized, programmatically generated test programs.
Key Innovation: Semantic markers that map identical logic to language-specific line numbers.
Location: src/tests/aidb_shared/ (integration/ + e2e/)
What it tests:
- Debug primitives (breakpoints, stepping, variables)
- Control flow across all 3 languages (Python, JavaScript, Java)
- Zero duplication: One test → 6 execution paths (2 interfaces × 3 languages)
For complete details, see DebugInterface.
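To make "one test → 6 execution paths" concrete, a hedged sketch: the two decorator names come from this skill, while the stacked usage, the `test_program`/`markers` fixtures, and the step helper are assumptions:

```python
import pytest

from tests._helpers.parametrization import parametrize_interfaces, parametrize_languages


class TestSteppingParity(BaseIntegrationTest):  # base class named in this skill
    @parametrize_languages   # Python, JavaScript, Java
    @parametrize_interfaces  # MCP and API -> 3 languages x 2 interfaces = 6 runs
    @pytest.mark.asyncio
    async def test_step_over_loop(self, debug_interface, test_program, markers):
        # Hypothetical fixtures: the generated program for the current language
        # and its semantic-marker -> line-number map
        await debug_interface.start_session(
            program=test_program,
            breakpoints=[{"line": markers["loop_body"]}],  # marker resolved per language
        )
        await debug_interface.step_over()  # hypothetical step helper
        await debug_interface.stop_session()
```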
When to Use the Shared Suite
Use shared suite when:
- Testing core debug operations (breakpoint, step, inspect)
- Validating adapter behavior across languages
- Ensuring language parity (all adapters work identically)
Use framework tests when:
- Testing framework-specific debugging (Django ORM, Express middleware)
- Validating launch.json configurations
- Testing real-world application patterns
Test Organization
```
src/tests/
├── aidb_shared/            # ⭐ Shared suite: language-agnostic debug fundamentals
│   ├── integration/        # Core debug operations (breakpoints, stepping, variables)
│   └── e2e/                # Complex workflows, parallel sessions
├── aidb/                   # Core API tests - organized by component
│   ├── adapters/           # Adapter-specific tests
│   ├── api/                # Public API tests
│   ├── audit/              # Audit logging tests
│   ├── dap/                # DAP client tests
│   └── session/            # Session management tests
├── aidb_mcp/               # MCP server tests - organized by component
├── frameworks/             # Framework integration tests
│   ├── python/             # Flask, FastAPI, pytest
│   ├── javascript/         # Express, Jest
│   └── java/               # Spring Boot, JUnit
├── _helpers/               # Test helpers and utilities
├── _fixtures/              # Shared fixtures
│   └── unit/               # ⭐ Unit test infrastructure (see below)
└── _assets/                # Test programs and data
    ├── framework_apps/     # Framework test applications
    └── test_programs/      # Generated programs for shared suite
```
Unit Test Infrastructure
The centralized unit test infrastructure at src/tests/_fixtures/unit/ provides reusable mocks, builders, and fixtures:
```
_fixtures/unit/
├── builders/       # DAPRequestBuilder, DAPResponseBuilder, DAPEventBuilder
├── dap/            # Transport, events, receiver mocks
├── session/        # Registry, lifecycle, state, child_manager mocks
├── adapter/        # Port, process, launch_orchestrator mocks
├── mcp/            # DebugService, MCPSessionContext mocks
├── conftest.py     # Master fixture re-exports
├── context.py      # mock_ctx, null_ctx, tmp_storage
└── assertions.py   # UnitAssertions class
```
Usage Pattern:
```python
# In domain conftest.py (e.g., src/tests/aidb/dap/unit/conftest.py)
from tests._fixtures.unit.conftest import *  # noqa: F401, F403
from tests._fixtures.unit.builders import DAPEventBuilder, DAPResponseBuilder


# In test file
def test_something(mock_ctx, mock_transport):
    event = DAPEventBuilder.stopped_event(reason="breakpoint")
    # ...
```
Key Components:
- Builders - Fluent API for DAP protocol objects (requests, responses, events)
- mock_ctx - Standard logging context mock with debug/info/warning/error methods
- UnitAssertions - DAP-specific assertion helpers
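A hedged sketch of how these pieces might combine in a unit test; `DAPEventBuilder.stopped_event` and the `mock_ctx`/`mock_transport` fixtures appear in the usage pattern above, while the handler class and the mock-style assertion are assumptions:

```python
from tests._fixtures.unit.builders import DAPEventBuilder


def test_stopped_event_is_logged(mock_ctx, mock_transport):
    # Build a protocol-shaped "stopped" event without a real adapter process
    event = DAPEventBuilder.stopped_event(reason="breakpoint")

    # Feed it to the component under test (hypothetical handler class)
    handler = StoppedEventHandler(ctx=mock_ctx, transport=mock_transport)
    handler.handle(event)

    # Assumes mock_ctx exposes its debug/info/warning/error methods as mocks
    mock_ctx.debug.assert_called()
```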
Test Execution Modes
Test suites run in different environments based on their requirements:
Local-Only Suites (no Docker):
- `cli` - CLI command tests
- `mcp` - MCP server unit/integration tests
- `core` - Core AIDB API tests
- `common` - Common utilities tests
- `logging` - Logging framework tests
- `ci_cd` - CI/CD workflow tests
Docker Suites (require containers):
- `shared` - Multi-language shared tests (parallel language containers)
- `frameworks` - Framework integration tests (parallel language containers)
- `launch` - Launch config tests (parallel language containers)
Why the split?
- Local suites test Python-only logic (handlers, validation, utils)
- Docker suites test multi-language scenarios
- Multi-language MCP functionality is tested in `shared`, `frameworks`, and `launch`
Running tests:
```bash
./dev-cli test run -s mcp     # Local execution
./dev-cli test run -s shared  # Docker execution
```
Code Reuse: Don't Reinvent
Always use existing infrastructure:
- Test Base Classes - `BaseE2ETest`, `BaseIntegrationTest`, `FrameworkDebugTestBase`
- Parametrization Decorators - `@parametrize_interfaces`, `@parametrize_languages`
- Helper Assertions - `self.verify_bp`, `self.verify_exec`, `MCPAssertions`
- Constants - `StopReason`, `TestTimeouts`, `MCPTool`
For complete details, see E2E Patterns.
Working Examples
Study these real tests before writing new ones:
Framework Tests (E2E)
- Python: `test_flask_debugging.py`, `test_fastapi_debugging.py`, `test_pytest_debugging.py`
- JavaScript: `test_express_debugging.py`, `test_jest_debugging.py`
- Java: `test_springboot_debugging.py`, `test_junit_debugging.py`
Core API Tests
- Launch Variable Resolution: `test_launch_variable_resolution.py`
- Session Target Handling: `test_session_target_handling.py`
For complete file paths and patterns, see E2E Patterns.
Common Patterns
For hypothetical examples illustrating common patterns, see E2E Patterns.
Key patterns covered:
- Basic E2E Test
- Breakpoint Test with Markers
- Dual-Launch Equivalence Test
- MCP Response Validation
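As one illustration, a hedged sketch of the dual-launch equivalence idea (the same app launched via the direct API and via a launch.json config should end up in equivalent debug state); the launch helpers, fixtures, and state fields are assumptions:

```python
import pytest


class TestFlaskDualLaunch(FrameworkDebugTestBase):  # base class named in this skill
    framework_name = "flask"

    @pytest.mark.asyncio
    async def test_dual_launch_equivalence(
        self, debug_interface, flask_app, flask_launch_config
    ):
        # Launch once via the direct API... (hypothetical helper)
        api_state = await self.launch_via_api(debug_interface, flask_app)
        # ...and once via the workspace launch.json config (hypothetical helper)
        config_state = await self.launch_via_vscode_config(debug_interface, flask_launch_config)

        # Equivalence: both entry points should stop in the same place with the
        # same set of locals (field names are assumptions)
        assert api_state.stop_location == config_state.stop_location
        assert api_state.locals.keys() == config_state.locals.keys()
```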
When Creating New Tests
Step 1: Choose Test Type
- E2E? Full workflow, real program, complete integration
- Integration? Component interactions, lifecycle management
- Unit? Specific function, edge case, error handling
Step 2: Find Similar Test
Look at E2E Patterns:
- Django/Express for framework tests
- Existing tests in the same directory
- Similar test scenarios in other languages
Step 3: Copy Pattern, Adapt
Don't start from scratch:
- Copy a working test
- Adapt to your scenario
- Use same helpers and assertions
- Follow same structure
Step 4: Use Existing Infrastructure
Don't create:
- New assertion helpers (use existing)
- New fixtures (check `conftest.py` files first)
- New constants (use `constants.py`)
- New base classes (inherit from existing)
Do create:
- Tests using existing patterns
- Scenario-specific test data
- Framework-specific fixtures (if needed)
Performance Testing
Current State: No performance baselines exist yet
Phase 1: Establish baselines
- Analyze existing metrics/instrumentation
- Determine "healthy" latencies
- Document target times
Phase 2: Regression testing
- Monitor key operations (breakpoint set, variable inspect, step)
- Alert on degradation
For now: Focus on functional correctness, not performance.
Success Criteria
Test Quality Checklist
- Test uses `@parametrize_interfaces` for MCP/API coverage
- Test inherits from the appropriate base class
- Test uses helper assertions, not custom assertions
- Test validates content accuracy, not just structure
- MCP tests check efficiency (no bloated payloads)
- Test has clear docstring explaining what it validates
- Test follows working examples
- Test is in correct directory (e2e/integration/unit)
Framework Test Checklist
- Inherits from `FrameworkDebugTestBase`
- Implements `test_launch_via_api()`
- Implements `test_launch_via_vscode_config()`
- Implements `test_dual_launch_equivalence()`
- Sets the `framework_name` attribute
- Uses simple smoke tests (no deep framework testing)
Investigating Test Failures
CRITICAL: When tests fail, check logs BEFORE attempting fixes.
See: Debugging Failures for log locations, investigation workflow, and common patterns.
For CI test failures, use the ci-cd-workflows skill's troubleshooting guide.
Resources
| Resource | Content |
|---|---|
| E2E Patterns | Test patterns, markers, code reuse, working examples |
| Framework Tests | Dual-launch pattern, Flask/Express examples |
| DebugInterface | Unified API abstraction, shared suite architecture |
| Debugging Failures | Log locations, investigation workflow, common issues |
Test Infrastructure: src/tests/ (see _interfaces/, _fixtures/, _helpers/ for core components)
Getting Started
- Read CONTEXT.md: `wip/test-implementation-backlog/CONTEXT.md`
- Study working examples: Flask (`test_flask_debugging.py`) and Express (`test_express_debugging.py`)
- Choose a test to implement: Start with E2E (highest ROI)
- Copy a working test: Don't start from scratch
- Adapt to your scenario: Use same patterns, different data
- Validate: Run test, ensure it passes with both MCP and API
Questions?
Internal Documentation:
- `src/tests/` - Test infrastructure (see `_interfaces/`, `_fixtures/`, `_helpers/`)
- `docs/developer-guide/overview.md` - System architecture
Code References:
- DAP Protocol: See `src/aidb/dap/protocol/` (fully typed: types.py + requests.py + responses.py + events.py)
- Test Infrastructure: See `src/tests/_helpers/` and `src/tests/_fixtures/`
- Working Examples: See Flask/Express framework tests
Remember:
- E2E first, validate content accuracy
- Use shared suite for debug fundamentals, framework tests for integration
- Keep framework tests simple (no framework internals)
- Always use the DebugInterface abstraction (zero duplication)