SKILL.md

name: testing-strategy
description: Comprehensive guide for implementing AIDB tests following E2E-first philosophy, DebugInterface abstraction, and MCP response health standards
version: 1.0.0
tags: testing, e2e, integration, unit, mcp, framework, debugging

AIDB Testing Strategy

Priority: E2E → Integration → Unit (Highest ROI First)


CRITICAL: Test Execution Command

All tests MUST be run via ./dev-cli test run:

./dev-cli test run -s {suite} [-k 'pattern'] [-l {lang}]
  • Multiple -k and -l flags supported
  • NEVER use --local - suites know their natural execution environment; forcing local causes unexpected behavior
  • Direct pytest invocation is NOT supported
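For example, a single run can target the shared suite, filter by a test pattern, and restrict the languages exercised (the pattern and language values here are illustrative):

./dev-cli test run -s shared -k 'breakpoint' -l python -l javascript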

This skill guides you through creating and modifying tests for the AIDB project. The test infrastructure is complete - your job is to implement tests using proven patterns.

Related Skills

When implementing tests, you may also need:

  • adapter-development - Tests validate adapter behavior across languages
  • mcp-tools-development - MCP tools tested with DebugInterface abstraction
  • dap-protocol-guide - Tests exercise DAP protocol flows end-to-end
  • ci-cd-workflows - For testing CI/CD workflows themselves (not application code)

Core Philosophy

For complete test architecture, see test infrastructure in src/tests/.

1. E2E First, Unit Last

Why E2E First?

  • Validates real user workflows
  • Exercises the full stack (catches integration bugs early)
  • Most bugs discovered when testing actual components
  • Higher ROI than unit tests initially

Testing Priority:

  1. E2E Tests - Complete workflows, real programs, full integration
  2. Integration Tests - Component interactions, adapter lifecycle, state management
  3. Unit Tests - Edge cases, specific logic, error handling

2. Framework Tests: Keep It Simple

Goal: Verify we can launch and debug programs using various frameworks

Pattern:

  1. Connect to framework app (Django, Express, Spring Boot, etc.)
  2. Quick smoke test: set breakpoint → inspect → step
  3. That's it. Don't test framework internals.

We're testing that AIDB works WITH frameworks, not testing the frameworks themselves.
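A hypothetical smoke test body following this pattern (the fixture name and the inspect/step calls are illustrative, not the literal interface methods):

# Hypothetical sketch - not a real test file; fixture and method names are illustrative
@pytest.mark.asyncio
async def test_express_smoke(self, debug_interface, express_app):
    # Connect: start the session with the breakpoint already set (see the timing note below)
    await debug_interface.start_session(program=express_app, breakpoints=[{"line": 12}])

    # Inspect: confirm locals are visible at the stop
    frame = await debug_interface.inspect_variables()  # illustrative call
    assert frame is not None

    # Step once, then stop - no framework internals are exercised
    await debug_interface.step_over()  # illustrative call
    await debug_interface.stop_session()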

3. MCP Responses: Structure + Content + Efficiency

Don't just validate structure - validate content accuracy and payload efficiency.

Bad test:

assert "locals" in response["data"]  # Structure only

Good test:

# Structure
assert "locals" in response["data"]

# Content accuracy
assert response["data"]["locals"]["x"]["value"] == 10
assert response["data"]["locals"]["x"]["line"] == 5

# Efficiency (no junk)
assert len(response["data"]) <= 3  # No bloated payloads
assert len(response["summary"]) <= 200  # Concise summaries

4. VS Launch: Critical for Agent Workflows

Why critical:

  • Primary entry point for agents using framework debugging
  • Enables use of existing workspace launch configs
  • Complex variable substitution must work (${workspaceFolder}, etc.)

Test thoroughly (see the sketch after this list):

  • Core launch.json parsing (language-independent)
  • Per-language config translation (Python/JavaScript/Java)
  • Variable substitution and resolution
  • Framework-specific launch configs
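A hypothetical illustration of what these tests verify; the resolver helper is invented for this sketch, while the config fields follow standard launch.json conventions:

# Hypothetical sketch - resolve_launch_config is an invented name; see the launch tests for the real helpers
raw_config = {
    "type": "python",
    "request": "launch",
    "program": "${workspaceFolder}/app/main.py",
    "cwd": "${workspaceFolder}",
}

resolved = resolve_launch_config(raw_config, workspace_folder="/repo/myproject")

assert resolved["program"] == "/repo/myproject/app/main.py"
assert resolved["cwd"] == "/repo/myproject"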

Critical Breakpoint Timing:

Set breakpoints when STARTING sessions (not after) to avoid race conditions with fast-executing programs:

# ✅ CORRECT
await debug_interface.start_session(program=prog, breakpoints=[{"line": 10}])

# ❌ WRONG: Race condition
await debug_interface.start_session(program=prog)
await debug_interface.set_breakpoint(file=prog, line=10)  # May be too late!

Exception: long-running processes (such as servers), where you attach to an already-running process instead of launching. Reference: src/tests/aidb_shared/e2e/test_complex_workflows.py

DebugInterface Abstraction

The cornerstone of our test strategy is the DebugInterface abstraction - a unified API that works with both MCP tools and the direct API.

For implementation details, see:

  • DebugInterface - Skill resource file
  • src/tests/_interfaces/ - Debug interface source and docstrings

Why? One test validates both entry points.

Hypothetical Example (illustrates pattern, not a real test file):

import pytest

from tests._helpers.parametrization import parametrize_interfaces
# BaseE2ETest is provided by the shared test infrastructure (import path omitted here)

class TestBreakpoints(BaseE2ETest):
    @parametrize_interfaces  # Runs twice: MCP and API
    @pytest.mark.asyncio
    async def test_set_breakpoint(self, debug_interface, simple_program):
        """Test works with BOTH MCP and API."""
        await debug_interface.start_session(program=simple_program)

        bp = await debug_interface.set_breakpoint(
            file=simple_program,
            line=5
        )

        self.verify_bp.verify_breakpoint_verified(bp)
        await debug_interface.stop_session()

Key Points:

  • @parametrize_interfaces runs test with both MCP and API
  • Same test logic validates both entry points
  • No duplication, no drift
  • See DebugInterface for details

The Shared Suite: Testing Debug Fundamentals

The shared suite is AIDB's language-agnostic test foundation that validates core debugging capabilities across all supported languages using normalized, programmatically generated test programs.

Key Innovation: Semantic markers that map identical logic to language-specific line numbers.

Location: src/tests/aidb_shared/ (integration/ + e2e/)

What it tests:

  • Debug primitives (breakpoints, stepping, variables)
  • Control flow across all 3 languages (Python, JavaScript, Java)
  • Zero duplication: One test → 6 execution paths (2 interfaces × 3 languages)

For complete details, see DebugInterface.
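A hypothetical illustration of the semantic-marker idea mentioned above; the marker name and the lookup helper are invented for this sketch:

# Hypothetical sketch - marker names and the lookup helper are illustrative
# The same logical location resolves to a different line number in each language's generated program.
line = test_program.marker("after_assignment")  # e.g. line 7 in Python, line 9 in JavaScript
await debug_interface.set_breakpoint(file=test_program.path, line=line)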

When to Use the Shared Suite

Use shared suite when:

  • Testing core debug operations (breakpoint, step, inspect)
  • Validating adapter behavior across languages
  • Ensuring language parity (all adapters work identically)

Use framework tests when:

  • Testing framework-specific debugging (Django ORM, Express middleware)
  • Validating launch.json configurations
  • Testing real-world application patterns

Test Organization

src/tests/
├── aidb_shared/               # ⭐ Shared suite: language-agnostic debug fundamentals
│   ├── integration/          # Core debug operations (breakpoints, stepping, variables)
│   └── e2e/                  # Complex workflows, parallel sessions
├── aidb/                      # Core API tests - organized by component
│   ├── adapters/             # Adapter-specific tests
│   ├── api/                  # Public API tests
│   ├── audit/                # Audit logging tests
│   ├── dap/                  # DAP client tests
│   └── session/              # Session management tests
├── aidb_mcp/                  # MCP server tests - organized by component
├── frameworks/                # Framework integration tests
│   ├── python/               # Flask, FastAPI, pytest
│   ├── javascript/           # Express, Jest
│   └── java/                 # Spring Boot, JUnit
├── _helpers/                  # Test helpers and utilities
├── _fixtures/                 # Shared fixtures
│   └── unit/                 # ⭐ Unit test infrastructure (see below)
└── _assets/                   # Test programs and data
    ├── framework_apps/       # Framework test applications
    └── test_programs/        # Generated programs for shared suite

Unit Test Infrastructure

The centralized unit test infrastructure at src/tests/_fixtures/unit/ provides reusable mocks, builders, and fixtures:

_fixtures/unit/
├── builders/           # DAPRequestBuilder, DAPResponseBuilder, DAPEventBuilder
├── dap/               # Transport, events, receiver mocks
├── session/           # Registry, lifecycle, state, child_manager mocks
├── adapter/           # Port, process, launch_orchestrator mocks
├── mcp/               # DebugService, MCPSessionContext mocks
├── conftest.py        # Master fixture re-exports
├── context.py         # mock_ctx, null_ctx, tmp_storage
└── assertions.py      # UnitAssertions class

Usage Pattern:

# In domain conftest.py (e.g., src/tests/aidb/dap/unit/conftest.py)
from tests._fixtures.unit.conftest import *  # noqa: F401, F403
from tests._fixtures.unit.builders import DAPEventBuilder, DAPResponseBuilder

# In test file
def test_something(mock_ctx, mock_transport):
    event = DAPEventBuilder.stopped_event(reason="breakpoint")
    # ...

Key Components:

  • Builders - Fluent API for DAP protocol objects (requests, responses, events)
  • mock_ctx - Standard logging context mock with debug/info/warning/error methods
  • UnitAssertions - DAP-specific assertion helpers

Test Execution Modes

Test suites run in different environments based on their requirements:

Local-Only Suites (no Docker):

  • cli - CLI command tests
  • mcp - MCP server unit/integration tests
  • core - Core AIDB API tests
  • common - Common utilities tests
  • logging - Logging framework tests
  • ci_cd - CI/CD workflow tests

Docker Suites (require containers):

  • shared - Multi-language shared tests (parallel language containers)
  • frameworks - Framework integration tests (parallel language containers)
  • launch - Launch config tests (parallel language containers)

Why the split?

  • Local suites test Python-only logic (handlers, validation, utils)
  • Docker suites test multi-language scenarios
  • Multi-language MCP functionality tested in shared/frameworks/launch

Running tests:

./dev-cli test run -s mcp      # Local execution
./dev-cli test run -s shared   # Docker execution

Code Reuse: Don't Reinvent

Always use existing infrastructure:

  • Test Base Classes - BaseE2ETest, BaseIntegrationTest, FrameworkDebugTestBase
  • Parametrization Decorators - @parametrize_interfaces, @parametrize_languages
  • Helper Assertions - self.verify_bp, self.verify_exec, MCPAssertions
  • Constants - StopReason, TestTimeouts, MCPTool

For complete details, see E2E Patterns.
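A hypothetical sketch layering these pieces together; how @parametrize_languages injects its fixture and the exact helper and constant signatures are assumptions, so treat every name not listed above as illustrative:

# Hypothetical sketch - decorator stacking and helper signatures are illustrative
import pytest

from tests._helpers.parametrization import parametrize_interfaces  # parametrize_languages import path assumed to match
# BaseE2ETest and StopReason come from the existing base classes and constants.py (imports omitted here)

class TestStepOver(BaseE2ETest):
    @parametrize_interfaces   # MCP and API
    @parametrize_languages    # Python, JavaScript, Java
    @pytest.mark.asyncio
    async def test_step_over_stops_at_next_statement(self, debug_interface, test_program):
        await debug_interface.start_session(program=test_program, breakpoints=[{"line": 5}])
        state = await debug_interface.step_over()                    # illustrative call
        self.verify_exec.verify_stop_reason(state, StopReason.STEP)  # illustrative helper + constant
        await debug_interface.stop_session()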

Working Examples

Study these real tests before writing new ones:

Framework Tests (E2E)

  • Python: test_flask_debugging.py, test_fastapi_debugging.py, test_pytest_debugging.py
  • JavaScript: test_express_debugging.py, test_jest_debugging.py
  • Java: test_springboot_debugging.py, test_junit_debugging.py

Core API Tests

  • Launch Variable Resolution: test_launch_variable_resolution.py
  • Session Target Handling: test_session_target_handling.py

For complete file paths and patterns, see E2E Patterns.

Common Patterns

For hypothetical examples illustrating common patterns, see E2E Patterns.

Key patterns covered:

  1. Basic E2E Test
  2. Breakpoint Test with Markers
  3. Dual-Launch Equivalence Test
  4. MCP Response Validation

When Creating New Tests

Step 1: Choose Test Type

  • E2E? Full workflow, real program, complete integration
  • Integration? Component interactions, lifecycle management
  • Unit? Specific function, edge case, error handling

Step 2: Find Similar Test

Look at E2E Patterns:

  • Django/Express for framework tests
  • Existing tests in the same directory
  • Similar test scenarios in other languages

Step 3: Copy Pattern, Adapt

Don't start from scratch:

  1. Copy a working test
  2. Adapt to your scenario
  3. Use same helpers and assertions
  4. Follow same structure

Step 4: Use Existing Infrastructure

Don't create:

  • New assertion helpers (use existing)
  • New fixtures (check conftest.py files first)
  • New constants (use constants.py)
  • New base classes (inherit from existing)

Do create:

  • Tests using existing patterns
  • Scenario-specific test data
  • Framework-specific fixtures (if needed)

Performance Testing

Current State: No performance baselines exist yet

Phase 1: Establish baselines

  • Analyze existing metrics/instrumentation
  • Determine "healthy" latencies
  • Document target times

Phase 2: Regression testing

  • Monitor key operations (breakpoint set, variable inspect, step)
  • Alert on degradation

For now: Focus on functional correctness, not performance.
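Once baselines exist, a Phase 2 regression check might look something like this sketch; the 0.5-second budget is purely illustrative, pending Phase 1 numbers:

# Hypothetical sketch - the budget value is illustrative; real thresholds come from Phase 1 baselines
import time

async def assert_breakpoint_set_within_budget(debug_interface, program, budget_s=0.5):
    start = time.monotonic()
    await debug_interface.set_breakpoint(file=program, line=5)
    elapsed = time.monotonic() - start
    assert elapsed < budget_s, f"set_breakpoint took {elapsed:.3f}s (budget {budget_s}s)"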

Success Criteria

Test Quality Checklist

  • Test uses @parametrize_interfaces for MCP/API coverage
  • Test inherits from appropriate base class
  • Test uses helper assertions, not custom assertions
  • Test validates content accuracy, not just structure
  • MCP tests check efficiency (no bloated payloads)
  • Test has clear docstring explaining what it validates
  • Test follows working examples
  • Test is in correct directory (e2e/integration/unit)

Framework Test Checklist

  • Inherits from FrameworkDebugTestBase (see the skeleton after this list)
  • Implements test_launch_via_api()
  • Implements test_launch_via_vscode_config()
  • Implements test_dual_launch_equivalence()
  • Sets framework_name attribute
  • Uses simple smoke tests (no deep framework testing)
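A hypothetical skeleton satisfying this checklist; the required method names come from the checklist above, while the fixture name and the Django choice are illustrative:

# Hypothetical skeleton - see test_flask_debugging.py / test_express_debugging.py for real examples
class TestDjangoDebugging(FrameworkDebugTestBase):
    framework_name = "django"

    async def test_launch_via_api(self, debug_interface, django_app):
        ...  # launch directly through the API, then run a quick breakpoint smoke test

    async def test_launch_via_vscode_config(self, debug_interface, django_app):
        ...  # launch through a launch.json config, same smoke test

    async def test_dual_launch_equivalence(self, debug_interface, django_app):
        ...  # assert both launch paths reach the same debug state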

Investigating Test Failures

CRITICAL: When tests fail, check logs BEFORE attempting fixes.

See: Debugging Failures for log locations, investigation workflow, and common patterns.

For CI test failures, use the ci-cd-workflows skill's troubleshooting guide.

Resources

  • E2E Patterns - Test patterns, markers, code reuse, working examples
  • Framework Tests - Dual-launch pattern, Flask/Express examples
  • DebugInterface - Unified API abstraction, shared suite architecture
  • Debugging Failures - Log locations, investigation workflow, common issues

Test Infrastructure: src/tests/ (see _interfaces/, _fixtures/, _helpers/ for core components)

Getting Started

  1. Read CONTEXT.md: wip/test-implementation-backlog/CONTEXT.md
  2. Study working examples: Flask (test_flask_debugging.py) and Express (test_express_debugging.py)
  3. Choose a test to implement: Start with E2E (highest ROI)
  4. Copy a working test: Don't start from scratch
  5. Adapt to your scenario: Use same patterns, different data
  6. Validate: Run test, ensure it passes with both MCP and API

Questions?

Internal Documentation:

  • src/tests/ - Test infrastructure (see _interfaces/, _fixtures/, _helpers/)
  • docs/developer-guide/overview.md - System architecture

Code References:

  • DAP Protocol: See src/aidb/dap/protocol/ (fully typed, types.py + requests.py + responses.py + events.py)
  • Test Infrastructure: See src/tests/_helpers/ and src/tests/_fixtures/
  • Working Examples: See Flask/Express framework tests

Remember:

  • E2E first, validate content accuracy
  • Use shared suite for debug fundamentals, framework tests for integration
  • Keep framework tests simple (no framework internals)
  • Always use the DebugInterface abstraction (zero duplication)