SKILL.md

name: testing-strategy
description: Comprehensive guide for implementing AIDB tests following E2E-first philosophy, DebugInterface abstraction, and MCP response health standards
version: 1.0.0
tags: testing, e2e, integration, unit, mcp, framework, debugging

AIDB Testing Strategy

Priority: E2E → Integration → Unit (Highest ROI First)


CRITICAL: Test Execution Command

All tests MUST be run via ./dev-cli test run:

./dev-cli test run -s {suite} [-k 'pattern'] [-l {lang}]
  • Multiple -k and -l flags supported
  • NEVER use --local - suites know their natural execution environment; forcing local causes unexpected behavior
  • Direct pytest invocation is NOT supported
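For example, a single run can target the shared suite, filter by a test pattern, and restrict the languages exercised (the pattern and language values here are illustrative):

./dev-cli test run -s shared -k 'breakpoint' -l python -l javascript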

This skill guides you through creating and modifying tests for the AIDB project. The test infrastructure is complete - your job is to implement tests using proven patterns.

Related Skills

When implementing tests, you may also need:

  • adapter-development - Tests validate adapter behavior across languages
  • mcp-tools-development - MCP tools tested with DebugInterface abstraction
  • dap-protocol-guide - Tests exercise DAP protocol flows end-to-end
  • ci-cd-workflows - For testing CI/CD workflows themselves (not application code)

Core Philosophy

For complete test architecture, see test infrastructure in src/tests/.

1. E2E First, Unit Last

Why E2E First?

  • Validates real user workflows
  • Exercises the full stack (catches integration bugs early)
  • Most bugs discovered when testing actual components
  • Higher ROI than unit tests initially

Testing Priority:

  1. E2E Tests - Complete workflows, real programs, full integration
  2. Integration Tests - Component interactions, adapter lifecycle, state management
  3. Unit Tests - Edge cases, specific logic, error handling

2. Framework Tests: Keep It Simple

Goal: Verify we can launch and debug programs using various frameworks

Pattern:

  1. Connect to framework app (Django, Express, Spring Boot, etc.)
  2. Quick smoke test: set breakpoint → inspect → step
  3. That's it. Don't test framework internals.

We're testing that AIDB works WITH frameworks, not testing the frameworks themselves.
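A hypothetical smoke test body following this pattern (the fixture name and the inspect/step calls are illustrative, not the literal interface methods):

# Hypothetical sketch - not a real test file; fixture and method names are illustrative
@pytest.mark.asyncio
async def test_express_smoke(self, debug_interface, express_app):
    # Connect: start the session with the breakpoint already set (see the timing note below)
    await debug_interface.start_session(program=express_app, breakpoints=[{"line": 12}])

    # Inspect: confirm locals are visible at the stop
    frame = await debug_interface.inspect_variables()  # illustrative call
    assert frame is not None

    # Step once, then stop - no framework internals are exercised
    await debug_interface.step_over()  # illustrative call
    await debug_interface.stop_session()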

3. MCP Responses: Structure + Content + Efficiency

Don't just validate structure - validate content accuracy and payload efficiency.

Bad test:

assert "locals" in response["data"]  # Structure only

Good test:

# Structure
assert "locals" in response["data"]

# Content accuracy
assert response["data"]["locals"]["x"]["value"] == 10
assert response["data"]["locals"]["x"]["line"] == 5

# Efficiency (no junk)
assert len(response["data"]) <= 3  # No bloated payloads
assert len(response["summary"]) <= 200  # Concise summaries

4. VS Launch: Critical for Agent Workflows

Why critical:

  • Primary entry point for agents using framework debugging
  • Enables use of existing workspace launch configs
  • Complex variable substitution must work (${workspaceFolder}, etc.)

Test thoroughly (see the sketch after this list):

  • Core launch.json parsing (language-independent)
  • Per-language config translation (Python/JavaScript/Java)
  • Variable substitution and resolution
  • Framework-specific launch configs
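A hypothetical illustration of what these tests verify; the resolver helper is invented for this sketch, while the config fields follow standard launch.json conventions:

# Hypothetical sketch - resolve_launch_config is an invented name; see the launch tests for the real helpers
raw_config = {
    "type": "python",
    "request": "launch",
    "program": "${workspaceFolder}/app/main.py",
    "cwd": "${workspaceFolder}",
}

resolved = resolve_launch_config(raw_config, workspace_folder="/repo/myproject")

assert resolved["program"] == "/repo/myproject/app/main.py"
assert resolved["cwd"] == "/repo/myproject"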

Critical Breakpoint Timing:

Set breakpoints when STARTING sessions (not after) to avoid race conditions with fast-executing programs:

# ✅ CORRECT
await debug_interface.start_session(program=prog, breakpoints=[{"line": 10}])

# ❌ WRONG: Race condition
await debug_interface.start_session(program=prog)
await debug_interface.set_breakpoint(file=prog, line=10)  # May be too late!

Exception: long-running processes (such as servers), where you attach to an already-running process instead of launching. Reference: src/tests/aidb_shared/e2e/test_complex_workflows.py

DebugInterface Abstraction

The cornerstone of our test strategy is the DebugInterface abstraction - a unified API that works with both MCP tools and the direct API.

For implementation details, see:

  • DebugInterface - Skill resource file
  • src/tests/_interfaces/ - Debug interface source and docstrings

Why? One test validates both entry points.

Hypothetical Example (illustrates pattern, not a real test file):

import pytest

from tests._helpers.parametrization import parametrize_interfaces
# BaseE2ETest is provided by the shared test infrastructure (import path omitted here)

class TestBreakpoints(BaseE2ETest):
    @parametrize_interfaces  # Runs twice: MCP and API
    @pytest.mark.asyncio
    async def test_set_breakpoint(self, debug_interface, simple_program):
        """Test works with BOTH MCP and API."""
        await debug_interface.start_session(program=simple_program)

        bp = await debug_interface.set_breakpoint(
            file=simple_program,
            line=5
        )

        self.verify_bp.verify_breakpoint_verified(bp)
        await debug_interface.stop_session()

Key Points:

  • @parametrize_interfaces runs test with both MCP and API
  • Same test logic validates both entry points
  • No duplication, no drift
  • See DebugInterface for details

The Shared Suite: Testing Debug Fundamentals

The shared suite is AIDB's language-agnostic test foundation that validates core debugging capabilities across all supported languages using normalized, programmatically generated test programs.

Key Innovation: Semantic markers that map identical logic to language-specific line numbers.

Location: src/tests/aidb_shared/ (integration/ + e2e/)

What it tests:

  • Debug primitives (breakpoints, stepping, variables)
  • Control flow across all 3 languages (Python, JavaScript, Java)
  • Zero duplication: One test → 6 execution paths (2 interfaces × 3 languages)

For complete details, see DebugInterface.
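A hypothetical illustration of the semantic-marker idea mentioned above; the marker name and the lookup helper are invented for this sketch:

# Hypothetical sketch - marker names and the lookup helper are illustrative
# The same logical location resolves to a different line number in each language's generated program.
line = test_program.marker("after_assignment")  # e.g. line 7 in Python, line 9 in JavaScript
await debug_interface.set_breakpoint(file=test_program.path, line=line)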

When to Use the Shared Suite

Use shared suite when:

  • Testing core debug operations (breakpoint, step, inspect)
  • Validating adapter behavior across languages
  • Ensuring language parity (all adapters work identically)

Use framework tests when:

  • Testing framework-specific debugging (Django ORM, Express middleware)
  • Validating launch.json configurations
  • Testing real-world application patterns

Test Organization

src/tests/
├── aidb_shared/               # ⭐ Shared suite: language-agnostic debug fundamentals
│   ├── integration/          # Core debug operations (breakpoints, stepping, variables)
│   └── e2e/                  # Complex workflows, parallel sessions
├── aidb/                      # Core API tests - organized by component
│   ├── adapters/             # Adapter-specific tests
│   ├── api/                  # Public API tests
│   ├── audit/                # Audit logging tests
│   ├── dap/                  # DAP client tests
│   └── session/              # Session management tests
├── aidb_mcp/                  # MCP server tests - organized by component
├── frameworks/                # Framework integration tests
│   ├── python/               # Flask, FastAPI, pytest
│   ├── javascript/           # Express, Jest
│   └── java/                 # Spring Boot, JUnit
├── _helpers/                  # Test helpers and utilities
├── _fixtures/                 # Shared fixtures
│   └── unit/                 # ⭐ Unit test infrastructure (see below)
└── _assets/                   # Test programs and data
    ├── framework_apps/       # Framework test applications
    └── test_programs/        # Generated programs for shared suite

Unit Test Infrastructure

The centralized unit test infrastructure at src/tests/_fixtures/unit/ provides reusable mocks, builders, and fixtures:

_fixtures/unit/
├── builders/           # DAPRequestBuilder, DAPResponseBuilder, DAPEventBuilder
├── dap/               # Transport, events, receiver mocks
├── session/           # Registry, lifecycle, state, child_manager mocks
├── adapter/           # Port, process, launch_orchestrator mocks
├── mcp/               # DebugService, MCPSessionContext mocks
├── conftest.py        # Master fixture re-exports
├── context.py         # mock_ctx, null_ctx, tmp_storage
└── assertions.py      # UnitAssertions class

Usage Pattern:

# In domain conftest.py (e.g., src/tests/aidb/dap/unit/conftest.py)
from tests._fixtures.unit.conftest import *  # noqa: F401, F403
from tests._fixtures.unit.builders import DAPEventBuilder, DAPResponseBuilder

# In test file
def test_something(mock_ctx, mock_transport):
    event = DAPEventBuilder.stopped_event(reason="breakpoint")
    # ...

Key Components:

  • Builders - Fluent API for DAP protocol objects (requests, responses, events)
  • mock_ctx - Standard logging context mock with debug/info/warning/error methods
  • UnitAssertions - DAP-specific assertion helpers

Test Execution Modes

Test suites run in different environments based on their requirements:

Local-Only Suites (no Docker):

  • cli - CLI command tests
  • mcp - MCP server unit/integration tests
  • core - Core AIDB API tests
  • common - Common utilities tests
  • logging - Logging framework tests
  • ci_cd - CI/CD workflow tests

Docker Suites (require containers):

  • shared - Multi-language shared tests (parallel language containers)
  • frameworks - Framework integration tests (parallel language containers)
  • launch - Launch config tests (parallel language containers)

Why the split?

  • Local suites test Python-only logic (handlers, validation, utils)
  • Docker suites test multi-language scenarios
  • Multi-language MCP functionality tested in shared/frameworks/launch

Running tests:

./dev-cli test run -s mcp      # Local execution
./dev-cli test run -s shared   # Docker execution

Code Reuse: Don't Reinvent

Always use existing infrastructure:

  • Test Base Classes - BaseE2ETest, BaseIntegrationTest, FrameworkDebugTestBase
  • Parametrization Decorators - @parametrize_interfaces, @parametrize_languages
  • Helper Assertions - self.verify_bp, self.verify_exec, MCPAssertions
  • Constants - StopReason, TestTimeouts, MCPTool

For complete details, see E2E Patterns.
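A hypothetical sketch layering these pieces together; how @parametrize_languages injects its fixture and the exact helper and constant signatures are assumptions, so treat every name not listed above as illustrative:

# Hypothetical sketch - decorator stacking and helper signatures are illustrative
import pytest

from tests._helpers.parametrization import parametrize_interfaces  # parametrize_languages import path assumed to match
# BaseE2ETest and StopReason come from the existing base classes and constants.py (imports omitted here)

class TestStepOver(BaseE2ETest):
    @parametrize_interfaces   # MCP and API
    @parametrize_languages    # Python, JavaScript, Java
    @pytest.mark.asyncio
    async def test_step_over_stops_at_next_statement(self, debug_interface, test_program):
        await debug_interface.start_session(program=test_program, breakpoints=[{"line": 5}])
        state = await debug_interface.step_over()                    # illustrative call
        self.verify_exec.verify_stop_reason(state, StopReason.STEP)  # illustrative helper + constant
        await debug_interface.stop_session()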

Working Examples

Study these real tests before writing new ones:

Framework Tests (E2E)

  • Python: test_flask_debugging.py, test_fastapi_debugging.py, test_pytest_debugging.py
  • JavaScript: test_express_debugging.py, test_jest_debugging.py
  • Java: test_springboot_debugging.py, test_junit_debugging.py

Core API Tests

  • Launch Variable Resolution: test_launch_variable_resolution.py
  • Session Target Handling: test_session_target_handling.py

For complete file paths and patterns, see E2E Patterns.

Common Patterns

For hypothetical examples illustrating common patterns, see E2E Patterns.

Key patterns covered:

  1. Basic E2E Test
  2. Breakpoint Test with Markers
  3. Dual-Launch Equivalence Test
  4. MCP Response Validation

When Creating New Tests

Step 1: Choose Test Type

  • E2E? Full workflow, real program, complete integration
  • Integration? Component interactions, lifecycle management
  • Unit? Specific function, edge case, error handling

Step 2: Find Similar Test

Look at E2E Patterns:

  • Django/Express for framework tests
  • Existing tests in the same directory
  • Similar test scenarios in other languages

Step 3: Copy Pattern, Adapt

Don't start from scratch:

  1. Copy a working test
  2. Adapt to your scenario
  3. Use same helpers and assertions
  4. Follow same structure

Step 4: Use Existing Infrastructure

Don't create:

  • New assertion helpers (use existing)
  • New fixtures (check conftest.py files first)
  • New constants (use constants.py)
  • New base classes (inherit from existing)

Do create:

  • Tests using existing patterns
  • Scenario-specific test data
  • Framework-specific fixtures (if needed)

Performance Testing

Current State: No performance baselines exist yet

Phase 1: Establish baselines

  • Analyze existing metrics/instrumentation
  • Determine "healthy" latencies
  • Document target times

Phase 2: Regression testing

  • Monitor key operations (breakpoint set, variable inspect, step)
  • Alert on degradation

For now: Focus on functional correctness, not performance.
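Once baselines exist, a Phase 2 regression check might look something like this sketch; the 0.5-second budget is purely illustrative, pending Phase 1 numbers:

# Hypothetical sketch - the budget value is illustrative; real thresholds come from Phase 1 baselines
import time

async def assert_breakpoint_set_within_budget(debug_interface, program, budget_s=0.5):
    start = time.monotonic()
    await debug_interface.set_breakpoint(file=program, line=5)
    elapsed = time.monotonic() - start
    assert elapsed < budget_s, f"set_breakpoint took {elapsed:.3f}s (budget {budget_s}s)"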

Success Criteria

Test Quality Checklist

  • Test uses @parametrize_interfaces for MCP/API coverage
  • Test inherits from appropriate base class
  • Test uses helper assertions, not custom assertions
  • Test validates content accuracy, not just structure
  • MCP tests check efficiency (no bloated payloads)
  • Test has clear docstring explaining what it validates
  • Test follows working examples
  • Test is in correct directory (e2e/integration/unit)

Framework Test Checklist

  • Inherits from FrameworkDebugTestBase (see the skeleton after this list)
  • Implements test_launch_via_api()
  • Implements test_launch_via_vscode_config()
  • Implements test_dual_launch_equivalence()
  • Sets framework_name attribute
  • Uses simple smoke tests (no deep framework testing)
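A hypothetical skeleton satisfying this checklist; the required method names come from the checklist above, while the fixture name and the Django choice are illustrative:

# Hypothetical skeleton - see test_flask_debugging.py / test_express_debugging.py for real examples
class TestDjangoDebugging(FrameworkDebugTestBase):
    framework_name = "django"

    async def test_launch_via_api(self, debug_interface, django_app):
        ...  # launch directly through the API, then run a quick breakpoint smoke test

    async def test_launch_via_vscode_config(self, debug_interface, django_app):
        ...  # launch through a launch.json config, same smoke test

    async def test_dual_launch_equivalence(self, debug_interface, django_app):
        ...  # assert both launch paths reach the same debug state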

Investigating Test Failures

CRITICAL: When tests fail, check logs BEFORE attempting fixes.

See: Debugging Failures for log locations, investigation workflow, and common patterns.

For CI test failures, use the ci-cd-workflows skill's troubleshooting guide.

Resources

  • E2E Patterns - Test patterns, markers, code reuse, working examples
  • Framework Tests - Dual-launch pattern, Flask/Express examples
  • DebugInterface - Unified API abstraction, shared suite architecture
  • Debugging Failures - Log locations, investigation workflow, common issues

Test Infrastructure: src/tests/ (see _interfaces/, _fixtures/, _helpers/ for core components)

Getting Started

  1. Read CONTEXT.md: wip/test-implementation-backlog/CONTEXT.md
  2. Study working examples: Flask (test_flask_debugging.py) and Express (test_express_debugging.py)
  3. Choose a test to implement: Start with E2E (highest ROI)
  4. Copy a working test: Don't start from scratch
  5. Adapt to your scenario: Use same patterns, different data
  6. Validate: Run test, ensure it passes with both MCP and API

Questions?

Internal Documentation:

  • src/tests/ - Test infrastructure (see _interfaces/, _fixtures/, _helpers/)
  • docs/developer-guide/overview.md - System architecture

Code References:

  • DAP Protocol: See src/aidb/dap/protocol/ (fully typed, types.py + requests.py + responses.py + events.py)
  • Test Infrastructure: See src/tests/_helpers/ and src/tests/_fixtures/
  • Working Examples: See Flask/Express framework tests

Remember:

  • E2E first, validate content accuracy
  • Use shared suite for debug fundamentals, framework tests for integration
  • Keep framework tests simple (no framework internals)
  • Always use the DebugInterface abstraction (zero duplication)