SKILL.md

name: testing-strategy-builder
description: Use this skill when creating comprehensive testing strategies for applications. Provides test planning templates, coverage targets, test case structures, and guidance for unit, integration, E2E, and performance testing. Ensures robust quality assurance across the development lifecycle.
version: 1.3.0
author: AI Agent Hub
tags: testing, quality-assurance, test-strategy, automation, coverage

Testing Strategy Builder

Overview

This skill provides comprehensive guidance for building effective testing strategies that ensure software quality, reliability, and maintainability. Whether starting from scratch or improving existing test coverage, this framework helps teams design robust testing approaches.

When to use this skill:

  • Planning testing strategy for new projects or features
  • Improving test coverage in existing codebases
  • Establishing quality gates and coverage targets
  • Designing test automation architecture
  • Creating test plans and test cases
  • Choosing appropriate testing tools and frameworks
  • Implementing continuous testing in CI/CD pipelines

Bundled Resources:

  • references/code-examples.md - Detailed testing code examples
  • templates/test-plan-template.md - Comprehensive test plan template
  • templates/test-case-template.md - Test case documentation template
  • checklists/test-coverage-checklist.md - Coverage verification checklist

Required Tools

This skill references the following testing tools. Not all are required - the skill will recommend appropriate tools based on your project.

JavaScript/TypeScript Testing

  • Jest: Most popular testing framework

    • Install: npm install --save-dev jest @types/jest
    • Config: npx jest --init
  • Vitest: Vite-native testing framework

    • Install: npm install --save-dev vitest
    • Config: Add to vite.config.ts
  • MSW (Mock Service Worker): Network-level API mocking (2025 STANDARD)

    • Install: npm install --save-dev msw
    • Setup: npx msw init public/ --save
    • Why MSW: Intercepts at network level, not implementation level
  • Playwright: End-to-end testing

    • Install: npm install --save-dev @playwright/test
    • Setup: npx playwright install
  • k6: Performance testing

    • Install (macOS): brew install k6
    • Install (Linux): Download from k6.io
    • Command: k6 run script.js

Python Testing

  • pytest: Standard Python testing framework

    • Install: pip install pytest
    • Command: pytest
  • pytest-cov: Coverage reporting

    • Install: pip install pytest-cov
    • Command: pytest --cov=.
  • pytest-vcr / VCR.py: HTTP recording/playback (2025 STANDARD)

    • Install: pip install pytest-vcr vcrpy
    • Config: Add to conftest.py
    • Why VCR: Record real HTTP responses once, replay deterministically
  • Locust: Performance testing

    • Install: pip install locust
    • Command: locust -f locustfile.py

Coverage Tools

  • c8: JavaScript/TypeScript coverage

    • Install: npm install --save-dev c8
    • Command: c8 npm test
  • Istanbul/nyc: Alternative JS coverage

    • Install: npm install --save-dev nyc
    • Command: nyc npm test

Installation Verification

# JavaScript/TypeScript (locally installed, so run via npx)
npx jest --version
npx vitest --version
npx playwright --version
k6 version

# Python
pytest --version
locust --version

# Coverage
npx c8 --version
npx nyc --version

Note: The skill will guide you to select tools based on your project framework (React, Vue, FastAPI, Django, etc.) and testing needs.

Testing Philosophy

The Testing Trophy 🏆

Modern testing follows the "Testing Trophy" model (evolved from the testing pyramid):

         🏆
       /    \
      /  E2E  \         ← Few (critical user journeys)
     /----------\
    / Integration\      ← Most (component interactions)
   /--------------\
  /     Unit       \    ← Many (business logic)
 /------------------\
/  Static Analysis   \  ← Foundation (linting, type checking)

Principles:

  1. Static Analysis: Catch syntax errors, type issues, and common bugs before runtime
  2. Unit Tests: Test individual functions and components in isolation
  3. Integration Tests: Test how components work together
  4. E2E Tests: Validate critical user workflows end-to-end

Balance: 70% integration, 20% unit, 10% E2E (adjust based on context)


Testing Strategy Framework

1. Coverage Targets

Recommended Targets:

  • Overall Code Coverage: 80% minimum
  • Critical Paths: 95-100% (payment, auth, data mutations)
  • New Features: 100% coverage requirement
  • Business Logic: 90%+ coverage
  • UI Components: 70%+ coverage

Coverage Types:

  • Line Coverage: Percentage of code lines executed
  • Branch Coverage: Percentage of decision branches taken
  • Function Coverage: Percentage of functions called
  • Statement Coverage: Percentage of statements executed

Important: Coverage is a metric, not a goal. 100% coverage ≠ bug-free code.
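
As one way to enforce these targets automatically, recent Vitest versions accept coverage thresholds directly in the config. A minimal sketch (the numbers are illustrative; Jest has the equivalent coverageThreshold option):

// vitest.config.ts (illustrative thresholds; tune per project)
import { defineConfig } from 'vitest/config'

export default defineConfig({
  test: {
    coverage: {
      provider: 'v8',
      thresholds: {
        lines: 80,
        branches: 80,
        functions: 80,
        statements: 80,
      },
    },
  },
})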

2. Test Classification

Static Analysis

Purpose: Catch errors before runtime
Tools: ESLint, Prettier, TypeScript, Pylint, mypy, Ruff
When to run: Pre-commit hooks, CI pipeline

Unit Tests

Purpose: Test isolated business logic
Tools: Jest, Vitest, pytest, JUnit
Characteristics:

  • Fast execution (< 100ms per test)
  • No external dependencies (database, API, filesystem)
  • Deterministic (same input = same output)
  • Test single responsibility

Coverage Target: 90%+ for business logic

See references/code-examples.md for detailed unit test examples.
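
For quick orientation, a minimal Vitest sketch (priceWithTax is a hypothetical helper, defined inline so the example stands alone):

// priceWithTax.test.ts
import { describe, expect, test } from 'vitest'

// Example business logic: add tax to a net price, rounded to cents
function priceWithTax(net: number, rate: number): number {
  return Math.round(net * (1 + rate) * 100) / 100
}

describe('priceWithTax', () => {
  test('applies the tax rate to the net price', () => {
    expect(priceWithTax(100, 0.2)).toBe(120)
  })

  test('rounds to two decimal places', () => {
    expect(priceWithTax(19.99, 0.075)).toBe(21.49)
  })
})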

Integration Tests

Purpose: Test component interactions
Tools: Testing Library, Supertest, pytest with fixtures
Characteristics:

  • Test multiple units working together
  • May use test databases or mocked external services
  • Moderate execution time (< 1s per test)
  • Focus on interfaces and contracts

Coverage Target: 70%+ for API endpoints and component interactions

See references/code-examples.md for API integration test examples.
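
A brief sketch of an HTTP-level integration test with Supertest and Vitest (the /health route is defined inline for illustration; in a real suite you would import your Express app instance):

// health.integration.test.ts
import { expect, test } from 'vitest'
import request from 'supertest'
import express from 'express'

// Stand-in app; in practice, import your application instead
const app = express()
app.get('/health', (_req, res) => {
  res.json({ status: 'ok' })
})

test('GET /health returns ok', async () => {
  const response = await request(app).get('/health')

  expect(response.status).toBe(200)
  expect(response.body).toEqual({ status: 'ok' })
})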

End-to-End (E2E) Tests

Purpose: Validate critical user journeys
Tools: Playwright, Cypress, Selenium
Characteristics:

  • Test entire application flow (frontend + backend + database)
  • Slow execution (5-30s per test)
  • Run against production-like environment
  • Focus on business-critical paths

Coverage Target: 5-10 critical user journeys

See references/code-examples.md for complete E2E test examples.
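
A minimal Playwright sketch for one critical journey (the URL, labels, and routes below are placeholders, not a real application):

// e2e/login.spec.ts
import { test, expect } from '@playwright/test'

test('user can log in and reach the dashboard', async ({ page }) => {
  await page.goto('https://staging.example.com/login')

  await page.getByLabel('Email').fill('user@example.com')
  await page.getByLabel('Password').fill('a-strong-password')
  await page.getByRole('button', { name: /log in/i }).click()

  // The journey ends on the dashboard
  await expect(page).toHaveURL(/\/dashboard/)
  await expect(page.getByRole('heading', { name: /dashboard/i })).toBeVisible()
})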

Performance Tests

Purpose: Validate system performance under load
Tools: k6, Artillery, JMeter, Locust
Types:

  • Load Testing: System behavior under expected load
  • Stress Testing: Breaking point identification
  • Spike Testing: Sudden traffic surge handling
  • Soak Testing: Sustained load over time (memory leaks)

Coverage Target: Test all performance-critical endpoints

See references/code-examples.md for k6 load test examples.
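
A small k6 sketch (the endpoint and thresholds are illustrative):

// load-test.js (run with: k6 run load-test.js)
import http from 'k6/http'
import { check, sleep } from 'k6'

export const options = {
  vus: 20,          // 20 virtual users
  duration: '1m',   // sustained for one minute
  thresholds: {
    http_req_duration: ['p(95)<500'], // 95% of requests under 500ms
    http_req_failed: ['rate<0.01'],   // fewer than 1% errors
  },
}

export default function () {
  const res = http.get('https://staging.example.com/api/v1/health')
  check(res, { 'status is 200': (r) => r.status === 200 })
  sleep(1)
}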


Test Planning

1. Risk-Based Testing

Prioritize testing based on risk assessment:

High Risk (100% coverage required):

  • Payment processing
  • Authentication and authorization
  • Data mutations (create, update, delete)
  • Security-critical operations
  • Compliance-related features

Medium Risk (80% coverage):

  • Business logic
  • Data transformations
  • API integrations
  • Email/notification systems

Low Risk (50% coverage):

  • UI styling
  • Static content
  • Read-only operations
  • Non-critical features

2. Test Case Design

Given-When-Then Pattern:

Given [initial context]
When [action occurs]
Then [expected outcome]

This pattern keeps tests clear and focused. See references/code-examples.md for implementation examples.
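
A sketch of the pattern in a Vitest test, with the cart logic inlined as a hypothetical example:

import { expect, test } from 'vitest'

// Hypothetical domain logic, inlined to keep the sketch self-contained
type Cart = { items: { sku: string; price: number }[] }
const emptyCart = (): Cart => ({ items: [] })
const addItem = (cart: Cart, item: { sku: string; price: number }): Cart => ({
  items: [...cart.items, item],
})
const total = (cart: Cart): number => cart.items.reduce((sum, i) => sum + i.price, 0)

test('given an empty cart, when an item is added, then the total equals its price', () => {
  // Given: an empty cart
  const cart = emptyCart()

  // When: an item is added
  const updated = addItem(cart, { sku: 'A1', price: 25 })

  // Then: the total reflects the item price
  expect(total(updated)).toBe(25)
})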

3. Test Data Management

Strategies:

  • Fixtures: Pre-defined test data in JSON/YAML files
  • Factories: Generate test data programmatically
  • Seeders: Populate test database with known data
  • Faker Libraries: Generate realistic random data

See references/code-examples.md for test factory and fixture examples.
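
A small factory sketch assuming the @faker-js/faker package (the User shape is illustrative):

// tests/factories/user.ts
import { faker } from '@faker-js/faker'

interface User {
  id: string
  email: string
  name: string
  createdAt: Date
}

// Factory: sensible random defaults, overridable per test
export function buildUser(overrides: Partial<User> = {}): User {
  return {
    id: faker.string.uuid(),
    email: faker.internet.email(),
    name: faker.person.fullName(),
    createdAt: faker.date.past(),
    ...overrides,
  }
}

// Usage in a test:
// const admin = buildUser({ email: 'admin@example.com' })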


Testing Patterns and Best Practices

1. AAA Pattern (Arrange-Act-Assert)

Structure tests in three clear phases:

  • Arrange: Set up test data and context
  • Act: Perform the action being tested
  • Assert: Verify expected outcomes

See references/code-examples.md for detailed AAA pattern examples.
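
The three phases made explicit in a short Vitest test (the Counter class is a hypothetical unit under test):

import { expect, test } from 'vitest'

// Hypothetical unit under test
class Counter {
  private value = 0
  increment(by = 1) { this.value += by }
  current() { return this.value }
}

test('increment adds the given amount', () => {
  // Arrange: set up the object under test
  const counter = new Counter()

  // Act: perform the behavior being tested
  counter.increment(3)

  // Assert: verify the observable outcome
  expect(counter.current()).toBe(3)
})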

2. Test Isolation

Each test should be independent:

  • Use fresh test database for each test
  • Clean up resources after each test
  • Tests don't depend on execution order

See references/code-examples.md for test isolation patterns.
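
One common setup, sketched with Vitest hooks and an in-memory stand-in for the test database (substitute your real per-test database helper):

import { beforeEach, afterEach, expect, test } from 'vitest'

// Minimal in-memory stand-in for a test database; in practice this would
// create and drop a real schema (e.g. one Postgres schema per test)
function createTestDb() {
  const users: { email: string }[] = []
  return {
    users: {
      insert: async (u: { email: string }) => { users.push(u) },
      count: async () => users.length,
    },
    destroy: async () => { users.length = 0 },
  }
}

let db: ReturnType<typeof createTestDb>

beforeEach(() => {
  // Fresh, empty state per test so nothing leaks between tests
  db = createTestDb()
})

afterEach(async () => {
  // Tear down so the next test starts from zero, regardless of order
  await db.destroy()
})

test('creating a user persists it', async () => {
  await db.users.insert({ email: 'test@example.com' })
  expect(await db.users.count()).toBe(1)
})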

3. Mocking vs Real Dependencies

When to Mock:

  • External APIs (payment gateways, third-party services)
  • Slow operations (file I/O, network calls)
  • Non-deterministic behavior (current time, random values)
  • Hard-to-test scenarios (error conditions, edge cases)

When to Use Real Dependencies:

  • Fast, deterministic operations
  • Critical business logic
  • Database operations (use test database)
  • Internal service interactions

See references/code-examples.md for mocking examples.

4. MSW (Mock Service Worker) - 2025 Standard

MSW is the industry-standard approach for API mocking in frontend tests (Dec 2025).

MSW intercepts requests at the network level, not by mocking implementation details. This provides several advantages:

  • Tests use the real fetch/axios code - no implementation mocking
  • Handlers work across all test types (unit, integration, E2E)
  • Easy to simulate error states, delays, and edge cases
  • Same handlers work in browser and Node.js environments

Basic MSW Setup (Vitest/Jest)

// src/mocks/handlers.ts
import { http, HttpResponse, delay } from 'msw'

export const handlers = [
  // Success response
  http.get('/api/v1/analyze/:id', ({ params }) => {
    return HttpResponse.json({
      id: params.id,
      status: 'completed',
      createdAt: '2025-12-25T00:00:00Z',
    })
  }),

  // Error response
  http.get('/api/v1/analyze/error', () => {
    return HttpResponse.json(
      { error: 'Analysis not found' },
      { status: 404 }
    )
  }),

  // Delayed response (simulates slow network)
  http.get('/api/v1/slow', async () => {
    await delay(2000) // 2 second delay
    return HttpResponse.json({ data: 'slow response' })
  }),
]
// src/mocks/server.ts (for Vitest/Jest - Node.js)
import { setupServer } from 'msw/node'
import { handlers } from './handlers'

export const server = setupServer(...handlers)
// src/mocks/browser.ts (for Storybook/browser tests)
import { setupWorker } from 'msw/browser'
import { handlers } from './handlers'

export const worker = setupWorker(...handlers)

Test Setup with MSW

// vitest.setup.ts
import { beforeAll, afterEach, afterAll } from 'vitest'
import { server } from './src/mocks/server'

// Start server before all tests
beforeAll(() => server.listen({ onUnhandledRequest: 'error' }))

// Reset handlers after each test (removes runtime overrides)
afterEach(() => server.resetHandlers())

// Close server after all tests
afterAll(() => server.close())

Runtime Handler Overrides

import { render, screen } from '@testing-library/react'
import { http, HttpResponse, delay } from 'msw'
import { server } from '../mocks/server'

test('shows error message when API fails', async () => {
  // Override for this specific test
  server.use(
    http.get('/api/v1/analyze/:id', () => {
      return HttpResponse.json(
        { error: 'Server error' },
        { status: 500 }
      )
    })
  )

  render(<AnalysisView id="123" />)

  expect(await screen.findByText('Server error')).toBeInTheDocument()
})

test('shows loading state while fetching', async () => {
  // Delay response to test loading state
  server.use(
    http.get('/api/v1/analyze/:id', async () => {
      await delay(100)
      return HttpResponse.json({ id: '123', status: 'pending' })
    })
  )

  render(<AnalysisView id="123" />)

  // Loading skeleton should be visible
  expect(screen.getByTestId('skeleton')).toBeInTheDocument()

  // Then data appears
  expect(await screen.findByText('pending')).toBeInTheDocument()
})

MSW with Zod Validation Testing

import { z } from 'zod'
import { render, screen } from '@testing-library/react'
import { http, HttpResponse } from 'msw'
import { server } from '../mocks/server'

const AnalysisSchema = z.object({
  id: z.string().uuid(),
  status: z.enum(['pending', 'running', 'completed', 'failed']),
})

test('handles invalid API response gracefully', async () => {
  // Return malformed data
  server.use(
    http.get('/api/v1/analyze/:id', () => {
      return HttpResponse.json({
        id: 'not-a-uuid',  // Invalid!
        status: 'unknown', // Invalid enum!
      })
    })
  )

  render(<AnalysisView id="123" />)

  // Should show validation error, not crash
  expect(await screen.findByText(/validation error/i)).toBeInTheDocument()
})

MSW Anti-Patterns

// ❌ NEVER mock fetch/axios directly
jest.spyOn(global, 'fetch').mockResolvedValue(...)  // BAD!
jest.mock('axios')  // BAD!

// ❌ NEVER mock your API service module
jest.mock('../services/api')  // BAD!

// ❌ NEVER test implementation details
expect(fetch).toHaveBeenCalledWith('/api/...')  // BAD!

// ✅ ALWAYS use MSW handlers
import { http, HttpResponse } from 'msw'
server.use(http.get('/api/...', () => HttpResponse.json({...})))

// ✅ ALWAYS test user-visible behavior
expect(await screen.findByText('Success')).toBeInTheDocument()

MSW Integration Testing Example

import { render, screen } from '@testing-library/react'
import userEvent from '@testing-library/user-event'
import { http, HttpResponse } from 'msw'
import { server } from '../mocks/server'
import { QueryClient, QueryClientProvider } from '@tanstack/react-query'
import type { ReactNode } from 'react'

function renderWithProviders(component: ReactNode) {
  const queryClient = new QueryClient({
    defaultOptions: {
      queries: { retry: false },
    },
  })
  return render(
    <QueryClientProvider client={queryClient}>
      {component}
    </QueryClientProvider>
  )
}

describe('AnalysisForm', () => {
  test('submits analysis and shows result', async () => {
    const user = userEvent.setup()

    // Mock the POST endpoint
    server.use(
      http.post('/api/v1/analyze', async ({ request }) => {
        const body = await request.json()
        return HttpResponse.json({
          analysis_id: 'new-123',
          url: body.url,
          status: 'pending',
        })
      })
    )

    renderWithProviders(<AnalysisForm />)

    // Fill form
    await user.type(screen.getByLabelText('URL'), 'https://example.com')
    await user.click(screen.getByRole('button', { name: /analyze/i }))

    // Verify result
    expect(await screen.findByText(/analysis started/i)).toBeInTheDocument()
    expect(screen.getByText('new-123')).toBeInTheDocument()
  })

  test('shows validation error on invalid URL', async () => {
    const user = userEvent.setup()

    server.use(
      http.post('/api/v1/analyze', () => {
        return HttpResponse.json(
          { detail: 'Invalid URL format' },
          { status: 422 }
        )
      })
    )

    renderWithProviders(<AnalysisForm />)

    await user.type(screen.getByLabelText('URL'), 'not-a-url')
    await user.click(screen.getByRole('button', { name: /analyze/i }))

    expect(await screen.findByText(/invalid url/i)).toBeInTheDocument()
  })
})

5. VCR.py - Python HTTP Recording (2025 Standard)

VCR.py is the gold standard for testing Python code that makes HTTP requests. It records real HTTP interactions once, then replays them for deterministic tests.

Why VCR.py?

| Approach | Problem |
|----------|---------|
| Mocking `requests` | Couples tests to implementation details |
| Live HTTP calls | Slow, flaky, rate-limited, non-deterministic |
| Manual fixtures | Tedious to maintain, drift from reality |
| VCR.py | ✅ Records real responses, replays deterministically |

Basic Setup

# conftest.py
import pytest
import vcr

# Configure VCR globally
@pytest.fixture(scope="module")
def vcr_config():
    return {
        "cassette_library_dir": "tests/cassettes",
        "record_mode": "once",  # Record once, then replay
        "match_on": ["uri", "method"],
        "filter_headers": ["authorization", "x-api-key"],  # Security!
        "filter_query_parameters": ["api_key", "token"],
    }

# Optional: override pytest-vcr's cassette directory per test module
@pytest.fixture
def vcr_cassette_dir(request):
    return f"tests/cassettes/{request.module.__name__}"

Basic Usage

import pytest
import requests
import vcr

# Method 1: Context manager
def test_fetch_user_data():
    with vcr.use_cassette("tests/cassettes/user_data.yaml"):
        response = requests.get("https://api.example.com/users/1")
        assert response.status_code == 200
        assert response.json()["name"] == "John Doe"

# Method 2: pytest-vcr decorator (recommended)
@pytest.mark.vcr()
def test_fetch_user_data_decorator():
    response = requests.get("https://api.example.com/users/1")
    assert response.status_code == 200
    assert response.json()["name"] == "John Doe"

# Method 3: Custom cassette name
@pytest.mark.vcr("custom_cassette_name.yaml")
def test_with_custom_cassette():
    response = requests.get("https://api.example.com/users/1")
    assert response.status_code == 200

Async Support (httpx, aiohttp)

import pytest
import vcr
from httpx import AsyncClient

# VCR.py works with async HTTP clients
@pytest.mark.asyncio
@pytest.mark.vcr()
async def test_async_api_call():
    async with AsyncClient() as client:
        response = await client.get("https://api.example.com/data")
        assert response.status_code == 200
        assert "items" in response.json()

Recording Modes

# conftest.py - configure per environment
@pytest.fixture(scope="module")
def vcr_config():
    import os

    # CI: never record, only replay
    if os.environ.get("CI"):
        record_mode = "none"
    # Dev: record new, keep existing
    else:
        record_mode = "new_episodes"

    return {
        "record_mode": record_mode,
        "cassette_library_dir": "tests/cassettes",
    }

| Mode | Behavior | Use Case |
|------|----------|----------|
| once | Record if cassette missing, then replay | Default for most tests |
| new_episodes | Record new requests, replay existing | Adding to existing tests |
| none | Never record, fail on new requests | CI environments |
| all | Always record (overwrites) | Refreshing stale cassettes |

Filtering Sensitive Data

# conftest.py
@pytest.fixture(scope="module")
def vcr_config():
    return {
        # Remove headers before recording
        "filter_headers": [
            "authorization",
            "x-api-key",
            "cookie",
            "set-cookie",
        ],
        # Remove query parameters
        "filter_query_parameters": [
            "api_key",
            "access_token",
            "client_secret",
        ],
        # Custom body filter
        "before_record_request": filter_request_body,
        "before_record_response": filter_response_body,
    }

def filter_request_body(request):
    """Redact sensitive data from request body."""
    if request.body:
        import json
        try:
            body = json.loads(request.body)
            if "password" in body:
                body["password"] = "REDACTED"
            if "api_key" in body:
                body["api_key"] = "REDACTED"
            request.body = json.dumps(body)
        except json.JSONDecodeError:
            pass
    return request

def filter_response_body(response):
    """Redact sensitive data from response body."""
    # Similar filtering logic
    return response

Real-World Example: External API Service

# tests/services/test_tavily_service.py
import pytest
from app.services.external.tavily_service import TavilySearchService

@pytest.fixture
def tavily_service():
    return TavilySearchService(api_key="test-key")

@pytest.mark.vcr()
async def test_tavily_search_returns_results(tavily_service):
    """Test Tavily search with recorded HTTP response."""
    results = await tavily_service.search("Python async patterns")

    assert len(results) > 0
    assert all("url" in r for r in results)
    assert all("content" in r for r in results)

@pytest.mark.vcr()
async def test_tavily_search_handles_empty_query(tavily_service):
    """Test graceful handling of empty search."""
    results = await tavily_service.search("")

    assert results == []

@pytest.mark.vcr()
async def test_tavily_rate_limit_error(tavily_service):
    """Test handling of rate limit response (cassette has 429)."""
    with pytest.raises(RateLimitError):
        await tavily_service.search("query that triggers rate limit")

Cassette File Example

# tests/cassettes/test_tavily_search_returns_results.yaml
interactions:
- request:
    body: '{"query": "Python async patterns", "max_results": 10}'
    headers:
      Content-Type: application/json
      # Note: authorization header filtered out
    method: POST
    uri: https://api.tavily.com/search
  response:
    body:
      string: '{"results": [{"url": "https://...", "content": "..."}]}'
    headers:
      Content-Type: application/json
    status:
      code: 200
      message: OK
version: 1

VCR.py + LLM API Testing

# tests/services/test_llm_service.py
import pytest
import vcr

# Custom matcher for LLM requests (ignore timestamp, request_id)
def llm_request_matcher(r1, r2):
    """Match LLM requests ignoring dynamic fields."""
    import json

    if r1.uri != r2.uri or r1.method != r2.method:
        return False

    body1 = json.loads(r1.body)
    body2 = json.loads(r2.body)

    # Ignore fields that change between runs
    for field in ["request_id", "timestamp", "stream_id"]:
        body1.pop(field, None)
        body2.pop(field, None)

    return body1 == body2

@pytest.fixture(scope="module")
def vcr_config():
    return {
        "cassette_library_dir": "tests/cassettes/llm",
        "match_on": ["method", "uri"],
        "custom_matchers": [llm_request_matcher],
        "filter_headers": ["authorization", "x-api-key"],
    }

@pytest.mark.vcr()
async def test_llm_completion():
    """Test LLM completion with recorded response."""
    response = await llm_client.complete(
        model="claude-3-5-sonnet",
        messages=[{"role": "user", "content": "Say hello"}]
    )

    assert response.content is not None
    assert "hello" in response.content.lower()

VCR.py Anti-Patterns

# ❌ NEVER: Commit cassettes with real API keys
# Bad cassette file:
# headers:
#   authorization: Bearer sk-real-api-key-12345

# ❌ NEVER: Use "all" mode in CI
# record_mode: "all"  # Will try to make real HTTP calls!

# ❌ NEVER: Skip VCR for "simple" HTTP tests
def test_api_call():
    # This will make REAL HTTP calls in tests!
    response = requests.get("https://api.example.com/data")

# ✅ ALWAYS: Filter sensitive data
# ✅ ALWAYS: Use "none" mode in CI
# ✅ ALWAYS: Wrap all HTTP tests with VCR

Refreshing Stale Cassettes

# Delete old cassette to re-record
rm tests/cassettes/test_tavily_search_returns_results.yaml

# Run test to record fresh response
pytest tests/services/test_tavily_service.py::test_tavily_search_returns_results -v

# Or use environment variable to force re-record
VCR_RECORD_MODE=all pytest tests/services/ -v

6. Snapshot Testing

Use for: UI components, API responses, generated code

Warning: Snapshots can become brittle. Use for stable components, not rapidly changing UI.
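
A brief Vitest sketch (the error serializer is hypothetical; the first run writes the snapshot file, later runs fail if the output drifts):

import { expect, test } from 'vitest'

// Hypothetical serializer whose output we want to pin down
function toApiError(code: string, message: string) {
  return { error: { code, message }, retryable: false }
}

test('API error shape stays stable', () => {
  // First run writes __snapshots__/...; later runs fail if the shape changes
  expect(toApiError('NOT_FOUND', 'Analysis not found')).toMatchSnapshot()
})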

7. Parameterized Tests

Test multiple scenarios with same logic using data tables.

See references/code-examples.md for parameterized test patterns.
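
A short sketch with Vitest's test.each (Jest has the same API); the username rule is hypothetical:

import { describe, expect, test } from 'vitest'

// Hypothetical validation rule under test
const isValidUsername = (name: string) => /^[a-z0-9_]{3,20}$/.test(name)

describe('isValidUsername', () => {
  test.each([
    { input: 'ada_lovelace', expected: true },
    { input: 'ab', expected: false },            // too short
    { input: 'UPPERCASE', expected: false },     // uppercase not allowed
    { input: 'name-with-dash', expected: false },
    { input: 'valid_user_42', expected: true },
  ])('validates "$input" as $expected', ({ input, expected }) => {
    expect(isValidUsername(input)).toBe(expected)
  })
})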


Continuous Testing

1. CI/CD Integration

Pipeline Stages:

# Example: GitHub Actions
name: Test Pipeline

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Install dependencies
        run: npm ci
      - name: Lint
        run: npm run lint
      - name: Type check
        run: npm run typecheck
      - name: Unit & Integration Tests
        run: npm test -- --coverage
      - name: Upload coverage
        uses: codecov/codecov-action@v3
      - name: E2E Tests
        run: npm run test:e2e
      - name: Performance Tests (on main branch)
        if: github.ref == 'refs/heads/main'
        run: npm run test:performance

2. Quality Gates

Block merges/deployments if:

  • Code coverage drops below threshold (e.g., 80%)
  • Any tests fail
  • Linting errors exist
  • Performance regression detected (> 10% slower)
  • Security vulnerabilities found

3. Test Execution Strategy

On Every Commit:

  • Static analysis (lint, type check)
  • Unit tests
  • Fast integration tests (< 5 min total)

On Pull Request:

  • All tests (unit + integration + E2E)
  • Coverage report
  • Performance benchmarks

On Deploy to Staging:

  • Full E2E suite
  • Load testing
  • Security scans

On Deploy to Production:

  • Smoke tests (critical paths only)
  • Health checks
  • Canary deployments with monitoring

Testing Tools Recommendations

JavaScript/TypeScript

| Category | Tool | Use Case |
|----------|------|----------|
| Unit/Integration | Vitest | Fast, Vite-native, modern |
| Unit/Integration | Jest | Mature, extensive ecosystem |
| E2E | Playwright | Cross-browser, reliable, fast |
| E2E | Cypress | Developer-friendly, visual debugging |
| Component Testing | Testing Library | User-centric, framework-agnostic |
| API Testing | Supertest | HTTP assertions, Express integration |
| Performance | k6 | Load testing, scriptable |

Python

| Category | Tool | Use Case |
|----------|------|----------|
| Unit/Integration | pytest | Powerful, extensible, fixtures |
| API Testing | httpx + pytest | Async support, modern |
| E2E | Playwright (Python) | Browser automation |
| Performance | Locust | Load testing, Python-based |
| Mocking | unittest.mock | Standard library, reliable |

Common Testing Anti-Patterns

Testing Implementation Details

// Bad: Testing internal state
expect(component.state.isLoading).toBe(false);

// Good: Testing user-visible behavior
expect(screen.queryByText('Loading...')).not.toBeInTheDocument();

Tests Too Coupled to Code

// Bad: Test breaks when implementation changes
expect(userService.save).toHaveBeenCalledTimes(1);

// Good: Test behavior, not implementation
const user = await db.users.findOne({ email: 'test@example.com' });
expect(user).toBeTruthy();

Direct Fetch Mocking (2025 Anti-Pattern)

// Bad: Mocks implementation, not network behavior
jest.spyOn(global, 'fetch').mockResolvedValue({
  json: () => Promise.resolve({ data: 'mocked' })
});

jest.mock('axios');
jest.mock('../services/api');

// Good: Use MSW for network-level mocking
import { http, HttpResponse } from 'msw';
server.use(
  http.get('/api/data', () => HttpResponse.json({ data: 'mocked' }))
);

Flaky Tests

// Bad: Non-deterministic timeout
await waitFor(() => {
  expect(screen.getByText('Success')).toBeInTheDocument();
}, { timeout: 1000 }); // Might fail on slow CI

// Good: Use explicit waits with longer timeout
await screen.findByText('Success', {}, { timeout: 5000 });

Giant Test Cases

// Bad: One test does too much
test('user workflow', async () => {
  // 100 lines testing signup, login, profile update, logout...
});

// Good: Focused tests
test('user can sign up', async () => { /* ... */ });
test('user can login', async () => { /* ... */ });
test('user can update profile', async () => { /* ... */ });

Integration with Agents

Code Quality Reviewer

  • Reviews test coverage reports
  • Suggests missing test cases
  • Validates test quality and structure
  • Ensures tests follow patterns from this skill

Backend System Architect

  • Uses test strategy templates when designing services
  • Ensures APIs are testable (dependency injection, clear interfaces)
  • Plans integration test architecture

Frontend UI Developer

  • Applies component testing patterns
  • Uses Testing Library best practices
  • Implements E2E tests for user flows

AI/ML Engineer

  • Adapts testing patterns for ML models (data validation, model performance tests)
  • Uses performance testing for inference endpoints

Quick Start Checklist

When starting a new project or feature:

  • Define coverage targets (overall, critical paths, new code)
  • Choose testing framework (Jest/Vitest, Playwright, etc.)
  • Set up test infrastructure (test database, fixtures, factories)
  • Create test plan (see templates/test-plan-template.md)
  • Implement static analysis (ESLint, TypeScript)
  • Write unit tests for business logic (80%+ coverage)
  • Write integration tests for API endpoints (70%+ coverage)
  • Write E2E tests for critical user journeys (5-10 flows)
  • Configure CI/CD pipeline with quality gates
  • Set up coverage reporting (Codecov, Coveralls)
  • Document testing conventions in project README

For detailed code examples: See references/code-examples.md


AI/LLM Testing Patterns (v1.1.0)

Testing AI applications requires specialized approaches due to their probabilistic nature.

Async Timeout Testing

import pytest
import asyncio

@pytest.mark.asyncio
async def test_operation_respects_timeout():
    """Test that async operations honor timeout limits."""
    async def slow_operation():
        await asyncio.sleep(10)  # Simulates slow LLM call
        return "result"

    with pytest.raises(asyncio.TimeoutError):
        async with asyncio.timeout(0.1):
            await slow_operation()

@pytest.mark.asyncio
async def test_graceful_degradation_on_timeout():
    """Test fail-open behavior when operation times out."""
    result = await safe_operation_with_fallback(timeout=0.1)
    assert result["status"] == "fallback"
    assert result["error"] == "Operation timed out"

LLM Mock Patterns

from unittest.mock import AsyncMock, patch

@pytest.fixture
def mock_llm_response():
    """Mock LLM to return predictable structured output."""
    mock = AsyncMock()
    mock.return_value = {
        "content": "Mocked response",
        "confidence": 0.85,
        "tokens_used": 150
    }
    return mock

@pytest.mark.asyncio
async def test_synthesis_with_mocked_llm(mock_llm_response):
    """Test synthesis logic without actual LLM calls."""
    with patch("app.core.model_factory.get_model", return_value=mock_llm_response):
        result = await synthesize_findings(sample_findings)

    assert result["executive_summary"] is not None
    assert mock_llm_response.call_count == 1

Pydantic v2 Model Testing

import pytest
from pydantic import ValidationError

def test_quiz_question_validates_correct_answer():
    """Test that correct_answer must be in options."""
    with pytest.raises(ValidationError) as exc_info:
        QuizQuestion(
            question="What is 2+2?",
            options=["3", "4", "5"],
            correct_answer="6",  # Not in options!
            explanation="Basic arithmetic"
        )

    assert "correct_answer" in str(exc_info.value)
    assert "must be one of" in str(exc_info.value)

def test_quiz_question_accepts_valid_answer():
    """Test that valid answers pass validation."""
    q = QuizQuestion(
        question="What is 2+2?",
        options=["3", "4", "5"],
        correct_answer="4",  # Valid!
        explanation="Basic arithmetic"
    )
    assert q.correct_answer == "4"

Template Rendering Tests

from jinja2 import Environment, FileSystemLoader

@pytest.fixture
def jinja_env():
    return Environment(loader=FileSystemLoader("templates/"))

def test_template_handles_empty_tldr(jinja_env):
    """Template renders without crashing when tldr is empty."""
    template = jinja_env.get_template("artifact.j2")
    result = template.render(aggregated_insights={"tldr": {}})
    assert "TL;DR" not in result  # Section skipped gracefully

def test_template_handles_missing_nested_field(jinja_env):
    """Template handles None in nested objects."""
    template = jinja_env.get_template("artifact.j2")
    result = template.render(aggregated_insights={
        "tldr": {"summary": None, "key_takeaways": []}
    })
    # Should not crash, should handle gracefully
    assert isinstance(result, str)

LLM-as-Judge Evaluator Testing

@pytest.mark.asyncio
async def test_quality_evaluator_returns_normalized_score():
    """Quality scores should be normalized 0.0-1.0."""
    evaluator = create_quality_evaluator("relevance")

    # Mock the LLM to return a score
    with patch_evaluator_llm(return_score=8):  # 8/10
        result = await evaluator.aevaluate_strings(
            input="Test input",
            prediction="Test output"
        )

    assert 0.0 <= result["score"] <= 1.0
    assert result["score"] == 0.8  # 8/10 normalized

@pytest.mark.asyncio
async def test_quality_gate_fails_below_threshold():
    """Quality gate should fail when avg score < threshold."""
    with patch_quality_scores({"relevance": 0.5, "depth": 0.4, "coherence": 0.5}):
        result = await quality_gate_node(sample_state)

    assert result["quality_gate_passed"] is False
    assert result["quality_gate_avg_score"] < 0.7

Edge Case Identification Strategy

When testing LLM integrations, always test these edge cases:

  • Empty inputs: What happens with empty strings or None?
  • Very long inputs: Does truncation work correctly?
  • Timeout scenarios: Does fail-open work?
  • Partial responses: What if LLM returns 90% complete?
  • Invalid structured output: What if schema validation fails?
  • Division by zero: What if averaging over empty list?
  • Nested null access: What if parent object exists but child is None?

Skill Version: 1.3.0
Last Updated: 2025-12-27
Maintained by: AI Agent Hub Team

Changelog

v1.3.0 (2025-12-27)

  • Added VCR.py (pytest-vcr) as 2025 standard for Python HTTP recording/playback
  • Added comprehensive VCR.py patterns section with async support
  • Added VCR.py + LLM API testing patterns
  • Added cassette filtering for sensitive data
  • Added recording modes documentation (once, new_episodes, none, all)
  • Added VCR.py anti-patterns section

v1.2.0 (2025-12-25)

  • Added MSW (Mock Service Worker) as 2025 standard for API mocking
  • Added comprehensive MSW patterns section with Vitest/Jest examples
  • Added MSW + Zod validation testing patterns
  • Added MSW anti-patterns section
  • Flagged direct fetch/axios mocking as anti-pattern
  • Added MSW integration testing example with TanStack Query

v1.1.0 (2025-12-14)

  • Added AI/LLM testing patterns (async timeout, mock LLM, Pydantic v2)