name	comprehensive-testing
description	Complete testing strategy covering TDD workflow, test pyramid, unit/integration/E2E/property testing, framework best practices (Jest, Vitest, pytest), mock strategies, and CI integration. Use when writing tests, reviewing test quality, or establishing testing standards.
allowed-tools	Read, Grep, Glob, Bash, Edit, Write

Comprehensive Testing

Based on Anthropic's Claude Code Best Practices and community patterns

Core Philosophy

"Claude performs best when it has a clear target to iterate against—a test case provides concrete success criteria."

Testing is not about proving code works; it's about designing code that is testable and documenting expected behavior.

Test Pyramid

        /\
       /  \        E2E Tests (10%)
      /----\       - Full user flows
     /      \      - Slowest, most brittle
    /--------\
   /          \    Integration Tests (20%)
  /------------\   - Component interaction
 /              \  - Real dependencies
/----------------\
       Unit Tests (70%)
       - Single function/method
       - Fast, isolated, many

Level	Speed	Scope	When to Use
Unit	<10ms	Single function	All logic
Integration	<1s	Multiple components	APIs, DB
E2E	<30s	Full flow	Critical paths

TDD Workflow (Anthropic Recommended)

The 6-Step Process

1. WRITE TESTS FIRST
   ↓
2. VERIFY TESTS FAIL
   ↓
3. COMMIT TEST SUITE
   ↓
4. IMPLEMENT CODE
   ↓
5. VERIFY WITH SUBAGENT
   ↓
6. COMMIT IMPLEMENTATION

Step 1: Write Tests First

Be explicit that you are doing TDD, so Claude does not write mock implementations for code that does not exist yet:

"I want to implement [feature] using TDD.
First, write tests for [expected behavior] with these input/output pairs:
- Input: X → Expected: Y
- Input: A → Expected: B
Do NOT create any implementation yet."

Step 2: Verify Tests Fail

# Run tests and confirm they fail for the RIGHT reason
npm test  # or pytest, go test, etc.

# Expected: "function not found" or "undefined"
# NOT: syntax error, wrong import

Step 3: Commit Test Suite

git add tests/
git commit -m "test: Add tests for [feature] (RED phase)"

Step 4: Implement Incrementally

"Now implement the code to make these tests pass.
Do NOT modify the tests.
Run tests after each change until all pass."

Claude will enter an autonomous loop:

Write code → Run tests → Analyze failures → Adjust → Repeat

Step 5: Verify with Subagent

"Use a subagent to independently verify the implementation:
- Is it overfitting to tests?
- Are edge cases handled?
- Is the code maintainable?"

Step 6: Commit Implementation

git add src/
git commit -m "feat: Implement [feature] (GREEN phase)"
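
As a rough illustration of the RED and GREEN phases, the Python sketch below writes the input/output pairs down first as a table of cases, then adds just enough implementation to satisfy them. The `slugify` function and its cases are hypothetical, chosen only for this example.

```python
import re

# Step 1 (RED): expected behavior, written before any implementation exists.
# `slugify` and these input/output pairs are hypothetical.
CASES = [
    ("Hello World", "hello-world"),
    ("  Trim me  ", "trim-me"),
    ("Already-slugged", "already-slugged"),
    ("Symbols & stuff!", "symbols-stuff"),
]


# Step 4 (GREEN): the smallest implementation that satisfies the cases.
def slugify(text: str) -> str:
    """Lowercase the text and join its alphanumeric runs with hyphens."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words)


def test_slugify_cases() -> None:
    for raw, expected in CASES:
        assert slugify(raw) == expected, f"slugify({raw!r}) != {expected!r}"


test_slugify_cases()
```

Before `slugify` is defined, the test fails with a `NameError`, which is the "right reason" from Step 2.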

Test Structure Patterns

AAA Pattern (Arrange-Act-Assert)

describe('UserService', () => {
  it('should create user with valid email', async () => {
    // Arrange - Setup test data and dependencies
    const userRepo = new InMemoryUserRepository();
    const service = new UserService(userRepo);
    const input = { email: 'test@example.com', name: 'Test' };

    // Act - Execute the code under test
    const user = await service.create(input);

    // Assert - Verify the results
    expect(user.email).toBe('test@example.com');
    expect(user.id).toBeDefined();
    expect(await userRepo.findById(user.id)).toEqual(user);
  });
});

Given-When-Then Pattern

def test_order_total_with_discount():
    """
    Given an order with items totaling $100
    When a 20% discount is applied
    Then the total should be $80
    """
    # Given
    order = Order()
    order.add_item(Item(price=50))
    order.add_item(Item(price=50))

    # When
    order.apply_discount(Percentage(20))

    # Then
    assert order.total == Money(80)

Framework Best Practices

Jest / Vitest (JavaScript/TypeScript)

// Structure
describe('ModuleName', () => {
  describe('methodName', () => {
    it('should [expected behavior] when [condition]', () => {});
  });
});

// Setup/Teardown
beforeAll(async () => { /* one-time setup */ });
beforeEach(() => { /* per-test setup */ });
afterEach(() => { /* per-test cleanup */ });
afterAll(async () => { /* one-time cleanup */ });

// Async testing
it('handles async operations', async () => {
  const result = await asyncFunction();
  expect(result).toBe(expected);
});

// Error testing
it('throws on invalid input', () => {
  expect(() => validate(null)).toThrow('Input required');
});

// Snapshot testing (use sparingly)
it('renders correctly', () => {
  const tree = renderer.create(<Component />).toJSON();
  expect(tree).toMatchSnapshot();
});

// Table-driven tests
it.each([
  [1, 1, 2],
  [2, 2, 4],
  [0, 0, 0],
])('add(%i, %i) = %i', (a, b, expected) => {
  expect(add(a, b)).toBe(expected);
});

vitest.config.ts:

import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    globals: true,
    environment: 'node',
    coverage: {
      provider: 'v8',
      reporter: ['text', 'json', 'html'],
      thresholds: {
        lines: 80,
        branches: 70,
        functions: 80,
      },
    },
  },
});

pytest (Python)

import pytest
from mymodule import Calculator

# Fixtures for dependency injection
@pytest.fixture
def calculator():
    return Calculator()

@pytest.fixture
def database():
    db = TestDatabase()
    yield db
    db.cleanup()

# Parametrized tests
@pytest.mark.parametrize("a,b,expected", [
    (1, 1, 2),
    (2, 2, 4),
    (0, 0, 0),
    (-1, 1, 0),
])
def test_add(calculator, a, b, expected):
    assert calculator.add(a, b) == expected

# Exception testing
def test_divide_by_zero(calculator):
    with pytest.raises(ZeroDivisionError):
        calculator.divide(1, 0)

# Async testing (requires the pytest-asyncio plugin)
@pytest.mark.asyncio
async def test_async_operation():
    result = await async_function()
    assert result == expected

# Markers for categorization
@pytest.mark.slow
@pytest.mark.integration
def test_database_connection(database):
    assert database.is_connected()

conftest.py:

import pytest

@pytest.fixture(scope="session")
def database_url():
    return "postgresql://test:test@localhost/test"

# Assumes a `database` fixture is visible to every test (e.g. defined in this conftest.py)
@pytest.fixture(autouse=True)
def reset_database(database):
    yield
    database.rollback()

pytest.ini:

[pytest]
testpaths = tests
python_files = test_*.py
python_functions = test_*
addopts = -v --cov=src --cov-report=term-missing
markers =
    slow: marks tests as slow
    integration: marks tests as integration tests

Go Testing

package mypackage

import (
    "testing"
    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
)

func TestAdd(t *testing.T) {
    tests := []struct {
        name     string
        a, b     int
        expected int
    }{
        {"positive numbers", 1, 2, 3},
        {"zero values", 0, 0, 0},
        {"negative numbers", -1, -2, -3},
    }

    for _, tt := range tests {
        t.Run(tt.name, func(t *testing.T) {
            result := Add(tt.a, tt.b)
            assert.Equal(t, tt.expected, result)
        })
    }
}

// Subtests for distinct scenarios
func TestUserService_Create(t *testing.T) {
    t.Run("creates user with valid input", func(t *testing.T) {
        repo := NewInMemoryRepo()
        svc := NewUserService(repo)

        user, err := svc.Create(CreateUserInput{Email: "test@example.com"})

        require.NoError(t, err)
        assert.NotEmpty(t, user.ID)
        assert.Equal(t, "test@example.com", user.Email)
    })

    t.Run("returns error for invalid email", func(t *testing.T) {
        repo := NewInMemoryRepo()
        svc := NewUserService(repo)

        _, err := svc.Create(CreateUserInput{Email: "invalid"})

        require.Error(t, err)
        assert.Contains(t, err.Error(), "invalid email")
    })
}

Mock Strategy

When to Mock

Scenario	Mock?	Reason
External APIs	✅ Yes	Slow, unreliable, costs money
Time/Date	✅ Yes	Non-deterministic
Random	✅ Yes	Non-deterministic
Database (unit)	✅ Yes	Slow, complex setup
Database (integration)	❌ No	Test real behavior
Your own code	⚠️ Rarely	Prefer real implementations
File system	⚠️ Depends	Use temp dirs when possible
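
For the Time/Date and Random rows, a full mocking library is often unnecessary: passing the clock and the random generator in as parameters may be enough. The Python sketch below assumes a hypothetical `make_token` function; the names are illustrative only.

```python
import random
from datetime import datetime, timezone
from typing import Callable


def make_token(now: Callable[[], datetime], rng: random.Random) -> str:
    """Build a token from the current date and a random suffix (hypothetical)."""
    suffix = rng.randrange(1000, 10000)
    return f"{now():%Y%m%d}-{suffix}"


# In tests, pass a fixed clock and a seeded generator so output is repeatable.
def fixed_clock() -> datetime:
    return datetime(2024, 1, 1, tzinfo=timezone.utc)


token_a = make_token(fixed_clock, random.Random(42))
token_b = make_token(fixed_clock, random.Random(42))

assert token_a == token_b                # same seed and clock, same token
assert token_a.startswith("20240101-")   # date comes from the injected clock
```

Production code would pass `lambda: datetime.now(timezone.utc)` and an unseeded `random.Random()` instead.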

How to Mock

// Jest - Mock module
jest.mock('./emailService', () => ({
  sendEmail: jest.fn().mockResolvedValue({ success: true }),
}));

// Jest - Mock function
const mockCallback = jest.fn();
mockCallback.mockReturnValue(42);

// Vitest - Spy
import { vi } from 'vitest';
const spy = vi.spyOn(console, 'log');

// Time mocking
beforeEach(() => {
  vi.useFakeTimers();
  vi.setSystemTime(new Date('2024-01-01'));
});

afterEach(() => {
  vi.useRealTimers();
});

# pytest - Mock with unittest.mock
import pytest
from unittest.mock import Mock, patch

@patch('mymodule.external_api.fetch')
def test_with_mocked_api(mock_fetch):
    mock_fetch.return_value = {'data': 'mocked'}
    result = my_function()
    assert result == expected
    mock_fetch.assert_called_once_with('expected_arg')

# Fixture-based mock
@pytest.fixture
def mock_email_service():
    service = Mock()
    service.send.return_value = True
    return service

Prefer Test Doubles Over Mocks

// ❌ Heavy mocking
const mockRepo = {
  findById: jest.fn().mockResolvedValue({ id: '1', name: 'Test' }),
  save: jest.fn().mockResolvedValue(undefined),
  delete: jest.fn().mockResolvedValue(undefined),
};

// ✅ In-memory implementation (a fake, one kind of test double)
class InMemoryUserRepository implements UserRepository {
  private users: Map<string, User> = new Map();

  async findById(id: string): Promise<User | null> {
    return this.users.get(id) || null;
  }

  async save(user: User): Promise<User> {
    this.users.set(user.id, user);
    return user;
  }

  async delete(id: string): Promise<void> {
    this.users.delete(id);
  }
}
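
The same idea can be sketched in Python. The `User`, `UserRepository`, and `rename_user` names below are assumptions made for the example, not part of any real library.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass(frozen=True)
class User:
    id: str
    name: str


class UserRepository(Protocol):
    """Interface the production repository would also satisfy (hypothetical)."""

    def find_by_id(self, user_id: str) -> Optional[User]: ...
    def save(self, user: User) -> User: ...
    def delete(self, user_id: str) -> None: ...


class InMemoryUserRepository:
    """Dict-backed repository with real behavior, usable in unit tests."""

    def __init__(self) -> None:
        self._users: Dict[str, User] = {}

    def find_by_id(self, user_id: str) -> Optional[User]:
        return self._users.get(user_id)

    def save(self, user: User) -> User:
        self._users[user.id] = user
        return user

    def delete(self, user_id: str) -> None:
        self._users.pop(user_id, None)


def rename_user(repo: UserRepository, user_id: str, new_name: str) -> User:
    """Code under test: load a user, change the name, persist the result."""
    user = repo.find_by_id(user_id)
    if user is None:
        raise KeyError(user_id)
    return repo.save(User(id=user.id, name=new_name))


repo = InMemoryUserRepository()
repo.save(User(id="1", name="Old"))
rename_user(repo, "1", "New")
assert repo.find_by_id("1") == User(id="1", name="New")
```

Because the in-memory repository stores and returns real data, the assertion checks the resulting state rather than which methods were called.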

Boundary Testing

Test These Boundaries

describe('boundary testing', () => {
  // Null/Undefined
  it('handles null input', () => {
    expect(() => process(null)).toThrow('Input required');
  });

  // Empty values
  it('handles empty string', () => {
    expect(validate('')).toBe(false);
  });

  it('handles empty array', () => {
    expect(sum([])).toBe(0);
  });

  // Zero values
  it('handles zero', () => {
    expect(divide(0, 5)).toBe(0);
  });

  it('handles divide by zero', () => {
    expect(() => divide(5, 0)).toThrow('Division by zero');
  });

  // Boundary values
  it('handles minimum value', () => {
    expect(clamp(Number.MIN_SAFE_INTEGER, 0, 100)).toBe(0);
  });

  it('handles maximum value', () => {
    expect(clamp(Number.MAX_SAFE_INTEGER, 0, 100)).toBe(100);
  });

  // Off-by-one
  it('includes boundary', () => {
    expect(isInRange(10, 0, 10)).toBe(true);  // inclusive
  });

  it('excludes just outside boundary', () => {
    expect(isInRange(11, 0, 10)).toBe(false);
  });

  // Type coercion
  it('handles string numbers', () => {
    expect(parsePositiveInt('42')).toBe(42);
    expect(() => parsePositiveInt('abc')).toThrow();
  });

  // Unicode/Special chars
  it('handles unicode', () => {
    expect(normalize('héllo')).toBe('hello');
  });

  it('handles emoji', () => {
    expect(charCount('👨‍👩‍👧‍👦')).toBe(1); // grapheme cluster
  });
});

Error Scenario Checklist

- [ ] Network failure
- [ ] Timeout
- [ ] Invalid input format
- [ ] Missing required fields
- [ ] Unauthorized access
- [ ] Resource not found
- [ ] Concurrent modification
- [ ] Disk full
- [ ] Out of memory
- [ ] Rate limit exceeded
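
Several items on this checklist (network failure, timeout) can be exercised without a real network by passing in a function that raises. A minimal Python sketch, assuming a hypothetical `load_profile` function and `ProfileUnavailable` error:

```python
from typing import Callable, Dict

Fetch = Callable[[str], Dict[str, str]]


class ProfileUnavailable(Exception):
    """Domain error raised when a profile cannot be loaded (hypothetical)."""


def load_profile(fetch: Fetch, user_id: str) -> Dict[str, str]:
    """Translate low-level I/O errors into a single domain error."""
    try:
        return fetch(user_id)
    except TimeoutError as exc:
        raise ProfileUnavailable("timed out") from exc
    except ConnectionError as exc:
        raise ProfileUnavailable("network failure") from exc


def failing_with(error: Exception) -> Fetch:
    """Build a stub fetch function that always raises the given error."""
    def fetch(user_id: str) -> Dict[str, str]:
        raise error
    return fetch


def error_message(fetch: Fetch) -> str:
    """Return the domain error message, or 'no error' if the call succeeds."""
    try:
        load_profile(fetch, "user-1")
    except ProfileUnavailable as exc:
        return str(exc)
    return "no error"


assert error_message(failing_with(TimeoutError())) == "timed out"
assert error_message(failing_with(ConnectionError())) == "network failure"
assert error_message(lambda user_id: {"id": user_id}) == "no error"
```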

Property-Based Testing

Concept

Instead of specific examples, define properties that must hold for all inputs.

import fc from 'fast-check';

// Property: reversing twice returns original
test('reverse is involutory', () => {
  fc.assert(
    fc.property(fc.array(fc.integer()), (arr) => {
      const reversed = reverse(reverse(arr));
      return deepEquals(arr, reversed);
    })
  );
});

// Property: sorted array is always ordered
test('sort produces ordered output', () => {
  fc.assert(
    fc.property(fc.array(fc.integer()), (arr) => {
      const sorted = sort(arr);
      for (let i = 1; i < sorted.length; i++) {
        if (sorted[i] < sorted[i - 1]) return false;
      }
      return true;
    })
  );
});

// Property: encoding then decoding returns original
test('JSON roundtrip', () => {
  fc.assert(
    fc.property(fc.jsonValue(), (value) => {
      const encoded = JSON.stringify(value);
      const decoded = JSON.parse(encoded);
      return deepEquals(value, decoded);
    })
  );
});

import base64

from hypothesis import given, strategies as st

@given(st.lists(st.integers()))
def test_sort_is_idempotent(xs):
    """Sorting twice gives same result as sorting once"""
    assert sorted(sorted(xs)) == sorted(xs)

@given(st.text())
def test_encode_decode_roundtrip(s):
    """Base64 encoding/decoding preserves data"""
    encoded = base64.b64encode(s.encode())
    decoded = base64.b64decode(encoded).decode()
    assert decoded == s

Async Testing

Patterns

// Wait for promise
it('resolves with data', async () => {
  const result = await fetchData();
  expect(result).toBeDefined();
});

// Wait for rejection
it('rejects with error', async () => {
  await expect(failingOperation()).rejects.toThrow('Error message');
});

// Timeout handling
it('times out after 5s', async () => {
  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), 5000);

  try {
    await fetch(url, { signal: controller.signal });
  } finally {
    clearTimeout(timeout);
  }
}, 10000);

// Polling/Retry
it('eventually succeeds', async () => {
  await waitFor(async () => {
    const status = await getStatus();
    expect(status).toBe('ready');
  }, { timeout: 5000, interval: 100 });
});

// ❌ NEVER use fixed sleep
it('bad: uses sleep', async () => {
  await doSomething();
  await sleep(1000);  // NEVER DO THIS
  expect(getResult()).toBe(expected);
});

// ✅ Use proper waiting
it('good: polls for condition', async () => {
  await doSomething();
  await waitFor(() => expect(getResult()).toBe(expected));
});
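
The `waitFor` helper used above comes from whatever testing library is in use. For Python suites without one, a condition-based wait could be sketched as follows; `wait_for` and its parameters are assumptions for illustration.

```python
import threading
import time
from typing import Callable, List


def wait_for(condition: Callable[[], bool],
             timeout: float = 5.0,
             interval: float = 0.01) -> None:
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(interval)  # short poll interval, not a fixed wait


# Usage: background work finishes at some point; the test waits only
# as long as it has to, up to the timeout.
results: List[str] = []
worker = threading.Timer(0.05, lambda: results.append("ready"))
worker.start()
wait_for(lambda: results == ["ready"], timeout=2.0)
worker.join()
```

Unlike a fixed sleep, this returns as soon as the condition holds and fails with a clear error when it never does.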

Test Data Management

Fixtures

// Shared test data
const fixtures = {
  validUser: {
    id: 'user-1',
    email: 'test@example.com',
    name: 'Test User',
  },
  adminUser: {
    id: 'admin-1',
    email: 'admin@example.com',
    name: 'Admin User',
    role: 'admin',
  },
};

Factories

// Factory function
function createUser(overrides: Partial<User> = {}): User {
  return {
    id: `user-${Math.random().toString(36).slice(2)}`,
    email: 'test@example.com',
    name: 'Test User',
    createdAt: new Date(),
    ...overrides,
  };
}

// Usage
const user = createUser({ email: 'custom@example.com' });

Builders

class UserBuilder {
  private user: Partial<User> = {};

  withEmail(email: string): this {
    this.user.email = email;
    return this;
  }

  withRole(role: Role): this {
    this.user.role = role;
    return this;
  }

  asAdmin(): this {
    return this.withRole('admin');
  }

  build(): User {
    return {
      id: this.user.id ?? generateId(),
      email: this.user.email ?? 'test@example.com',
      name: this.user.name ?? 'Test User',
      role: this.user.role ?? 'user',
      createdAt: new Date(),
    };
  }
}

// Usage
const admin = new UserBuilder().withEmail('admin@test.com').asAdmin().build();
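
In Python, a dataclass plus keyword overrides covers much of what the factory and builder above provide. The sketch below uses a hypothetical `User` shape and a counter for IDs so that generated data stays deterministic.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    id: str
    email: str
    name: str = "Test User"
    role: str = "user"


# Sequential IDs keep test data unique and reproducible.
_ids = itertools.count(1)


def make_user(**overrides) -> User:
    """Create a valid User, letting each test override only what it cares about."""
    defaults = {"id": f"user-{next(_ids)}", "email": "test@example.com"}
    return User(**{**defaults, **overrides})


admin = make_user(email="admin@example.com", role="admin")
regular = make_user()

assert admin.role == "admin"
assert regular.email == "test@example.com"
assert admin.id != regular.id
```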

Coverage Strategy

Targets

Metric	Minimum	Target	Critical Path
Line	70%	80%	100%
Branch	60%	70%	100%
Function	70%	80%	100%
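
Coverage tools usually enforce thresholds themselves, but the Minimum column can also be checked by hand. A minimal Python sketch, assuming the measured percentages have already been parsed into a dictionary (the `coverage_failures` helper is hypothetical):

```python
from typing import Dict, List, Optional

# Minimum column from the table above.
MINIMUMS: Dict[str, float] = {"line": 70.0, "branch": 60.0, "function": 70.0}


def coverage_failures(measured: Dict[str, float],
                      minimums: Optional[Dict[str, float]] = None) -> List[str]:
    """Return one message per metric that falls below its minimum."""
    limits = MINIMUMS if minimums is None else minimums
    failures = []
    for metric, minimum in limits.items():
        actual = measured.get(metric, 0.0)  # a missing metric counts as 0%
        if actual < minimum:
            failures.append(f"{metric}: {actual:.1f}% < {minimum:.1f}%")
    return failures


assert coverage_failures({"line": 82.0, "branch": 71.5, "function": 90.0}) == []
assert coverage_failures({"line": 65.0, "branch": 71.5, "function": 90.0}) == [
    "line: 65.0% < 70.0%"
]
```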

What to Cover First

1. Business-critical logic
2. Complex algorithms
3. Error handling paths
4. Edge cases
5. Integration points

What NOT to Cover

- Third-party library code
- Generated code
- Configuration files
- Type definitions only
- Trivial getters/setters

CI Integration

GitHub Actions Example

name: Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'

      - run: npm ci

      - name: Run tests
        run: npm test -- --coverage

      - name: Upload coverage
        uses: codecov/codecov-action@v4
        with:
          files: ./coverage/lcov.info
          fail_ci_if_error: true

      - name: Check coverage thresholds
        run: |
          npm test -- --coverage --coverageThreshold='{
            "global": {
              "lines": 80,
              "branches": 70
            }
          }'

Anti-Patterns to Avoid

❌ Testing implementation, not behavior
   - Don't verify internal method calls
   - Test observable outputs only

❌ Over-mocking
   - Mocking everything tests nothing
   - Prefer real implementations or test doubles

❌ Test interdependence
   - Tests must run in any order
   - Each test must be isolated

❌ Hardcoded time
   - Use clock mocking
   - Inject time source

❌ Sleep/delay in tests
   - Use polling or async/await
   - Condition-based waiting

❌ Ignoring flaky tests
   - Fix or delete, never skip
   - Flaky tests erode trust

❌ Too much test setup
   - Indicates code design issues
   - Refactor code, not just tests

❌ Testing private methods
   - Test through public interface
   - If hard to test, refactor
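
To make the first anti-pattern concrete, the Python sketch below contrasts a test that checks an internal call with one that checks the observable result. The `Cart` class is hypothetical.

```python
from typing import List
from unittest.mock import patch


class Cart:
    def __init__(self) -> None:
        self._prices: List[int] = []

    def add(self, price_cents: int) -> None:
        self._prices.append(price_cents)

    def _subtotal(self) -> int:
        return sum(self._prices)

    def total(self, discount_percent: int = 0) -> int:
        """Total in cents after an integer percentage discount."""
        return self._subtotal() * (100 - discount_percent) // 100


# Brittle: passes only while total() happens to call _subtotal();
# inlining or renaming the helper breaks it without any behavior change.
def test_total_calls_subtotal() -> None:
    cart = Cart()
    with patch.object(Cart, "_subtotal", return_value=100) as helper:
        cart.total()
    helper.assert_called_once()


# Robust: asserts only on what callers can observe.
def test_total_applies_discount() -> None:
    cart = Cart()
    cart.add(5000)
    cart.add(5000)
    assert cart.total(discount_percent=20) == 8000


test_total_calls_subtotal()
test_total_applies_discount()
```

Both tests pass today; only the second keeps passing after a refactor of `Cart` internals.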

Testing Checklist

## Before Writing Tests
- [ ] Understood requirements
- [ ] Identified test cases
- [ ] Determined test type (unit/integration/E2E)
- [ ] Planned test data

## Writing Tests
- [ ] Follows AAA or Given-When-Then
- [ ] Descriptive test names
- [ ] One assertion per concept
- [ ] Tests behavior, not implementation
- [ ] Covers happy path
- [ ] Covers error cases
- [ ] Covers boundary conditions

## Test Quality
- [ ] Tests are deterministic
- [ ] Tests are isolated
- [ ] Tests are fast
- [ ] No hardcoded values that will break
- [ ] Proper mocking (not over-mocking)

## Coverage
- [ ] Line coverage ≥ 80%
- [ ] Branch coverage ≥ 70%
- [ ] Critical paths at 100%

## CI/CD
- [ ] Tests run on every push
- [ ] Coverage reported
- [ ] Thresholds enforced

Key Principles

Tests are documentation — They describe expected behavior
Fast feedback — Tests should run in seconds
Deterministic — Same input, same result, every time
Independent — No test depends on another
Focused — One test, one concept
Maintainable — Test code is production code
