---
name: Pytest Testing Patterns
description: Apply pytest best practices and project testing conventions. Use when writing or modifying tests.
---
# Pytest Testing Patterns for Constitutional AI

## When to Apply

Automatically activate when:

- Creating new test files
- Modifying existing tests
- Debugging test failures
- Writing test fixtures or mocks
## Project Testing Conventions

### Test File Structure

Naming convention:

```text
tests/test_{module}.py   # Matches source: constitutional_ai/{module}.py
```

Example mapping:

```text
constitutional_ai/framework.py   → tests/test_framework.py
constitutional_ai/trainer.py     → tests/test_trainer.py
constitutional_ai/principles.py  → tests/test_principles.py
```
### Test Function Naming

```python
def test_{functionality}():
    """Test {what it does}."""
    pass

def test_{class}_{method}():
    """Test {Class}.{method} does {what}."""
    pass

def test_{edge_case}_raises_{exception}():
    """Test that {edge case} raises {ExceptionType}."""
    pass
```

Examples:

```python
def test_framework_initialization():
    """Test ConstitutionalFramework initializes with default principles."""

def test_evaluate_text_returns_dict():
    """Test evaluate_text returns dictionary with expected keys."""

def test_invalid_principle_raises_value_error():
    """Test that invalid principle configuration raises ValueError."""
```
## Pytest Core Patterns

### Basic Test Structure

```python
import pytest

from constitutional_ai import ConstitutionalFramework


def test_example():
    # Arrange
    framework = ConstitutionalFramework()

    # Act
    result = framework.evaluate_text("test input")

    # Assert
    assert result is not None
    assert 'any_flagged' in result
```
### Fixtures for Setup

```python
@pytest.fixture
def framework():
    """Provide a configured ConstitutionalFramework for tests."""
    from constitutional_ai import setup_default_framework
    return setup_default_framework()


@pytest.fixture
def mock_model():
    """Provide a mock model for testing."""
    from tests.mocks.mock_model import MockModel
    return MockModel()


def test_with_fixtures(framework, mock_model):
    """Test using fixtures."""
    result = framework.evaluate_text("test", model=mock_model)
    assert result['any_flagged'] is False
```
### Parametrized Tests

```python
@pytest.mark.parametrize("input_text,expected_flagged", [
    ("helpful response", False),
    ("harmful content", True),
    ("neutral statement", False),
])
def test_evaluation_cases(framework, input_text, expected_flagged):
    """Test evaluation with various inputs."""
    result = framework.evaluate_text(input_text)
    assert result['any_flagged'] == expected_flagged
```
### Testing Exceptions

```python
def test_invalid_input_raises():
    """Test that invalid input raises ValueError."""
    framework = ConstitutionalFramework()
    with pytest.raises(ValueError, match="must not be empty"):
        framework.evaluate_text("")


def test_missing_model_raises():
    """Test that missing model raises appropriate error."""
    with pytest.raises(RuntimeError):
        generate_text(prompt="test", model=None, tokenizer=None)
```
## Mocking AI Components

### Use Project Mocks

The project has mocks in the tests/mocks/ directory:

```python
from tests.mocks.mock_model import MockModel
from tests.mocks.mock_tokenizer import MockTokenizer


def test_with_mock_model():
    """Test using mock model for deterministic results."""
    model = MockModel()
    tokenizer = MockTokenizer()
    inputs = tokenizer.encode("test prompt")  # assumes a tokenizer-like encode() helper

    # Mock returns predictable outputs
    output = model.generate(inputs)
    assert output is not None
```
### Why Mock AI Components?
- Determinism: Real models are stochastic
- Speed: Avoid expensive model loading/inference
- Isolation: Test logic independently of model behavior
- Portability: Tests run without GPU or large models
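
If you need a new mock, the following minimal sketch shows the general shape. It is illustrative only: the class and method names mirror the examples above, but the real MockModel and MockTokenizer in tests/mocks/ may expose a different interface.

```python
# Illustrative sketch only -- the real classes in tests/mocks/ may differ.
class MockTokenizer:
    """Deterministic stand-in for a tokenizer (hypothetical interface)."""

    def encode(self, text: str) -> list[int]:
        # Map each character to a stable integer id so results never vary
        return [ord(ch) % 100 for ch in text]


class MockModel:
    """Deterministic stand-in for a language model (hypothetical interface)."""

    def generate(self, inputs: list[int]) -> str:
        # Return the same canned output for a given input length
        return f"mock output ({len(inputs)} tokens)"
```

Because both classes are plain Python with no model weights, tests that use them stay fast and fully deterministic.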
### When to Mock vs Real Models

```python
# ✅ Mock for unit tests
def test_critique_revision_logic():
    """Test critique-revision pipeline logic."""
    model = MockModel()  # Fast, deterministic
    # Test the pipeline logic, not model quality


# ✅ Real model for integration tests (marked slow)
@pytest.mark.slow
def test_end_to_end_training():
    """Test full training pipeline with real model."""
    model = load_model("gpt2")  # Real model
    # Test complete system integration
```
## Coverage Best Practices

### Project Coverage Target

- Current: ~45% coverage
- Target: hold roughly the current level; this is acceptable for ML research code
- Why: training loops and model internals are hard to test exhaustively
### Focus Testing On

Framework logic (high value, testable):

- Principle management
- Evaluation logic
- Configuration handling

Data processing (deterministic, testable):

- Dataset creation
- Tokenization
- Preference pair generation

Edge cases (prevent regressions; see the sketch after this list):

- Empty inputs
- Invalid configurations
- Missing dependencies
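
A compact way to cover these edge cases is to parametrize over bad inputs and assert on the exception. The sketch below extends the empty-input example from Testing Exceptions; whether None raises ValueError or TypeError depends on the framework's actual contract, so the expectation is deliberately loose.

```python
import pytest

from constitutional_ai import ConstitutionalFramework


@pytest.mark.parametrize("bad_input", ["", None])
def test_evaluate_text_rejects_bad_input(bad_input):
    """Test that empty or missing input is rejected (assumed contract)."""
    framework = ConstitutionalFramework()
    with pytest.raises((ValueError, TypeError)):
        framework.evaluate_text(bad_input)
```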
### Skip Detailed Testing For
- Model internals (PyTorch's responsibility)
- Stochastic generation (unless mocked)
- Long training runs (expensive, flaky)
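
When a test genuinely needs a real model or a GPU, keep it out of the default fast run by marking it slow and skipping it when the hardware is unavailable. The sketch below uses standard pytest and torch calls; the load_model import path is an assumption based on the module names listed later in this document.

```python
import pytest

torch = pytest.importorskip("torch")  # skip the whole module if torch is absent


@pytest.mark.slow
@pytest.mark.skipif(not torch.cuda.is_available(), reason="requires a GPU")
def test_real_model_generation():
    """Integration test with a real model (deselect with -m "not slow")."""
    from constitutional_ai.model_utils import load_model  # assumed import path

    model = load_model("gpt2")
    assert model is not None
```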
## Test Organization Patterns

### Grouping with Classes

```python
class TestConstitutionalFramework:
    """Tests for ConstitutionalFramework class."""

    def test_initialization(self):
        """Test framework initializes correctly."""
        framework = ConstitutionalFramework()
        assert framework is not None

    def test_add_principle(self):
        """Test adding a principle."""
        framework = ConstitutionalFramework()
        principle = ConstitutionalPrinciple(name="test", weight=1.0)
        framework.add_principle(principle)
        assert "test" in framework.principles
```
### Setup and Teardown

```python
class TestWithSetup:
    def setup_method(self):
        """Run before each test method."""
        self.framework = setup_default_framework()

    def teardown_method(self):
        """Run after each test method."""
        # Cleanup if needed
        pass

    def test_example(self):
        """Test using setup."""
        result = self.framework.evaluate_text("test")
        assert result is not None
```
## Running Tests (Common Commands)

```bash
# All tests
pytest tests/ -v

# With coverage
pytest tests/ --cov=constitutional_ai --cov-report=term-missing

# Specific module
pytest tests/test_framework.py -v

# Stop on first failure
pytest tests/ -x

# Only failed tests from last run
pytest tests/ --lf

# Parallel execution (requires pytest-xdist)
pytest tests/ -n auto

# Slow tests only
pytest tests/ -m slow

# Skip slow tests
pytest tests/ -m "not slow"
```
## Test Quality Checklist

Before committing tests:

- Test file named test_{module}.py
- Test functions named test_{functionality}
- Docstrings explain what is being tested
- AI components mocked for determinism
- Edge cases covered (empty inputs, invalid configs)
- Assertions are specific and meaningful
- Tests are independent (no shared state)
- Fast tests are unmarked; slow tests are marked @pytest.mark.slow (registration sketch below)
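
For the -m slow / -m "not slow" filters to work without PytestUnknownMarkWarning, the slow marker must be registered. If it is not already declared in pytest.ini or pyproject.toml, a conftest.py hook like this minimal sketch does the job; it is not necessarily how the project currently registers the marker.

```python
# tests/conftest.py (sketch)
def pytest_configure(config):
    """Register the custom 'slow' marker so pytest does not warn about it."""
    config.addinivalue_line(
        "markers", "slow: marks tests as slow (deselect with -m 'not slow')"
    )
```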
## Current Test Status
Metrics:
- Total tests: 658
- Passing: ~100% (after recent fixes)
- Coverage: ~45% (acceptable for ML code)
- Test files: 14
Key test modules:
- test_framework.py - Framework and principle management
- test_principles.py - Evaluation functions
- test_critique_revision.py - Phase 1 pipeline
- test_reward_model.py - Reward model training
- test_ppo_trainer.py - PPO optimization
- test_trainer.py - Supervised fine-tuning
- test_evaluator.py - Evaluation logic
- test_model_utils.py - Model loading and generation
## Example: Complete Test File Template

```python
"""Tests for constitutional_ai/{module}.py"""
import pytest

from constitutional_ai import SomeClass
from tests.mocks.mock_model import MockModel


@pytest.fixture
def instance():
    """Provide configured instance for testing."""
    return SomeClass()


class TestSomeClass:
    """Tests for SomeClass."""

    def test_initialization(self, instance):
        """Test SomeClass initializes correctly."""
        assert instance is not None

    def test_method(self, instance):
        """Test method does expected thing."""
        result = instance.method("input")
        assert result == "expected"

    @pytest.mark.parametrize("input_value,expected", [
        ("a", 1),
        ("b", 2),
    ])
    def test_parametrized(self, instance, input_value, expected):
        """Test method with various inputs."""
        assert instance.method(input_value) == expected

    def test_edge_case_raises(self, instance):
        """Test edge case raises appropriate error."""
        with pytest.raises(ValueError):
            instance.method(None)


@pytest.mark.slow
def test_integration():
    """Test end-to-end with real components."""
    # Use real models, expensive operations
    pass
```