---
name: Pytest Testing Patterns
description: Apply pytest best practices and project testing conventions. Use when writing or modifying tests.
---
# Pytest Testing Patterns for Constitutional AI

## When to Apply

Automatically activate when:

- Creating new test files
- Modifying existing tests
- Debugging test failures
- Writing test fixtures or mocks
## Project Testing Conventions

### Test File Structure

Naming convention:

```text
tests/test_{module}.py   # Matches source: constitutional_ai/{module}.py
```

Example mapping:

```text
constitutional_ai/framework.py   → tests/test_framework.py
constitutional_ai/trainer.py     → tests/test_trainer.py
constitutional_ai/principles.py  → tests/test_principles.py
```
### Test Function Naming

```python
def test_{functionality}():
    """Test {what it does}."""
    pass

def test_{class}_{method}():
    """Test {Class}.{method} does {what}."""
    pass

def test_{edge_case}_raises_{exception}():
    """Test that {edge case} raises {ExceptionType}."""
    pass
```

Examples:

```python
def test_framework_initialization():
    """Test ConstitutionalFramework initializes with default principles."""

def test_evaluate_text_returns_dict():
    """Test evaluate_text returns dictionary with expected keys."""

def test_invalid_principle_raises_value_error():
    """Test that invalid principle configuration raises ValueError."""
```
## Pytest Core Patterns

### Basic Test Structure

```python
import pytest

from constitutional_ai import ConstitutionalFramework


def test_example():
    # Arrange
    framework = ConstitutionalFramework()

    # Act
    result = framework.evaluate_text("test input")

    # Assert
    assert result is not None
    assert 'any_flagged' in result
```
### Fixtures for Setup

```python
@pytest.fixture
def framework():
    """Provide a configured ConstitutionalFramework for tests."""
    from constitutional_ai import setup_default_framework
    return setup_default_framework()


@pytest.fixture
def mock_model():
    """Provide a mock model for testing."""
    from tests.mocks.mock_model import MockModel
    return MockModel()


def test_with_fixtures(framework, mock_model):
    """Test using fixtures."""
    result = framework.evaluate_text("test", model=mock_model)
    assert result['any_flagged'] is False
```
### Parametrized Tests

```python
@pytest.mark.parametrize("input_text,expected_flagged", [
    ("helpful response", False),
    ("harmful content", True),
    ("neutral statement", False),
])
def test_evaluation_cases(framework, input_text, expected_flagged):
    """Test evaluation with various inputs."""
    result = framework.evaluate_text(input_text)
    assert result['any_flagged'] == expected_flagged
```
### Testing Exceptions

```python
def test_invalid_input_raises():
    """Test that invalid input raises ValueError."""
    framework = ConstitutionalFramework()
    with pytest.raises(ValueError, match="must not be empty"):
        framework.evaluate_text("")


def test_missing_model_raises():
    """Test that missing model raises appropriate error."""
    with pytest.raises(RuntimeError):
        generate_text(prompt="test", model=None, tokenizer=None)
```
## Mocking AI Components

### Use Project Mocks

The project has mocks in the tests/mocks/ directory:

```python
from tests.mocks.mock_model import MockModel
from tests.mocks.mock_tokenizer import MockTokenizer


def test_with_mock_model():
    """Test using mock model for deterministic results."""
    model = MockModel()
    tokenizer = MockTokenizer()
    inputs = tokenizer.encode("test prompt")  # assumes a tokenizer-like encode() helper

    # Mock returns predictable outputs
    output = model.generate(inputs)
    assert output is not None
```
### Why Mock AI Components?
- Determinism: Real models are stochastic
- Speed: Avoid expensive model loading/inference
- Isolation: Test logic independently of model behavior
- Portability: Tests run without GPU or large models
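
If you need a new mock, the following minimal sketch shows the general shape. It is illustrative only: the class and method names mirror the examples above, but the real MockModel and MockTokenizer in tests/mocks/ may expose a different interface.

```python
# Illustrative sketch only -- the real classes in tests/mocks/ may differ.
class MockTokenizer:
    """Deterministic stand-in for a tokenizer (hypothetical interface)."""

    def encode(self, text: str) -> list[int]:
        # Map each character to a stable integer id so results never vary
        return [ord(ch) % 100 for ch in text]


class MockModel:
    """Deterministic stand-in for a language model (hypothetical interface)."""

    def generate(self, inputs: list[int]) -> str:
        # Return the same canned output for a given input length
        return f"mock output ({len(inputs)} tokens)"
```

Because both classes are plain Python with no model weights, tests that use them stay fast and fully deterministic.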
### When to Mock vs Real Models

```python
# ✅ Mock for unit tests
def test_critique_revision_logic():
    """Test critique-revision pipeline logic."""
    model = MockModel()  # Fast, deterministic
    # Test the pipeline logic, not model quality


# ✅ Real model for integration tests (marked slow)
@pytest.mark.slow
def test_end_to_end_training():
    """Test full training pipeline with real model."""
    model = load_model("gpt2")  # Real model
    # Test complete system integration
```
## Coverage Best Practices

### Project Coverage Target

- Current: ~45% coverage
- Target: hold roughly the current level; this is acceptable for ML research code
- Why: training loops and model internals are hard to test exhaustively
### Focus Testing On

Framework logic (high value, testable):

- Principle management
- Evaluation logic
- Configuration handling

Data processing (deterministic, testable):

- Dataset creation
- Tokenization
- Preference pair generation

Edge cases (prevent regressions; see the sketch after this list):

- Empty inputs
- Invalid configurations
- Missing dependencies
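
A compact way to cover these edge cases is to parametrize over bad inputs and assert on the exception. The sketch below extends the empty-input example from Testing Exceptions; whether None raises ValueError or TypeError depends on the framework's actual contract, so the expectation is deliberately loose.

```python
import pytest

from constitutional_ai import ConstitutionalFramework


@pytest.mark.parametrize("bad_input", ["", None])
def test_evaluate_text_rejects_bad_input(bad_input):
    """Test that empty or missing input is rejected (assumed contract)."""
    framework = ConstitutionalFramework()
    with pytest.raises((ValueError, TypeError)):
        framework.evaluate_text(bad_input)
```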
### Skip Detailed Testing For
- Model internals (PyTorch's responsibility)
- Stochastic generation (unless mocked)
- Long training runs (expensive, flaky)
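
When a test genuinely needs a real model or a GPU, keep it out of the default fast run by marking it slow and skipping it when the hardware is unavailable. The sketch below uses standard pytest and torch calls; the load_model import path is an assumption based on the module names listed later in this document.

```python
import pytest

torch = pytest.importorskip("torch")  # skip the whole module if torch is absent


@pytest.mark.slow
@pytest.mark.skipif(not torch.cuda.is_available(), reason="requires a GPU")
def test_real_model_generation():
    """Integration test with a real model (deselect with -m "not slow")."""
    from constitutional_ai.model_utils import load_model  # assumed import path

    model = load_model("gpt2")
    assert model is not None
```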
## Test Organization Patterns

### Grouping with Classes

```python
class TestConstitutionalFramework:
    """Tests for ConstitutionalFramework class."""

    def test_initialization(self):
        """Test framework initializes correctly."""
        framework = ConstitutionalFramework()
        assert framework is not None

    def test_add_principle(self):
        """Test adding a principle."""
        framework = ConstitutionalFramework()
        principle = ConstitutionalPrinciple(name="test", weight=1.0)
        framework.add_principle(principle)
        assert "test" in framework.principles
```
### Setup and Teardown

```python
class TestWithSetup:
    def setup_method(self):
        """Run before each test method."""
        self.framework = setup_default_framework()

    def teardown_method(self):
        """Run after each test method."""
        # Cleanup if needed
        pass

    def test_example(self):
        """Test using setup."""
        result = self.framework.evaluate_text("test")
        assert result is not None
```
## Running Tests (Common Commands)

```bash
# All tests
pytest tests/ -v

# With coverage
pytest tests/ --cov=constitutional_ai --cov-report=term-missing

# Specific module
pytest tests/test_framework.py -v

# Stop on first failure
pytest tests/ -x

# Only failed tests from last run
pytest tests/ --lf

# Parallel execution (requires pytest-xdist)
pytest tests/ -n auto

# Slow tests only
pytest tests/ -m slow

# Skip slow tests
pytest tests/ -m "not slow"
```
## Test Quality Checklist

Before committing tests:

- Test file named test_{module}.py
- Test functions named test_{functionality}
- Docstrings explain what is being tested
- AI components mocked for determinism
- Edge cases covered (empty inputs, invalid configs)
- Assertions are specific and meaningful
- Tests are independent (no shared state)
- Fast tests are unmarked; slow tests are marked @pytest.mark.slow (registration sketch below)
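
For the -m slow / -m "not slow" filters to work without PytestUnknownMarkWarning, the slow marker must be registered. If it is not already declared in pytest.ini or pyproject.toml, a conftest.py hook like this minimal sketch does the job; it is not necessarily how the project currently registers the marker.

```python
# tests/conftest.py (sketch)
def pytest_configure(config):
    """Register the custom 'slow' marker so pytest does not warn about it."""
    config.addinivalue_line(
        "markers", "slow: marks tests as slow (deselect with -m 'not slow')"
    )
```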
## Current Test Status
Metrics:
- Total tests: 658
- Passing: ~100% (after recent fixes)
- Coverage: ~45% (acceptable for ML code)
- Test files: 14
Key test modules:
- test_framework.py - Framework and principle management
- test_principles.py - Evaluation functions
- test_critique_revision.py - Phase 1 pipeline
- test_reward_model.py - Reward model training
- test_ppo_trainer.py - PPO optimization
- test_trainer.py - Supervised fine-tuning
- test_evaluator.py - Evaluation logic
- test_model_utils.py - Model loading and generation
## Example: Complete Test File Template

```python
"""Tests for constitutional_ai/{module}.py"""
import pytest

from constitutional_ai import SomeClass
from tests.mocks.mock_model import MockModel


@pytest.fixture
def instance():
    """Provide configured instance for testing."""
    return SomeClass()


class TestSomeClass:
    """Tests for SomeClass."""

    def test_initialization(self, instance):
        """Test SomeClass initializes correctly."""
        assert instance is not None

    def test_method(self, instance):
        """Test method does expected thing."""
        result = instance.method("input")
        assert result == "expected"

    @pytest.mark.parametrize("input_value,expected", [
        ("a", 1),
        ("b", 2),
    ])
    def test_parametrized(self, instance, input_value, expected):
        """Test method with various inputs."""
        assert instance.method(input_value) == expected

    def test_edge_case_raises(self, instance):
        """Test edge case raises appropriate error."""
        with pytest.raises(ValueError):
            instance.method(None)


@pytest.mark.slow
def test_integration():
    """Test end-to-end with real components."""
    # Use real models, expensive operations
    pass
```