| name | scientific-python-testing |
| description | Write robust, maintainable tests for scientific Python packages using pytest best practices following Scientific Python community guidelines |
Scientific Python Testing with pytest
A comprehensive guide to writing effective tests for scientific Python packages using pytest, following the Scientific Python Community guidelines and testing tutorial. This skill focuses on modern testing patterns, fixtures, parametrization, and best practices specific to scientific computing.
Quick Reference Card
Common Testing Tasks - Quick Decisions:
# 1. Basic test → Use simple assert
def test_function():
assert result == expected
# 2. Floating-point comparison → Use approx
from pytest import approx
assert result == approx(0.333, rel=1e-6)
# 3. Testing exceptions → Use pytest.raises
with pytest.raises(ValueError, match="must be positive"):
function(-1)
# 4. Multiple inputs → Use parametrize
@pytest.mark.parametrize("input,expected", [(1,1), (2,4), (3,9)])
def test_square(input, expected):
assert input**2 == expected
# 5. Reusable setup → Use fixture
@pytest.fixture
def sample_data():
return np.array([1, 2, 3, 4, 5])
# 6. NumPy arrays → Use approx or numpy.testing
assert np.mean(data) == approx(3.0)
Decision Tree:
- Need multiple test cases with same logic? → Parametrize
- Need reusable test data/setup? → Fixture
- Testing floating-point results? → pytest.approx
- Testing exceptions/warnings? → pytest.raises / pytest.warns
- Complex numerical arrays? → numpy.testing.assert_allclose
- Organizing by speed? → Markers and separate directories
When to Use This Skill
- Writing tests for scientific Python packages and libraries
- Testing numerical algorithms and scientific computations
- Setting up test infrastructure for research software
- Implementing continuous integration for scientific code
- Testing data analysis pipelines and workflows
- Validating scientific simulations and models
- Ensuring reproducibility and correctness of research code
- Testing code that uses NumPy, SciPy, Pandas, and other scientific libraries
Core Concepts
1. Why pytest for Scientific Python
pytest is the de facto standard for testing Python packages because it offers:
- Simple syntax: Just use Python's assert statement
- Detailed reporting: Clear, informative failure messages (illustrated after this list)
- Powerful features: Fixtures, parametrization, marks, plugins
- Scientific ecosystem: Native support for NumPy arrays, approximate comparisons
- Community standard: Used by NumPy, SciPy, Pandas, scikit-learn, and more
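As a minimal illustration of the first two points, the test below (a made-up example) needs nothing beyond a bare assert, and pytest's assertion rewriting reports the compared values when it fails:
def test_mean_value():
    computed = 2.9   # stand-in for a value produced by the code under test
    expected = 3.0
    # On failure pytest prints the rewritten assertion, e.g. "assert 2.9 == 3.0"
    assert computed == expected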
2. Test Structure and Organization
Standard test directory layout:
my-package/
├── src/
│ └── my_package/
│ ├── __init__.py
│ ├── analysis.py
│ └── utils.py
├── tests/
│ ├── conftest.py
│ ├── test_analysis.py
│ └── test_utils.py
└── pyproject.toml
Key principles:
- Tests directory separate from source code (alongside src/)
- Test files named test_*.py (pytest discovery)
- Test functions named test_* (pytest discovery)
- No __init__.py in tests directory (avoid importability issues)
- Test against installed package, not local source
3. pytest Configuration
Configure pytest in pyproject.toml (recommended for modern packages):
[tool.pytest.ini_options]
minversion = "7.0"
addopts = [
"-ra", # Show summary of all test outcomes
"--showlocals", # Show local variables in tracebacks
"--strict-markers", # Error on undefined markers
"--strict-config", # Error on config issues
]
xfail_strict = true # xfail tests must fail
filterwarnings = [
"error", # Treat warnings as errors
]
log_cli_level = "info" # Log level for test output
testpaths = [
"tests", # Limit pytest to tests directory
]
Testing Principles
Following the Scientific Python testing recommendations, effective testing provides multiple benefits and should follow key principles:
Advantages of Testing
- Trustworthy code: Well-tested code behaves as expected and can be relied upon
- Living documentation: Tests communicate intent and expected behavior, validated with each run
- Preventing failure: Tests protect against implementation errors and unexpected dependency changes
- Confidence when making changes: Thorough test suites enable adding features, fixing bugs, and refactoring with confidence
Fundamental Principles
1. Any test case is better than none
When in doubt, write the test that makes sense at the time:
- Test critical behaviors, features, and logic
- Write clear, expressive, well-documented tests
- Tests are documentation of developer intentions
- Good tests make it clear what they are testing and how
Don't get bogged down in taxonomy when learning—focus on writing tests that work.
2. As long as that test is correct
It's surprisingly easy to write tests that pass when they should fail:
- Check that your test fails when it should: Deliberately break the code and verify the test fails
- Keep it simple: Excessive mocks and fixtures make it difficult to know what's being tested
- Test one thing at a time: A single test should test a single behavior
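A minimal sketch of the first point above, using a hypothetical normalize function: a test that cannot fail gives false confidence, and deliberately breaking the code is the quickest way to catch it:
# Cannot fail: the assert checks the function object, not its result.
def test_normalize_always_passes():
    assert normalize
# Can fail: it calls the function and checks actual behavior.
def test_normalize():
    assert normalize([2.0, 2.0]) == [0.5, 0.5]
# Sanity check: temporarily make normalize return garbage (or insert `assert False`)
# and confirm pytest reports test_normalize as FAILED before restoring the code.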
3. Start with Public Interface Tests
Begin by testing from the perspective of a user:
- Test code as users will interact with it
- Keep tests simple and readable for documentation purposes
- Focus on supported use cases
- Avoid testing private attributes
- Minimize use of mocks/patches
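A short sketch of the difference, assuming a hypothetical RunningMean class exposed by my_package.analysis:
from pytest import approx
from my_package.analysis import RunningMean   # hypothetical public class
def test_running_mean_public_behavior():
    stat = RunningMean()
    stat.update([1.0, 2.0, 3.0])
    assert stat.value == approx(2.0)   # the behavior users rely on
# Avoid assertions on internals such as stat._sum or stat._count;
# they can change freely without affecting users.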
4. Organize Tests into Suites
Divide tests by type and execution time for efficiency:
- Unit tests: Fast, isolated tests of individual components
- Integration tests: Tests of component interactions and dependencies
- End-to-end tests: Complete workflow testing
Benefits:
- Run relevant tests quickly and frequently
- "Fail fast" by running fast suites first
- Easier to read and reason about
- Avoid false positives from expected external failures
Outside-In Testing Approach
The recommended approach is outside-in, starting from the user's perspective:
- Public Interface Tests: Test from user perspective, focusing on behavior and features
- Integration Tests: Test that components work together and with dependencies
- Unit Tests: Test individual units in isolation, optimized for speed
This approach ensures you're building the right thing before optimizing implementation details.
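A compact sketch of the three layers for one hypothetical smooth feature (the module and function names are assumptions, not a real API):
import numpy as np
from pytest import approx
from my_package.filters import smooth, moving_average   # hypothetical API
def test_smooth_public_interface():
    """1. Public interface: the behavior a user asks for."""
    noisy = np.array([1.0, 9.0, 1.0, 9.0])
    assert smooth(noisy, window=2).shape == noisy.shape
def test_smooth_preserves_constant_signal():
    """2. Integration: smoothing and edge handling work together."""
    assert smooth(np.ones(4), window=3) == approx(np.ones(4))
def test_moving_average_unit():
    """3. Unit: one small, fast, isolated computation."""
    assert moving_average(np.array([2.0, 4.0])) == approx(3.0)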
Quick Start
Minimal Test Example
# tests/test_basic.py
def test_simple_math():
"""Test basic arithmetic."""
assert 4 == 2**2
def test_string_operations():
"""Test string methods."""
result = "hello world".upper()
assert result == "HELLO WORLD"
assert "HELLO" in result
Scientific Test Example
# tests/test_scientific.py
import numpy as np
from pytest import approx
from my_package.analysis import compute_mean, fit_linear
def test_compute_mean():
"""Test mean calculation."""
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
result = compute_mean(data)
assert result == approx(3.0)
def test_fit_linear():
"""Test linear regression."""
x = np.array([0, 1, 2, 3, 4])
y = np.array([0, 2, 4, 6, 8])
slope, intercept = fit_linear(x, y)
assert slope == approx(2.0)
assert intercept == approx(0.0)
Testing Best Practices
Pattern 1: Writing Simple, Focused Tests
Bad - Multiple assertions testing different things:
def test_everything():
data = load_data("input.csv")
assert len(data) > 0
processed = process_data(data)
assert processed.mean() > 0
result = analyze(processed)
assert result.success
Good - Separate tests for each behavior:
def test_load_data_returns_nonempty():
"""Data loading should return at least one row."""
data = load_data("input.csv")
assert len(data) > 0
def test_process_data_positive_mean():
"""Processed data should have positive mean."""
data = load_data("input.csv")
processed = process_data(data)
assert processed.mean() > 0
def test_analyze_succeeds():
"""Analysis should complete successfully."""
data = load_data("input.csv")
processed = process_data(data)
result = analyze(processed)
assert result.success
Arrange-Act-Assert pattern:
def test_computation():
# Arrange - Set up test data
data = np.array([1, 2, 3, 4, 5])
expected = 3.0
# Act - Execute the function
result = compute_mean(data)
# Assert - Check the result
assert result == approx(expected)
Pattern 2: Testing for Failures
Always test that your code raises appropriate exceptions:
import pytest
def test_zero_division_raises():
"""Division by zero should raise ZeroDivisionError."""
with pytest.raises(ZeroDivisionError):
result = 1 / 0
def test_invalid_input_raises():
"""Invalid input should raise ValueError."""
with pytest.raises(ValueError, match="must be positive"):
result = compute_sqrt(-1)
def test_deprecation_warning():
"""Deprecated function should warn."""
with pytest.warns(DeprecationWarning):
result = old_function()
def test_deprecated_call():
"""Check for deprecated API usage."""
with pytest.deprecated_call():
result = legacy_api()
Pattern 3: Approximate Comparisons
Scientific computing often involves floating-point arithmetic that cannot be tested for exact equality:
For scalars:
from pytest import approx
def test_approximate_scalar():
"""Test with approximate comparison."""
result = 1 / 3
assert result == approx(0.33333333333, rel=1e-10)
# Default relative tolerance is 1e-6
assert 0.3 + 0.3 == approx(0.6)
def test_approximate_with_absolute_tolerance():
"""Test with absolute tolerance."""
result = compute_small_value()
assert result == approx(0.0, abs=1e-10)
For NumPy arrays (preferred over numpy.testing):
import numpy as np
from pytest import approx
def test_array_approximate():
"""Test NumPy arrays with approx."""
result = np.array([0.1, 0.2, 0.3])
expected = np.array([0.10001, 0.20001, 0.30001])
    assert result == approx(expected, rel=1e-3)  # default rel=1e-6 is tighter than the ~1e-4 difference here
def test_array_with_nan():
"""Handle NaN values in arrays."""
result = np.array([1.0, np.nan, 3.0])
expected = np.array([1.0, np.nan, 3.0])
assert result == approx(expected, nan_ok=True)
When to use numpy.testing:
import numpy as np
from numpy.testing import assert_allclose, assert_array_equal
def test_exact_integer_array():
"""Use numpy.testing for exact integer comparisons."""
result = np.array([1, 2, 3])
expected = np.array([1, 2, 3])
assert_array_equal(result, expected)
def test_complex_array_tolerances():
"""Use numpy.testing for complex tolerance requirements."""
result = compute_result()
expected = load_reference()
assert_allclose(result, expected, rtol=1e-7, atol=1e-10)
Pattern 4: Using Fixtures
Fixtures provide reusable test setup and teardown:
Basic fixtures:
import pytest
import numpy as np
@pytest.fixture
def sample_data():
"""Provide sample data for tests."""
return np.array([1.0, 2.0, 3.0, 4.0, 5.0])
@pytest.fixture
def empty_array():
"""Provide empty array for edge case tests."""
return np.array([])
def test_mean_with_fixture(sample_data):
"""Test using fixture."""
result = np.mean(sample_data)
assert result == approx(3.0)
def test_empty_array(empty_array):
"""Test edge case with empty array."""
with pytest.warns(RuntimeWarning):
result = np.mean(empty_array)
assert np.isnan(result)
Fixtures with setup and teardown:
import pytest
import tempfile
from pathlib import Path
@pytest.fixture
def temp_datafile():
"""Create temporary data file for tests."""
# Setup
tmpfile = tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.txt')
tmpfile.write("1.0\n2.0\n3.0\n")
tmpfile.close()
# Provide to test
yield Path(tmpfile.name)
# Teardown
Path(tmpfile.name).unlink()
def test_load_data(temp_datafile):
"""Test data loading from file."""
data = np.loadtxt(temp_datafile)
assert len(data) == 3
assert data[0] == approx(1.0)
Fixture scopes:
@pytest.fixture(scope="function") # Default, run for each test
def data_per_test():
return create_data()
@pytest.fixture(scope="class") # Run once per test class
def data_per_class():
return create_data()
@pytest.fixture(scope="module") # Run once per module
def data_per_module():
return load_large_dataset()
@pytest.fixture(scope="session") # Run once per test session
def database_connection():
conn = create_connection()
yield conn
conn.close()
Auto-use fixtures:
@pytest.fixture(autouse=True)
def reset_random_seed():
"""Reset random seed before each test for reproducibility."""
np.random.seed(42)
Pattern 5: Parametrized Tests
Test the same function with multiple inputs:
Basic parametrization:
import pytest
@pytest.mark.parametrize("input_val,expected", [
(0, 0),
(1, 1),
(2, 4),
(3, 9),
(-2, 4),
])
def test_square(input_val, expected):
"""Test squaring with multiple inputs."""
assert input_val**2 == expected
@pytest.mark.parametrize("angle", [0, np.pi/6, np.pi/4, np.pi/3, np.pi/2])
def test_sine_range(angle):
"""Test sine function returns values in [0, 1] for first quadrant."""
result = np.sin(angle)
assert 0 <= result <= 1
Multiple parameters:
@pytest.mark.parametrize("n_air,n_water", [
(1.0, 1.33),
(1.0, 1.5),
(1.5, 1.0),
])
def test_refraction(n_air, n_water):
"""Test Snell's law with different refractive indices."""
angle_in = np.pi / 4
angle_out = snell(angle_in, n_air, n_water)
assert angle_out >= 0
assert angle_out <= np.pi / 2
Parametrized fixtures:
@pytest.fixture(params=[1, 2, 3], ids=["one", "two", "three"])
def dimension(request):
"""Parametrized fixture for different dimensions."""
return request.param
def test_array_creation(dimension):
"""Test array creation in different dimensions."""
shape = tuple([10] * dimension)
arr = np.zeros(shape)
assert arr.ndim == dimension
assert arr.shape == shape
Combining parametrization with custom IDs:
@pytest.mark.parametrize(
"data,expected",
[
(np.array([1, 2, 3]), 2.0),
(np.array([1, 1, 1]), 1.0),
(np.array([0, 10]), 5.0),
],
ids=["sequential", "constant", "extremes"]
)
def test_mean_with_ids(data, expected):
"""Test mean with descriptive test IDs."""
assert np.mean(data) == approx(expected)
Pattern 6: Test Organization with Markers
Use markers to organize and selectively run tests:
Basic markers:
import pytest
@pytest.mark.slow
def test_expensive_computation():
"""Test that takes a long time."""
result = run_simulation(n_iterations=1000000)
assert result.converged
@pytest.mark.requires_gpu
def test_gpu_acceleration():
"""Test that requires GPU hardware."""
result = compute_on_gpu(large_array)
assert result.success
@pytest.mark.integration
def test_full_pipeline():
"""Integration test for complete workflow."""
data = load_data()
processed = preprocess(data)
result = analyze(processed)
output = save_results(result)
assert output.exists()
Running specific markers:
pytest -m slow # Run only slow tests
pytest -m "not slow" # Skip slow tests
pytest -m "slow or requires_gpu" # Run slow OR GPU tests
pytest -m "slow and integration" # Run slow AND integration tests
Skip and xfail markers:
@pytest.mark.skip(reason="Feature not implemented yet")
def test_future_feature():
"""Test for feature under development."""
result = future_function()
assert result.success
@pytest.mark.skipif(sys.version_info < (3, 10), reason="Requires Python 3.10+")
def test_pattern_matching():
    """Test using Python 3.10+ features."""
    value = 0
    match value:
        case 0:
            result = "zero"
        case _:
            result = "other"
    assert result == "zero"
@pytest.mark.xfail(reason="Known bug in upstream library")
def test_known_failure():
"""Test that currently fails due to known issue."""
result = buggy_function()
assert result == expected
@pytest.mark.xfail(strict=True)
def test_must_fail():
"""Test that MUST fail (test will fail if it passes)."""
with pytest.raises(NotImplementedError):
unimplemented_function()
Custom markers in pyproject.toml:
[tool.pytest.ini_options]
markers = [
"slow: marks tests as slow (deselect with '-m \"not slow\"')",
"requires_gpu: marks tests that need GPU hardware",
"integration: marks tests as integration tests",
"unit: marks tests as unit tests",
]
Pattern 6b: Organizing Test Suites by Directory
Following Scientific Python recommendations, organize tests into separate directories by type and execution time:
tests/
├── unit/ # Fast, isolated unit tests
│ ├── conftest.py
│ ├── test_analysis.py
│ └── test_utils.py
├── integration/ # Integration tests
│ ├── conftest.py
│ └── test_pipeline.py
├── e2e/ # End-to-end tests
│ └── test_workflows.py
└── conftest.py # Shared fixtures
Run specific test suites:
# Run only unit tests (fast)
pytest tests/unit/
# Run integration tests after unit tests pass
pytest tests/integration/
# Run all tests
pytest
Auto-mark all tests in a directory using conftest.py:
# tests/unit/conftest.py
import pytest
def pytest_collection_modifyitems(session, config, items):
"""Automatically mark all tests in this directory as unit tests."""
for item in items:
item.add_marker(pytest.mark.unit)
Benefits of organized test suites:
- Run fast tests first ("fail fast" principle)
- Developers can run relevant tests quickly
- Clear separation of test types
- Avoid false positives from slow/flaky tests
- Better CI/CD optimization
Example test runner strategy:
# Run fast unit tests first, stop on failure
pytest tests/unit/ -x || exit 1
# If unit tests pass, run integration tests
pytest tests/integration/ -x || exit 1
# Finally run slow end-to-end tests
pytest tests/e2e/
Pattern 7: Mocking and Monkeypatching
Mock expensive operations or external dependencies:
Basic monkeypatching:
import platform
def test_platform_specific_behavior(monkeypatch):
"""Test behavior on different platforms."""
# Mock platform.system() to return "Linux"
monkeypatch.setattr(platform, "system", lambda: "Linux")
result = get_platform_specific_path()
assert result == "/usr/local/data"
# Change mock to return "Windows"
monkeypatch.setattr(platform, "system", lambda: "Windows")
result = get_platform_specific_path()
assert result == r"C:\Users\data"
Mocking with pytest-mock:
import pytest
from unittest.mock import Mock
def test_expensive_computation(mocker):
"""Mock expensive computation."""
# Mock the expensive function
mock_compute = mocker.patch("my_package.analysis.expensive_compute")
mock_compute.return_value = 42
result = run_analysis()
# Verify the mock was called
mock_compute.assert_called_once()
assert result == 42
def test_matplotlib_plotting(mocker):
"""Test plotting without creating actual plots."""
mock_plt = mocker.patch("matplotlib.pyplot")
create_plot(data)
# Verify plot was created
mock_plt.figure.assert_called_once()
mock_plt.plot.assert_called_once()
mock_plt.savefig.assert_called_once_with("output.png")
Fixture for repeated mocking:
@pytest.fixture
def mock_matplotlib(mocker):
"""Mock matplotlib for testing plots."""
fig = mocker.Mock(spec=plt.Figure)
ax = mocker.Mock(spec=plt.Axes)
line2d = mocker.Mock(name="plot", spec=plt.Line2D)
ax.plot.return_value = (line2d,)
mpl = mocker.patch("matplotlib.pyplot", autospec=True)
mocker.patch("matplotlib.pyplot.subplots", return_value=(fig, ax))
return {"fig": fig, "ax": ax, "mpl": mpl}
def test_my_plot(mock_matplotlib):
"""Test plotting function."""
ax = mock_matplotlib["ax"]
my_plotting_function(ax=ax)
ax.plot.assert_called_once()
ax.set_xlabel.assert_called_once()
Pattern 8: Testing Against Installed Version
Always test the installed package, not local source:
Why this matters: testing against the installed package confirms that everything the code needs, including packaged data files like reference.csv below, actually ships with the package:
my-package/
├── src/
│ └── my_package/
│ ├── __init__.py
│ ├── data/ # Data files
│ │ └── reference.csv
│ └── analysis.py
└── tests/
└── test_analysis.py
Use src/ layout + editable install:
# Install in editable mode
pip install -e .
# Run tests against installed version
pytest
Benefits:
- Tests ensure package installs correctly
- Catches missing files (like data files)
- Tests work in CI/CD environments
- Validates package structure and imports
In tests, import from package:
# Good - imports installed package
from my_package.analysis import compute_mean
# Bad - without the src/ layout, this could silently import the local file instead of the installed package
# from analysis import compute_mean
Pattern 8b: Import Best Practices in Tests
Following Scientific Python unit testing guidelines, proper import patterns make tests more maintainable:
Keep imports local to file under test:
# Good - Import from the file being tested
from my_package.analysis import MyClass, compute_mean
def test_compute_mean():
"""Test imports from module under test."""
data = MyClass()
result = compute_mean(data)
assert result > 0
Why this matters:
- When code is refactored and symbols move, tests don't break
- Tests only care about symbols used in the file under test
- Reduces coupling between tests and internal code organization
Import specific symbols, not entire modules:
# Good - Specific imports, easy to mock
from numpy import mean as np_mean, ndarray as NpArray
def my_function(data: NpArray) -> float:
return np_mean(data)
# Good - Easy to patch in tests
def test_my_function(mocker):
mock_mean = mocker.patch("my_package.analysis.np_mean")
# ...
# Less ideal - Harder to mock effectively
import numpy as np
def my_function(data: np.ndarray) -> float:
return np.mean(data)
# Less ideal - Complex patching required
def test_my_function(mocker):
# Must patch through the aliased namespace
mock_mean = mocker.patch("my_package.analysis.np.mean")
# ...
Consider meaningful aliases:
# Make imports meaningful to your domain
from numpy import sum as numeric_sum
from scipy.stats import ttest_ind as statistical_test
# Easy to understand and replace
result = numeric_sum(values)
p_value = statistical_test(group1, group2)
This approach makes it easier to:
- Replace implementations without changing tests
- Mock dependencies effectively
- Understand code purpose from import names
Running pytest
Basic Usage
# Run all tests
pytest
# Run specific file
pytest tests/test_analysis.py
# Run specific test
pytest tests/test_analysis.py::test_mean
# Run tests matching pattern
pytest -k "mean or median"
# Verbose output
pytest -v
# Show local variables in failures
pytest -l # or --showlocals
# Stop at first failure
pytest -x
# Show stdout/stderr
pytest -s
Debugging Tests
# Drop into debugger on failure
pytest --pdb
# Drop into debugger at start of each test
pytest --trace
# Run last failed tests
pytest --lf
# Run failed tests first, then rest
pytest --ff
# Show which tests would be run (dry run)
pytest --collect-only
Coverage
# Install pytest-cov
pip install pytest-cov
# Run with coverage
pytest --cov=my_package
# With coverage report
pytest --cov=my_package --cov-report=html
# With missing lines
pytest --cov=my_package --cov-report=term-missing
# Fail if coverage below threshold
pytest --cov=my_package --cov-fail-under=90
Configure in pyproject.toml:
[tool.pytest.ini_options]
addopts = [
"--cov=my_package",
"--cov-report=term-missing",
"--cov-report=html",
]
[tool.coverage.run]
source = ["src"]
omit = [
"*/tests/*",
"*/__init__.py",
]
[tool.coverage.report]
exclude_lines = [
"pragma: no cover",
"def __repr__",
"raise AssertionError",
"raise NotImplementedError",
"if __name__ == .__main__.:",
"if TYPE_CHECKING:",
"@abstractmethod",
]
Scientific Python Testing Patterns
Pattern 9: Testing Numerical Algorithms
import numpy as np
from pytest import approx
def test_numerical_stability():
"""Test algorithm is numerically stable."""
data = np.array([1e10, 1.0, -1e10])
result = stable_sum(data)
assert result == approx(1.0)
def test_convergence():
"""Test iterative algorithm converges."""
x0 = np.array([1.0, 1.0, 1.0])
result = iterative_solver(x0, tol=1e-8, max_iter=1000)
assert result.converged
assert result.iterations < 1000
assert result.residual < 1e-8
def test_against_analytical_solution():
"""Test against known analytical result."""
x = np.linspace(0, 1, 100)
numerical = compute_integral(lambda t: t**2, x)
analytical = x**3 / 3
assert numerical == approx(analytical, rel=1e-6)
def test_conservation_law():
"""Test that physical conservation law holds."""
initial_energy = compute_energy(system)
system.evolve(dt=0.01, steps=1000)
final_energy = compute_energy(system)
# Energy should be conserved (within numerical error)
assert final_energy == approx(initial_energy, rel=1e-10)
Pattern 10: Testing with Different NumPy dtypes
@pytest.mark.parametrize("dtype", [
np.float32,
np.float64,
np.complex64,
np.complex128,
])
def test_computation_dtypes(dtype):
"""Test function works with different dtypes."""
data = np.array([1, 2, 3, 4, 5], dtype=dtype)
result = compute_transform(data)
assert result.dtype == dtype
assert result.shape == data.shape
@pytest.mark.parametrize("dtype", [np.int32, np.int64, np.float32, np.float64])
def test_integer_and_float_types(dtype):
"""Test handling of integer and float types."""
arr = np.array([1, 2, 3], dtype=dtype)
result = safe_divide(arr, 2)
# Result should be floating point
assert result.dtype in [np.float32, np.float64]
Pattern 11: Testing Random/Stochastic Code
def test_random_with_seed():
"""Test random code with fixed seed for reproducibility."""
np.random.seed(42)
result1 = generate_random_samples(n=100)
np.random.seed(42)
result2 = generate_random_samples(n=100)
# Should get identical results with same seed
assert np.array_equal(result1, result2)
def test_statistical_properties():
"""Test statistical properties of random output."""
np.random.seed(123)
samples = generate_normal_samples(n=100000, mean=0, std=1)
# Test mean and std are close to expected (not exact due to randomness)
assert np.mean(samples) == approx(0, abs=0.01)
assert np.std(samples) == approx(1, abs=0.01)
@pytest.mark.parametrize("seed", [42, 123, 456])
def test_reproducibility_with_seeds(seed):
"""Test reproducibility with different seeds."""
np.random.seed(seed)
result = stochastic_algorithm()
# Should complete successfully regardless of seed
assert result.success
Pattern 12: Testing Data Pipelines
def test_pipeline_end_to_end(tmp_path):
"""Test complete data pipeline."""
# Arrange - Create input data
input_file = tmp_path / "input.csv"
input_file.write_text("x,y\n1,2\n3,4\n5,6\n")
output_file = tmp_path / "output.csv"
# Act - Run pipeline
result = run_pipeline(input_file, output_file)
# Assert - Check results
assert result.success
assert output_file.exists()
output_data = np.loadtxt(output_file, delimiter=",", skiprows=1)
assert len(output_data) == 3
def test_pipeline_stages_independently():
"""Test each pipeline stage separately."""
# Test stage 1
raw_data = load_data("input.csv")
assert len(raw_data) > 0
# Test stage 2
cleaned = clean_data(raw_data)
assert not np.any(np.isnan(cleaned))
# Test stage 3
transformed = transform_data(cleaned)
assert transformed.shape == cleaned.shape
# Test stage 4
result = analyze_data(transformed)
assert result.metrics["r2"] > 0.9
Pattern 13: Property-Based Testing with Hypothesis
For complex scientific code, consider property-based testing:
from hypothesis import given, strategies as st
from hypothesis.extra.numpy import arrays
import numpy as np
@given(arrays(np.float64, shape=st.integers(1, 100)))
def test_mean_is_bounded(arr):
"""Mean should be between min and max."""
if len(arr) > 0 and not np.any(np.isnan(arr)):
mean = np.mean(arr)
assert np.min(arr) <= mean <= np.max(arr)
@given(
x=arrays(np.float64, shape=10, elements=st.floats(-100, 100)),
y=arrays(np.float64, shape=10, elements=st.floats(-100, 100))
)
def test_linear_fit_properties(x, y):
"""Test properties of linear regression."""
if not (np.any(np.isnan(x)) or np.any(np.isnan(y))):
slope, intercept = fit_linear(x, y)
# Predictions should be finite
predictions = slope * x + intercept
assert np.all(np.isfinite(predictions))
Test Configuration Examples
Complete pyproject.toml Testing Section
[tool.pytest.ini_options]
minversion = "7.0"
addopts = [
"-ra", # Show summary of all test outcomes
"--showlocals", # Show local variables in tracebacks
"--strict-markers", # Error on undefined markers
"--strict-config", # Error on config issues
"--cov=my_package", # Coverage for package
"--cov-report=term-missing", # Show missing lines
"--cov-report=html", # HTML coverage report
]
xfail_strict = true # xfail tests must fail
filterwarnings = [
"error", # Treat warnings as errors
"ignore::DeprecationWarning:pkg_resources", # Ignore specific warning
"ignore::PendingDeprecationWarning",
]
log_cli_level = "info" # Log level for test output
testpaths = [
"tests", # Test directory
]
markers = [
"slow: marks tests as slow (deselect with '-m \"not slow\"')",
"integration: marks tests as integration tests",
"requires_gpu: marks tests that need GPU hardware",
]
[tool.coverage.run]
source = ["src"]
omit = [
"*/tests/*",
"*/__init__.py",
"*/conftest.py",
]
branch = true # Measure branch coverage
[tool.coverage.report]
exclude_lines = [
"pragma: no cover",
"def __repr__",
"raise AssertionError",
"raise NotImplementedError",
"if __name__ == .__main__.:",
"if TYPE_CHECKING:",
"@abstractmethod",
]
precision = 2
show_missing = true
skip_covered = false
conftest.py for Shared Fixtures
# tests/conftest.py
import pytest
import numpy as np
from pathlib import Path
@pytest.fixture(scope="session")
def test_data_dir():
"""Provide path to test data directory."""
return Path(__file__).parent / "data"
@pytest.fixture
def sample_array():
"""Provide sample NumPy array."""
np.random.seed(42)
return np.random.randn(100)
@pytest.fixture
def temp_output_dir(tmp_path):
"""Provide temporary directory for test outputs."""
output_dir = tmp_path / "output"
output_dir.mkdir()
return output_dir
@pytest.fixture(autouse=True)
def reset_random_state():
"""Reset random state before each test."""
np.random.seed(42)
@pytest.fixture(scope="session")
def large_dataset():
"""Load large dataset once per test session."""
return load_reference_data()
# Platform-specific fixtures
@pytest.fixture(params=["Linux", "Darwin", "Windows"])
def platform_name(request, monkeypatch):
"""Parametrize tests across platforms."""
monkeypatch.setattr("platform.system", lambda: request.param)
return request.param
Common Testing Pitfalls and Solutions
Pitfall 1: Testing Implementation Instead of Behavior
Bad:
def test_uses_numpy_mean():
"""Test that function uses np.mean.""" # Testing implementation!
# This is fragile - breaks if implementation changes
pass
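Written out, such an implementation-coupled test might look like this sketch (module path and return value are hypothetical); it passes or fails based on how the average is computed rather than on what it returns:
def test_uses_numpy_mean(mocker):
    # Fragile: asserts that np.mean is called inside compute_average.
    mock_mean = mocker.patch("my_package.analysis.np.mean", return_value=3.0)
    compute_average(np.array([1, 2, 3, 4, 5]))
    mock_mean.assert_called_once()   # breaks if the implementation switches to arr.sum() / arr.size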
Good:
def test_computes_correct_average():
"""Test that function returns correct average."""
data = np.array([1, 2, 3, 4, 5])
result = compute_average(data)
assert result == approx(3.0)
Pitfall 2: Non-Deterministic Tests
Bad:
def test_random_sampling():
samples = generate_samples() # Uses random seed from system time!
assert samples[0] > 0 # Might fail randomly
Good:
def test_random_sampling():
np.random.seed(42) # Fixed seed
samples = generate_samples()
assert samples[0] == approx(0.4967, rel=1e-4)
Pitfall 3: Exact Floating-Point Comparisons
Bad:
def test_computation():
result = 0.1 + 0.2
assert result == 0.3 # Fails due to floating-point error!
Good:
def test_computation():
result = 0.1 + 0.2
assert result == approx(0.3)
Pitfall 4: Testing Too Much in One Test
Bad:
def test_entire_analysis():
# Load data
data = load_data()
assert data is not None
# Process
processed = process(data)
assert len(processed) > 0
# Analyze
result = analyze(processed)
assert result.score > 0.8
# Save
save_results(result, "output.txt")
assert Path("output.txt").exists()
Good:
def test_load_data_succeeds():
data = load_data()
assert data is not None
def test_process_returns_nonempty():
data = load_data()
processed = process(data)
assert len(processed) > 0
def test_analyze_gives_good_score():
data = load_data()
processed = process(data)
result = analyze(processed)
assert result.score > 0.8
def test_save_results_creates_file(tmp_path):
output_file = tmp_path / "output.txt"
result = create_mock_result()
save_results(result, output_file)
assert output_file.exists()
Testing Checklist
- Tests are in tests/ directory separate from source
- Test files named test_*.py
- Test functions named test_*
- Tests run against installed package (use src/ layout)
- pytest configured in pyproject.toml
- Using pytest.approx for floating-point comparisons
- Tests check exceptions with pytest.raises
- Tests check warnings with pytest.warns
- Parametrized tests for multiple inputs
- Fixtures for reusable setup
- Markers used for test organization
- Random tests use fixed seeds
- Tests are independent (can run in any order)
- Each test focuses on one behavior
- Coverage > 80% (preferably > 90%)
- All tests pass before committing
- Slow tests marked with @pytest.mark.slow
- Integration tests marked appropriately
- CI configured to run tests automatically
Continuous Integration
GitHub Actions Example
# .github/workflows/tests.yml
name: Tests
on:
push:
branches: [main]
pull_request:
branches: [main]
jobs:
test:
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest, macos-latest, windows-latest]
python-version: ["3.9", "3.10", "3.11", "3.12"]
steps:
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -e ".[dev]"
- name: Run tests
run: |
pytest --cov=my_package --cov-report=xml
- name: Upload coverage
uses: codecov/codecov-action@v4
with:
files: ./coverage.xml
Resources
- Scientific Python pytest Guide: https://learn.scientific-python.org/development/guides/pytest/
- Scientific Python Testing Tutorial: https://learn.scientific-python.org/development/tutorials/test/
- Scientific Python Testing Principles: https://learn.scientific-python.org/development/principles/testing/
- pytest Documentation: https://docs.pytest.org/
- pytest-cov: https://pytest-cov.readthedocs.io/
- pytest-mock: https://pytest-mock.readthedocs.io/
- Hypothesis (property-based testing): https://hypothesis.readthedocs.io/
- NumPy testing utilities: https://numpy.org/doc/stable/reference/routines.testing.html
- Testing best practices: https://docs.python-guide.org/writing/tests/
Summary
Testing scientific Python code with pytest, following Scientific Python community principles, provides:
- Confidence: Know your code works correctly
- Reproducibility: Ensure consistent behavior across environments
- Documentation: Tests show how code should be used and communicate developer intent
- Refactoring safety: Change code without breaking functionality
- Regression prevention: Catch bugs before they reach users
- Scientific rigor: Validate numerical accuracy and physical correctness
Key testing principles:
- Start with public interface tests from the user's perspective
- Organize tests into suites (unit, integration, e2e) by type and speed
- Follow outside-in approach: public interface → integration → unit tests
- Keep tests simple, focused, and independent
- Test behavior rather than implementation
- Use pytest's powerful features (fixtures, parametrization, markers) effectively
- Always verify tests fail when they should to avoid false confidence
Remember: Any test is better than none, but well-organized tests following these principles create trustworthy, maintainable scientific software that the community can rely on.