---
name: scientific-tdd
description: Pragmatic test-driven development for scientific code with numerical validation
tags: testing, tdd, scientific, numerical
version: 1
---
Scientific Test-Driven Development
Overview
Pragmatic test-driven development for scientific code: write tests first for new features and complex changes, verify with tests for simple bug fixes.
Core principle: write tests before the implementation for new behavior; use tests to verify the implementation for known bugs.
Announce at start: "I'm using the scientific-tdd skill to implement this feature."
When to Use This Skill
MUST use for:
- New features or algorithms
- Complex modifications to existing code
- Adding new mathematical models
- Implementing new likelihood functions or state transitions
Can skip test-first for:
- Simple bug fixes where existing tests already cover the behavior
- Documentation changes
- Refactoring with existing comprehensive tests (use safe-refactoring instead)
Process Checklist
Copy to TodoWrite:
Scientific TDD Progress:
- [ ] Understand existing behavior (read code and tests)
- [ ] Write test capturing desired new behavior
- [ ] Run test to confirm RED (fails as expected)
- [ ] Implement minimal code to pass test
- [ ] Run test to confirm GREEN (passes)
- [ ] Run full test suite (check for regressions)
- [ ] Run numerical validation if mathematical code changed
- [ ] Run code-reviewer agent (and/or ux-reviewer when appropriate)
- [ ] Refactor if needed (keep tests green)
- [ ] Commit with descriptive message
Detailed Steps
Step 1: Understand Existing Behavior
Before writing new tests, understand current state:
- Read relevant source files
- Read existing tests for similar functionality
- Run existing tests to see current behavior
- Identify what needs to change
Commands:
# Find relevant tests
pytest --collect-only -q | grep <relevant_term>
# Run specific test file
/Users/edeno/miniconda3/envs/non_local_detector/bin/pytest src/non_local_detector/tests/<test_file>.py -v
Step 2: Write Failing Test (RED)
Write test that captures desired behavior:
Test Structure:
import numpy as np  # required by the allclose assertion below


def test_descriptive_name_of_behavior():
    """Test that [specific behavior] works correctly.

    This test verifies that [explain what you're testing] when [condition].
    """
    # Arrange: Set up test data
    input_data = create_test_input()

    # Act: Call the function/method
    result = function_under_test(input_data)

    # Assert: Verify behavior
    assert result.shape == expected_shape
    assert np.allclose(result.sum(), 1.0, atol=1e-10)  # Probabilities sum to 1
For mathematical code, verify the following (a sketch of such checks appears after this list):
- Correct output shapes
- Mathematical invariants (probabilities sum to 1, matrices are stochastic)
- Expected numerical values (with appropriate tolerances)
- Edge cases (empty inputs, single element, boundary conditions)
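For instance, a hedged sketch of these invariant checks; the Gaussian-kernel helper here is illustrative only, not a project API:

```python
import numpy as np


def make_gaussian_transition(n_states: int, variance: float = 1.0) -> np.ndarray:
    """Illustrative helper: row-normalized Gaussian kernel over state indices."""
    positions = np.arange(n_states)
    kernel = np.exp(-((positions[:, None] - positions[None, :]) ** 2) / (2 * variance))
    return kernel / kernel.sum(axis=1, keepdims=True)


def test_transition_matrix_invariants():
    """Matrix has the right shape, is non-negative, and is row stochastic."""
    n_states = 5
    transition = make_gaussian_transition(n_states)

    assert transition.shape == (n_states, n_states)  # shape
    assert np.all(transition >= 0.0)  # non-negativity
    np.testing.assert_allclose(transition.sum(axis=1), 1.0, atol=1e-10)  # rows sum to 1


def test_single_state_edge_case():
    """Edge case: a one-state model is a 1x1 matrix containing 1.0."""
    np.testing.assert_allclose(make_gaussian_transition(1), [[1.0]], atol=1e-10)
```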
Step 3: Run Test - Confirm RED
CRITICAL: Test MUST fail before implementing:
/Users/edeno/miniconda3/envs/non_local_detector/bin/pytest src/non_local_detector/tests/<test_file>.py::test_name -v
Expected output: Test fails with clear error (function not defined, wrong output, etc.)
If the test passes: it isn't exercising new behavior; reconsider what you're testing.
Step 4: Implement Minimal Code
Write simplest code that makes test pass:
- Don't over-engineer
- Don't add features not tested
- Follow YAGNI (You Aren't Gonna Need It)
- Use existing patterns from codebase
For scientific code (a stability sketch follows this list):
- Maintain numerical stability
- Use JAX operations where appropriate
- Follow existing conventions for shapes and broadcasting
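As an illustration of the stability point, a minimal sketch of log-space normalization in JAX; the function name is hypothetical, not a project API:

```python
import jax.numpy as jnp
from jax.scipy.special import logsumexp


def log_likelihood_to_posterior(log_likelihood: jnp.ndarray) -> jnp.ndarray:
    """Normalize log-likelihoods into probabilities without underflow.

    Subtracting logsumexp in log space avoids exponentiating large negative
    values, which would round to zero in linear space.
    """
    log_posterior = log_likelihood - logsumexp(log_likelihood, axis=-1, keepdims=True)
    return jnp.exp(log_posterior)
```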
Step 5: Run Test - Confirm GREEN
/Users/edeno/miniconda3/envs/non_local_detector/bin/pytest src/non_local_detector/tests/<test_file>.py::test_name -v
Expected output: Test passes
If test fails: Debug until it passes, then verify you're testing the right thing.
Step 6: Run Full Test Suite
Check for regressions:
/Users/edeno/miniconda3/envs/non_local_detector/bin/pytest -v
Expected: All tests pass (the previous count plus any tests you added)
If new failures: Your change broke something - fix before proceeding.
Step 7: Numerical Validation (if applicable)
If you modified mathematical/algorithmic code:
Use numerical-validation skill:
@numerical-validation
This verifies the following (see the property-based sketch after this list):
- Mathematical invariants still hold
- Property-based tests pass
- Golden regression tests pass
- No unexpected numerical differences
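A hedged example of what a property-based invariant test can look like, assuming the hypothesis library is available; the inline max-shift normalization is a stand-in for whatever function is under validation:

```python
import numpy as np
from hypothesis import given
from hypothesis import strategies as st
from hypothesis.extra.numpy import arrays


@given(arrays(np.float64, 10, elements=st.floats(min_value=-50.0, max_value=0.0)))
def test_posterior_is_valid_distribution(log_likelihood):
    """Normalization yields a valid distribution for any bounded log-likelihood."""
    # Stand-in computation for the function under validation
    posterior = np.exp(log_likelihood - log_likelihood.max())
    posterior /= posterior.sum()

    assert np.all(posterior >= 0.0)
    np.testing.assert_allclose(posterior.sum(), 1.0, atol=1e-10)
```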
Step 8: Refactor (optional)
If code can be improved while keeping tests green:
- Improve readability
- Extract reusable functions
- Optimize performance (but verify numerics don't change; see the golden-check sketch below)
After each refactor:
/Users/edeno/miniconda3/envs/non_local_detector/bin/pytest -v
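One way to confirm numerics survive a refactor is a golden regression check. This is a sketch with illustrative names (compute_posterior, fixed_inputs, the golden path), not actual project code:

```python
import numpy as np

GOLDEN_PATH = "tests/golden/posterior.npy"


def test_refactor_preserves_numerics():
    """Refactored code reproduces the pre-refactor golden output.

    Regenerate the golden file only deliberately, before refactoring:
        np.save(GOLDEN_PATH, compute_posterior(fixed_inputs))
    """
    golden = np.load(GOLDEN_PATH)
    current = compute_posterior(fixed_inputs)  # illustrative placeholders
    np.testing.assert_allclose(current, golden, rtol=1e-12, atol=1e-12)
```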
Step 9: Commit
git add <test_file> <implementation_file>
git commit -m "feat: add <feature description>
- Add test for <specific behavior>
- Implement <what you implemented>
- All tests passing (<N> tests)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>"
Example Workflow
Task: Add new random walk transition with custom variance
1. Read: src/non_local_detector/continuous_state_transitions.py
2. Read: src/non_local_detector/tests/transitions/test_continuous_transitions.py
3. Write test: test_random_walk_custom_variance() (sketched after this list)
4. Run test: FAIL - "NotImplementedError: custom variance not supported"
5. Implement: Add variance parameter to RandomWalk class
6. Run test: PASS
7. Run full suite: 427 tests passed
8. Run numerical validation: All invariants hold
9. Commit: "feat: add custom variance support to RandomWalk"
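The test from step 3 might look like the sketch below. The variance keyword follows the workflow above, but the constructor and the make_state_transition method are assumptions for illustration, not the package's confirmed API:

```python
import numpy as np

from non_local_detector.continuous_state_transitions import RandomWalk


def test_random_walk_custom_variance():
    """RandomWalk with a larger variance spreads probability mass more widely."""
    place_bin_centers = np.linspace(0.0, 10.0, 50)[:, np.newaxis]

    # `variance` keyword and `make_state_transition` signature are assumed
    narrow = RandomWalk(variance=1.0).make_state_transition(place_bin_centers)
    wide = RandomWalk(variance=9.0).make_state_transition(place_bin_centers)

    # Both matrices must stay row stochastic
    np.testing.assert_allclose(narrow.sum(axis=1), 1.0, atol=1e-10)
    np.testing.assert_allclose(wide.sum(axis=1), 1.0, atol=1e-10)

    # A wider kernel keeps less mass on the diagonal (staying in place)
    assert np.diag(wide).mean() < np.diag(narrow).mean()
```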
Integration with Other Skills
- Before using this skill: Often preceded by brainstorming or design discussion
- Use with numerical-validation: For mathematical code changes
- After this skill: May use safe-refactoring for cleanup
- Alternative to this skill: Use safe-refactoring if changing structure, not behavior
Red Flags
Don't:
- Write implementation before test (except for documented bug fixes)
- Skip running test to see it fail
- Add untested code "for future use"
- Skip full test suite after implementation
- Commit failing tests
- Skip numerical validation for mathematical code
Do:
- Write descriptive test names
- Test one behavior per test
- Use appropriate numerical tolerances (e.g., 1e-10 for probabilities; see the sketch at the end)
- Run tests frequently
- Commit small, working increments
- Ask if unsure whether to use TDD for a specific change
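As a closing illustration of the tolerance guideline (rules of thumb, not project requirements): use a tight absolute tolerance for exact invariants, and a relative tolerance for values accumulated over many floating-point operations.

```python
import numpy as np

posterior = np.full(100, 0.01)
np.testing.assert_allclose(posterior.sum(), 1.0, atol=1e-10)  # exact invariant: tight atol

total = np.sum(np.full(1_000_000, 1e-6))  # accumulates rounding error
np.testing.assert_allclose(total, 1.0, rtol=1e-6)  # accumulated value: looser rtol
```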