---
name: testing
description: Testing philosophy and practices. This skill should be used when writing tests, designing test strategies, or reviewing test code. Use proactively when discussing TDD, red-green-refactor, test doubles, mocks, stubs, fakes, RSpec, Jest, pytest, unit tests, integration tests, test coverage, or test-first development.
---
# Testing
This skill provides guidance on testing philosophy and practices, emphasizing tests as specifications and API design through TDD.
## Core Philosophy

### Tests as Executable Specifications
Tests are not just verification tools — they are executable specifications that document how the system should behave. A well-written test suite serves as living documentation.
### Tests as API Consumers
Tests are the first users of your code's APIs. This is why TDD is valuable: you design the API by thinking about the consumer first, before thinking about implementation.
When writing tests:
- Consider what interface would be most convenient for the caller
- Let the test drive the API design
- If the test is awkward to write, the API is awkward to use
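For example, writing the spec first lets the caller's convenience dictate the interface. A minimal sketch (the `Subscription` class and its methods are hypothetical, chosen purely by what reads well in the test):

```ruby
# Written before Subscription exists: the test picks the API shape.
# A keyword argument and a predicate method fall out of what reads
# most naturally at the call site.
RSpec.describe Subscription do
  it "renews for the requested term" do
    subscription = Subscription.new(plan: :monthly)

    subscription.renew(months: 12)

    expect(subscription).to be_active
  end
end
```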
## Test-Driven Development (TDD)

### Red-Green-Refactor
The TDD cycle consists of three phases:
- Red: Write a failing test for the next piece of functionality
- Green: Write the minimum code necessary to make the test pass
- Refactor: Improve the code while keeping tests green
Each cycle should be short — ideally minutes, not hours. Small steps reduce risk and provide frequent feedback.
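A minimal pass through the cycle, using a made-up `Counter` class (spec and implementation are shown together here; in practice they live in separate files):

```ruby
# Red: run this spec before Counter exists and watch it fail.
RSpec.describe Counter do
  it "starts at zero and increments by one" do
    counter = Counter.new
    counter.increment
    expect(counter.value).to eq(1)
  end
end

# Green: the minimum implementation that makes the spec pass.
class Counter
  attr_reader :value

  def initialize
    @value = 0
  end

  def increment
    @value += 1
  end
end

# Refactor: with the spec green, rename and restructure freely.
```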
### The Value of TDD
- Forces thinking about the API before implementation
- Produces code with high test coverage by default
- Encourages simpler designs (testable code tends to be well-designed)
- Provides immediate feedback on whether code works
- Creates executable documentation of intended behavior
### Flexible TDD
Strict TDD (one test at a time, red-green-refactor) is the ideal for learning and for complex logic. However, flexibility is acceptable:
Writing all tests first is appropriate when:
- Tests need human review/approval before implementation
- The behavior is well-understood and stable
- Documenting a specification before implementing
Writing tests after is acceptable when:
- Exploring or prototyping (but add tests before committing)
- The design is genuinely uncertain
- Spiking to learn about a problem
The goal is well-tested code with tests that serve as specifications. The path matters less than the destination, but TDD often produces better results.
## Speed Matters
Tests should be fast. Slow tests discourage running them frequently, which defeats their purpose.
- Target sub-second feedback for unit tests
- Keep the full suite under a few minutes when possible
- Identify and isolate slow tests
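In RSpec, one way to isolate slow tests is metadata tagging, so the fast suite can exclude them by default:

```ruby
# Tag examples that are known to be slow:
it "imports a large CSV file", :slow do
  # ... exercise the import ...
end
```

Running `rspec --tag ~slow` then skips the tagged examples, while `rspec --tag slow` runs only them (for example, in a separate CI job).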
## Database Access
Avoid hitting the database in tests except when:
- Testing database-specific functionality (queries, constraints, transactions)
- Writing integration tests that specifically verify database behavior
Do not hit the database just to:
- Populate models or data structures
- Create test fixtures when in-memory objects would suffice
- Test business logic that happens to use database-backed models
Use factories or builders that create in-memory objects when database persistence isn't the thing being tested.
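With FactoryBot, for instance, `build` constructs the object in memory while `create` persists it; preferring `build` keeps business-logic tests off the database (the `:order` factory here is illustrative):

```ruby
# In memory only: no INSERT, fast, sufficient for business logic.
order = build(:order, status: "pending")

# Hits the database: reserve for tests that verify persistence behavior.
order = create(:order, status: "pending")
```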
## Test Structure

### One Thing Per Test
Each test should verify one behavior. This doesn't always mean one assertion — sometimes verifying one behavior requires multiple assertions, especially when tests are slow. But the test should have a single reason to fail.
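For example, several assertions can still describe one behavior, as long as they would all fail for the same underlying reason (the `fulfilled_at` attribute and factory trait here are illustrative):

```ruby
it "fulfills the order" do
  order = build(:order, :with_available_items)

  order.fulfill

  # Two assertions, one behavior: both say "the order was fulfilled".
  expect(order).to be_fulfilled
  expect(order.fulfilled_at).not_to be_nil
end
```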
### AAA Pattern
Structure tests using Arrange-Act-Assert:
- Arrange: Set up the preconditions
- Act: Execute the behavior being tested
- Assert: Verify the expected outcome
Keep each section clearly delineated. If any section is complex, consider extracting helper methods.
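A spec with the three sections kept visually distinct; blank lines are usually enough (the `Cart` API is a made-up example):

```ruby
it "applies a percentage discount to the total" do
  # Arrange
  cart = Cart.new
  cart.add(price: 100)

  # Act
  cart.apply_discount(percent: 10)

  # Assert
  expect(cart.total).to eq(90)
end
```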
### Given-When-Then
The BDD mindset aligns with AAA:
- Given (Arrange): The initial context
- When (Act): The event or action
- Then (Assert): The expected outcome
This framing helps focus on behavior from the user's perspective.
## Mocking and Test Doubles

### Prefer Real Objects
Avoid mocking when possible. Build small, simple components with immutable data to reduce the need for mocks.
### When Mocking is Necessary
If mocking is unavoidable:
- Mock roles, not objects — mock interfaces/behaviors, not concrete implementations
- Prefer fakes over mocks — fakes (simplified implementations) are often clearer than mock expectations
- Keep mock setups simple; complex mocking often signals design problems
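For example, a hand-written fake for a payment gateway role is often clearer than a pile of stubbed expectations. This is a sketch; the gateway role with a single `charge` method and the `Checkout` class are assumptions:

```ruby
# A fake: a real, simplified implementation of the gateway role.
class FakePaymentGateway
  attr_reader :charges

  def initialize
    @charges = []
  end

  def charge(amount)
    @charges << amount
    true
  end
end

it "charges the order total on checkout" do
  gateway = FakePaymentGateway.new
  checkout = Checkout.new(gateway: gateway) # Checkout is hypothetical

  checkout.complete(total: 50)

  expect(gateway.charges).to eq([50])
end
```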
### Signs of Excessive Mocking
- Tests that are mostly mock setup
- Mocks returning mocks
- Tests that break when implementation details change
- Difficulty understanding what's actually being tested
Consider these as signals to refactor the production code.
## Custom Matchers
Use custom matchers (RSpec matchers, Jest matchers, etc.) to make assertions readable and intention-revealing.
Good:

```ruby
expect(order).to be_fulfilled
expect(user).to have_permission(:admin)
```

Less clear:

```ruby
expect(order.status).to eq("fulfilled")
expect(user.permissions).to include("admin")
```
Custom matchers:
- Make tests read like specifications
- Provide better failure messages
- Encapsulate complex assertions
- Can be reused across tests
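Defining one in RSpec takes a few lines with `RSpec::Matchers.define`; this sketch assumes `user.permissions` returns a collection of strings:

```ruby
RSpec::Matchers.define :have_permission do |permission|
  match do |user|
    user.permissions.include?(permission.to_s)
  end

  failure_message do |user|
    "expected user to have #{permission} permission, " \
      "but permissions were #{user.permissions.inspect}"
  end
end
```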
## Language-Specific Guidelines

### Ruby (RSpec)
- Use RSpec as the primary testing framework
- Prefer `describe` for classes/methods, `context` for states/conditions
- Use `let` for lazy-evaluated test data
- Use `subject` for the thing being tested
- Prefer `expect` syntax over `should`
- Use `before` sparingly; prefer explicit setup in each test when clarity matters
- Create custom matchers for domain-specific assertions
- Use `shared_examples` for common behavior across contexts
- Use FactoryBot for test data, but prefer `build` over `create` when persistence isn't needed
```ruby
RSpec.describe Order do
  describe "#fulfill" do
    context "when all items are in stock" do
      it "marks the order as fulfilled" do
        order = build(:order, :with_available_items)
        order.fulfill
        expect(order).to be_fulfilled
      end
    end
  end
end
```
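A sketch combining `let`, `subject`, and `shared_examples`; the `Order#cancel`/`#cancelled?` API and the `:item` factory are assumptions:

```ruby
RSpec.shared_examples "a cancellable order" do
  it "can be cancelled" do
    subject.cancel
    expect(subject).to be_cancelled
  end
end

RSpec.describe Order do
  subject(:order) { build(:order, items: items) }

  context "when no items have shipped" do
    let(:items) { build_list(:item, 2, status: "pending") }

    it_behaves_like "a cancellable order"
  end
end
```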
### JavaScript (Jest/Vitest)
- Use descriptive test names that read as specifications
- Use `describe` blocks to group related tests
- Prefer explicit assertions over snapshot tests (unless testing UI output)
- Use `beforeEach` for common setup
- Mock external dependencies, not internal modules
describe("Order", () => {
describe("fulfill", () => {
it("marks the order as fulfilled when all items are in stock", () => {
const order = buildOrder({ items: availableItems });
order.fulfill();
expect(order.isFulfilled()).toBe(true);
});
});
});
### Bash (BATS or similar)
- Test scripts by testing their behavior, not their output format
- Use temporary directories for file-based tests
- Clean up test artifacts in teardown
- Test error conditions and exit codes
## Test Smells
Watch for these warning signs:
- Slow tests: Usually means too much real I/O or database access
- Flaky tests: Often timing issues or shared state
- Fragile tests: Breaking when implementation changes, not behavior
- Mystery guests: Test data coming from somewhere non-obvious
- Eager tests: Testing too many things at once
- Obscure tests: Hard to understand what's being tested