---
name: testing
description: Framework for writing and reviewing tests. Use when creating new tests, reviewing test quality, or debugging flaky tests.
---

# Testing
Framework for writing and reviewing tests. Focus: actionable decisions, not perfection.
**Core principle:** Tests verify behavior, not implementation. If a test breaks on refactor without a behavior change, the test is wrong.
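A minimal illustration of the difference, as a sketch assuming a Jest-style runner (`Cart` is a hypothetical class invented for this sketch):

```ts
// Hypothetical class under test.
class Cart {
  _cachedTotal = 0; // internal detail
  constructor(private items: { price: number }[]) {}
  total(): number {
    this._cachedTotal = this.items.reduce((sum, i) => sum + i.price, 0);
    return this._cachedTotal;
  }
}

// ✅ Behavior: survives any refactor that keeps totals correct.
it("returns the sum of item prices as the total", () => {
  expect(new Cart([{ price: 5 }, { price: 10 }]).total()).toBe(15);
});

// ❌ Implementation-coupled: breaks if _cachedTotal is renamed,
// even though observable behavior is unchanged.
it("stores the computed total in _cachedTotal", () => {
  const cart = new Cart([{ price: 5 }, { price: 10 }]);
  cart.total();
  expect(cart._cachedTotal).toBe(15);
});
```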
## Decision Framework

### Should I Test This?
| Priority | Scenario | Test Level |
|---|---|---|
| HIGH | Bug fix | Unit (reproduces) |
| HIGH | Business logic / complex rules | Unit |
| HIGH | Critical path (auth, payments) | Unit + Integration |
| HIGH | Public API (breaking it is costly) | Unit + Integration |
| MEDIUM | Integration points | Integration |
| MEDIUM | Error handling / edge cases | Unit |
| LOW | UI presentation | E2E selectively |
| LOW | Config / low complexity | Skip |
| SKIP | Spike / throwaway code | — |
**Decision rule:** HIGH risk OR HIGH complexity → test. LOW risk AND LOW complexity → skip.

### What Type of Test?
```
Is it pure logic with no dependencies?
├─ YES → Unit test
└─ NO  → Does it cross module/service boundaries?
   ├─ YES → Integration test
   └─ NO  → Is it a critical user journey?
      ├─ YES → E2E test
      └─ NO  → Unit test with mocks at edges
```
## Writing Tests

### Pre-Write (3 Questions)
1. What behavior am I testing? (One sentence, no "and".)
2. What's the expected outcome? (A specific value or state.)
3. What inputs trigger it? (Including edge cases.)
If you can't answer these clearly, you don't understand the requirement yet.
### Write (AAA Structure)
Every test follows Arrange-Act-Assert. No exceptions.

```ts
it("accepts a well-formed email address", () => {
  // ARRANGE: set up state and dependencies
  const validator = new EmailValidator();
  const email = "user@example.com";

  // ACT: execute ONE behavior
  const result = validator.validate(email);

  // ASSERT: verify the specific outcome
  expect(result.isValid).toBe(true);
  expect(result.errors).toEqual([]);
});
```
### Naming Convention
**Core pattern:** `{action} → {condition} → {expected_result}`
Respect this structure regardless of syntax. Adapt it to:
- Existing project conventions — match the style already in use
- Language/framework idioms — if no existing tests, follow ecosystem norms
Valid examples (same semantic structure):
```
returns_null_when_user_not_found     # snake_case
"returns null when user not found"   # string description
returnsNullWhenUserNotFound          # camelCase
ReturnsNull_WhenUserNotFound         # PascalCase with separator
```
Anti-patterns (violate the structure):
```
❌ test_user                 # missing condition + result
❌ validation                # not a behavior
❌ handles_edge_cases        # which ones?
❌ should_work_correctly     # vague result
```
**Rule:** If the name has "and", split the test.
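For example, a Jest-style sketch of the split (the `normalize` method is hypothetical, reusing the validator from the AAA example above):

```ts
// ❌ "and" in the name: two behaviors in one test
it("validates the email and normalizes its case", () => { /* ... */ });

// ✅ Split: one behavior each, matching {action} → {condition} → {result}
it("accepts a well-formed email address", () => {
  const validator = new EmailValidator();
  expect(validator.validate("User@Example.com").isValid).toBe(true);
});

it("lowercases the address during normalization", () => {
  const validator = new EmailValidator();
  expect(validator.normalize("User@Example.com")).toBe("user@example.com");
});
```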
### Mocking Rules
| Mock | Don't Mock |
|---|---|
| External APIs | Business logic |
| Databases (for unit tests) | Pure functions |
| Network calls | Internal collaborators |
| Timers / randomness | Everything (over-mocking) |
Test real code paths. Mock only at system boundaries.
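A sketch of the boundary rule, assuming a Jest-style runner (`PriceService` and its HTTP client are hypothetical): the HTTP client is the system edge and gets mocked; the conversion logic under test runs for real.

```ts
// Hypothetical service: real business logic, injected HTTP client.
class PriceService {
  constructor(private http: { get: (url: string) => Promise<{ rate: number }> }) {}

  async priceInEur(usd: number): Promise<number> {
    const { rate } = await this.http.get("/fx/usd-eur"); // boundary call
    return Math.round(usd * rate * 100) / 100;           // real logic, not mocked
  }
}

it("converts USD to EUR using the fetched rate", async () => {
  const http = { get: jest.fn().mockResolvedValue({ rate: 0.9 }) }; // mock the edge only
  const service = new PriceService(http);

  await expect(service.priceInEur(10)).resolves.toBe(9); // real code path verified
});
```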
### Edge Cases Checklist
For any function, consider (see the parameterized sketch after this list):
- Empty input (`""`, `[]`, `{}`, `null`, `undefined`)
- Boundary values (0, -1, MAX_INT, empty string vs whitespace-only)
- Invalid types (if dynamically typed)
- Error conditions (network failure, timeout, permission denied)
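One way to sweep the checklist without copy-paste is a parameterized test; a sketch using Jest's `test.each` (reusing the hypothetical validator from the AAA example):

```ts
test.each([
  ["", false],       // empty string
  ["   ", false],    // whitespace-only is not the same as empty
  [null, false],     // null input
  ["a@b.co", true],  // minimal valid address
])("validate(%p) → isValid: %p", (input, expected) => {
  const validator = new EmailValidator();
  expect(validator.validate(input as string).isValid).toBe(expected);
});
```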
## Reviewing Tests

### Red Flags → Quick Fixes
| Red Flag | Fix |
|---|---|
| Test name has "and" | Split into separate tests |
| Mocking internal classes | Remove the mock, use the real implementation |
| `assert(result)` / truthy check | Assert a specific value: `assert(result == expected)` |
| Tests fail when run in a different order | Remove shared state; each test owns its setup |
| Testing private methods | Test through the public interface |
| No edge case tests | Add empty/null/boundary tests |
| `sleep()` / timing assumptions | Use callbacks, promises, or mock timers |
| Commented-out tests | Delete or fix them |
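For the `sleep()` row: fake timers make the clock deterministic. A sketch assuming Jest fake timers and a hypothetical `scheduleRetry` helper:

```ts
jest.useFakeTimers();

// Hypothetical helper that schedules work with setTimeout.
const scheduleRetry = (fn: () => void, ms: number) => setTimeout(fn, ms);

it("fires the retry after the backoff delay, with no real waiting", () => {
  const send = jest.fn();
  scheduleRetry(send, 5000);

  expect(send).not.toHaveBeenCalled();
  jest.advanceTimersByTime(5000); // fast-forward the mocked clock
  expect(send).toHaveBeenCalledTimes(1);
});
```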
### Anti-Pattern Detection

**Over-mocking?**
- Ask: are you testing mock behavior or real code?
- Fix: use real implementations; mock only at the edges.

**Implementation coupling?**
- Ask: does the test break on refactor without a behavior change?
- Fix: test the public interface, not internals.

**Flaky test?**
- Ask: does it pass/fail inconsistently?
- Fix: check for timing, shared state, or non-determinism (see the isolation sketch below).
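For shared state, the standard fix is a fresh fixture per test; a minimal sketch (`InMemoryStore` is a hypothetical fixture):

```ts
class InMemoryStore { items: string[] = []; } // hypothetical fixture

// ❌ Module-level fixture: tests mutate it and become order-dependent
// const store = new InMemoryStore();

// ✅ Each test owns a fresh instance
let store: InMemoryStore;
beforeEach(() => {
  store = new InMemoryStore();
});
```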
## Validation Framework

### Criteria (Binary Pass/Fail)
| # | Criterion | ✅/❌ |
|---|---|---|
| 1 | Tests pass consistently | |
| 2 | One behavior per test | |
| 3 | Names describe behavior | |
| 4 | Tests are independent | |
| 5 | Critical paths covered | |
| 6 | Assertions are specific | |
### Verdict
| Result | Action |
|---|---|
| GOOD | All ✅ → Ready to merge |
| NEEDS_WORK | Any ❌ → Fix only failing items |
**Anti-perfectionism rule:** Don't optimize what already passes. Fix failures, move on.
## Output Formats

### When Writing New Tests
```markdown
## Test Plan: [function/module name]

**Behavior:** [one sentence]
**Type:** Unit | Integration | E2E

### Cases
1. [happy path]
2. [edge case 1]
3. [edge case 2]
4. [error condition]
```
### When Reviewing Tests
```markdown
## Test Review: [file/module]

### Validation
| Criterion | Status |
| ----------------------- | ------ |
| Tests pass consistently | ✅/❌ |
| One behavior per test | ✅/❌ |
| Names describe behavior | ✅/❌ |
| Tests are independent | ✅/❌ |
| Critical paths covered | ✅/❌ |
| Assertions are specific | ✅/❌ |

**Verdict:** GOOD | NEEDS_WORK

### Issues (if NEEDS_WORK)
- [Issue]: [Location] → [Fix]
```
## Quick Reference

### Test Pyramid
```
   /\      E2E (5%)          - critical journeys only
  /--\     Integration (25%) - module boundaries
 /----\    Unit (70%)        - fast, isolated, logic
```
### Flaky Test Causes
| Cause | Solution |
|---|---|
| Timing assumptions | Callbacks/promises, not sleep |
| Shared state | Isolate setup per test |
| Random values | Mock randomness, seed generators |
| Async race | Proper awaits, controlled order |
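For the random-values row, stubbing `Math.random` pins the outcome; a Jest-style sketch (`assignGroup` is hypothetical):

```ts
// Hypothetical function that branches on randomness.
const assignGroup = () => (Math.random() < 0.5 ? "control" : "variant");

it("assigns the control group when the roll is below 0.5", () => {
  jest.spyOn(Math, "random").mockReturnValue(0.25); // pin the "roll"
  expect(assignGroup()).toBe("control");
  jest.restoreAllMocks(); // restore real randomness for other tests
});
```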
## Final Principle
**GOOD with good-enough tests > blocked by perfect tests.**
Tests that exist and pass > tests planned but not written.