| name | software-testing-strategy |
| description | Strategic testing framework covering the testing pyramid, test design patterns, and testing best practices from industry leaders - complements TDD workflow with comprehensive strategy. |
| tags | testing, quality, strategy, test-design, testing-pyramid |
| version | 1.0.0 |
Software Testing Strategy
Purpose
This skill provides comprehensive strategic guidance for designing effective test suites. It covers:
- The Testing Pyramid: Economic justification for the 70/20/10 distribution
- Test Design Patterns: AAA, Test Builders, Test Doubles, Property-Based Testing
- Testing by Level: When to use unit, integration, e2e, and property-based tests
- Anti-Patterns: How to recognize and fix common testing mistakes
- Legacy Code Testing: Practical techniques from Michael Feathers
- Legendary Wisdom: Testing principles from Kent Beck, Martin Fowler, and others
This skill is strategic (what to test, where to test it, how to structure it) and complements the tdd-enforcement skill which is tactical (test-first workflow execution).
Use this skill when planning testing approaches, choosing test types, designing test patterns, or expanding test coverage. Use tdd-enforcement when actively writing code test-first.
When to Use This Skill
Use this skill when you need to:
- Design a testing strategy for a new feature or system
- Choose which test types to write (unit, integration, e2e)
- Understand test design patterns (AAA, Test Builders, Test Doubles)
- Recognize and fix test anti-patterns (flaky tests, slow tests, over-mocking)
- Test legacy code without existing test coverage
- Expand test coverage using risk-based prioritization
- Evaluate test quality in code reviews
When NOT to Use This Skill
Do NOT use this skill when:
- Actively writing code test-first (use
tdd-enforcementinstead) - Writing simple one-off tests (just write them)
- The testing approach is already clear and straightforward
The Iron Law of Testing
These principles are non-negotiable for effective testing:
1. Tests Must Provide Fast Feedback
Slow tests don't get run. Unit tests should complete in milliseconds, integration tests in seconds, full e2e suite under 30 minutes.
2. Tests Must Be Deterministic
Flaky tests destroy trust. No random inputs, no real clocks, no network timing dependencies. Every run must produce identical results.
3. Test Behavior, Not Implementation
Tests should verify what the code does, not how it does it. Implementation details change; behavior contracts don't.
4. Test at the Appropriate Level
Don't use e2e tests for business logic. Don't mock everything in unit tests. Each test type has a purpose.
5. Tests Are Production Code
Apply the same quality standards: readability, maintainability, simplicity. Test code lives longer than production code.
6. Risk-Based Coverage Beats Percentage Coverage
100% coverage means nothing if you're asserting the wrong things. Focus on critical paths, edge cases, and high-risk areas.
Testing Philosophy: First Principles
Each CLAUDE.md principle applies directly to testing:
Clarity Over Cleverness
In Testing: Tests are executable documentation. A test should read like a specification:
def test_withdrawing_more_than_balance_raises_insufficient_funds_error():
account = Account(balance=100)
with pytest.raises(InsufficientFundsError):
account.withdraw(150)
Not: test_withdraw_2() with complex setup and unclear assertions.
Strong Boundaries, Loose Coupling
In Testing: Test isolation. Each test should be completely independent with its own setup and teardown. Tests that depend on each other create fragile suites.
# Good: Isolated test with complete setup
RSpec.describe Order do
it "calculates total with tax" do
order = Order.new(items: [Item.new(price: 100)])
expect(order.total_with_tax(rate: 0.08)).to eq(108)
end
end
# Bad: Depends on previous test state
# it "applies discount after tax" do
# expect(@order.with_discount(0.1).total_with_tax).to eq(97.2)
# end
Fail Fast, Fail Loud
In Testing: Fast feedback from failures. Tests should run quickly and fail clearly with actionable error messages.
// Good: Clear assertion with context
expect(user.age).toBe(21,
`User ${user.name} must be 21 to purchase alcohol, got ${user.age}`);
// Bad: Silent failure or unclear message
// assert(user.age >= 21);
Simplicity Wins
In Testing: Simple test setup, clear assertions, no complex logic. If your test needs comments to explain what it's testing, simplify it.
# Good: Simple, clear test
def test_empty_cart_has_zero_total():
cart = Cart()
assert cart.total() == 0
# Bad: Complex test with logic
# def test_cart_totals():
# for i in range(10):
# if i % 2 == 0:
# cart.add(Item(price=i * 10))
# assert cart.total() == sum([i * 10 for i in range(10) if i % 2 == 0])
Design for Change
In Testing: Tests should enable refactoring, not prevent it. Test behaviors through public APIs, not internal implementation details.
// Good: Tests behavior through public API
test('login redirects to dashboard on success', async () => {
await loginPage.submitCredentials('user', 'pass');
expect(browser.url()).toContain('/dashboard');
});
// Bad: Tests internal implementation
// test('login sets auth token in localStorage', async () => {
// await loginPage.submitCredentials('user', 'pass');
// expect(localStorage.getItem('authToken')).toBeTruthy();
// });
Test at the Right Levels
In Testing: This is the Testing Pyramid principle. Unit tests for correctness, integration tests for contracts, e2e tests for critical journeys.
Architecture Enables Testing at Right Levels: Use Functional Core, Imperative Shell to maximize testable surface area. Pure business logic in core = more unit tests. Side effects in shell = fewer integration tests. See "Architecture for Testability" section below.
Operational Excellence is a Feature
In Testing: Test observability matters. Clear test names, structured output, actionable failures, execution time tracking.
Architecture for Testability
Before choosing test types and levels, understand how architecture enables or impedes testing.
Functional Core, Imperative Shell Pattern
Problem (Google Testing Blog, October 2025):
"Mixing database calls, network requests, and other external interactions directly with your core logic can lead to code that's difficult to test."
Solution:
Separate pure business logic (functional core) from side effects (imperative shell).
Why This Matters for Testing:
- Unit Testing: Core becomes trivially testable without mocks
- Testing Pyramid Economics: More logic in core = more fast, cheap unit tests
- Integration Testing: Shell needs fewer, lighter integration tests
- Test Determinism: Pure functions are inherently deterministic
Example: Before FCIS (Hard to Test)
# Mixed concerns: logic tangled with database and email
def send_expiry_reminders():
users = UserRepository.find_all() # Database call
for user in users:
# Business logic mixed with I/O
if user.expires_at <= Date.today() + timedelta(days=7) and not user.reminded:
EmailService.send(
to=user.email,
subject="Account Expiry Reminder",
body=f"Your account expires on {user.expires_at}"
)
UserRepository.update(user.id, reminded=True)
Problems:
- Can't test expiry logic without database
- Can't test email content without email service
- Requires heavy mocking for tests
- Business rules buried in I/O operations
Example: After FCIS (Easy to Test)
# Core: Pure functions (easy to test)
def users_needing_reminder(users, cutoff_date):
"""Filter users who need expiry reminders."""
return [u for u in users if u.expires_at <= cutoff_date and not u.reminded]
def generate_expiry_emails(users):
"""Generate email content for users."""
return [
{
'to': user.email,
'subject': 'Account Expiry Reminder',
'body': f'Your account expires on {user.expires_at.strftime("%Y-%m-%d")}'
}
for user in users
]
# Shell: Thin orchestration (tested lightly)
def send_expiry_reminders():
users = UserRepository.find_all()
to_remind = users_needing_reminder(users, Date.today() + timedelta(days=7))
emails = generate_expiry_emails(to_remind)
EmailService.send_bulk(emails)
UserRepository.mark_reminded([u.id for u in to_remind])
Testing Benefits:
# Core: Fast unit tests (no mocks needed!)
def test_filters_users_expiring_within_cutoff():
users = [
User(email='a@ex.com', expires_at=Date.today() + timedelta(days=5), reminded=False),
User(email='b@ex.com', expires_at=Date.today() + timedelta(days=10), reminded=False),
User(email='c@ex.com', expires_at=Date.today() + timedelta(days=5), reminded=True)
]
result = users_needing_reminder(users, Date.today() + timedelta(days=7))
assert len(result) == 1
assert result[0].email == 'a@ex.com'
# Runs in milliseconds, no I/O needed
Key Benefits:
- Core: Test extensively with fast unit tests (no I/O dependencies)
- Shell: Test lightly with integration tests (mostly orchestration)
- Reusability: Pure functions compose easily across features
See the writing-code skill for complete implementation guidance on separating decisions from effects (Functional Core, Imperative Shell pattern).
The Testing Pyramid: Economics and Strategy
/\
/ \ E2E Tests (10%)
/ \
/ \ - Slow (minutes)
/--------\ - Expensive ($$$)
/ \ - Critical journeys only
/ Integration\ (20%)
/ Tests \
/ \ - Moderate speed (seconds)
/ \ - Contract verification
/-------------------\
| Unit Tests | (70%)
| - Fast (<100ms)
| - Cheap ($)
| - Many tests
+---------------------+
The 70/20/10 Distribution
70% Unit Tests:
- Test business logic, algorithms, validations
- No I/O, no network, no database
- Each test completes in milliseconds
- Hundreds or thousands of tests
20% Integration Tests:
- Test component interactions
- Verify contracts between modules
- Real dependencies (database, message queue)
- Each test completes in seconds
- Dozens to hundreds of tests
10% E2E Tests:
- Test critical user journeys
- Full stack: frontend → backend → database
- Each test completes in seconds to minutes
- Keep suite under 30 minutes total
- Top 20% of journeys = 80% of business value
Economic Justification
Testing Pyramid Approach (70/20/10):
- CI/CD infrastructure: ~$100/month
- Fast feedback: 5-10 minutes for full suite
- Developer velocity: High (tests run frequently)
- Maintenance: Low (unit tests rarely break)
Inverted Pyramid (Heavy E2E):
- CI/CD infrastructure: ~$10,000/month
- Slow feedback: 2-4 hours for full suite
- Developer velocity: Low (developers skip tests)
- Maintenance: High (e2e tests brittle)
Source: Industry research (TestRail, FullScale 2025)
2025 Adaptations
Microservices Architecture: Adjust to 65/25/10
- More integration tests for service contracts
- Contract testing (Pact, Spring Cloud Contract)
- Still maintain fast unit test majority
Cloud-Native Systems: Preview Environments
- Deploy PR branches to temporary environments
- Run e2e tests against preview before merge
- Faster feedback than full production-like environment
Focus on Value: 20% of Journeys = 80% of Business Value
- Identify critical paths: authentication, checkout, data submission
- E2E test only these high-value journeys
- Don't test every edge case at e2e level
Legendary Testing Wisdom
Kent Beck (TDD Pioneer)
"Test-first is about design, not testing."
- Writing tests first forces you to design clear interfaces
- If it's hard to test, the design needs improvement
- Red-Green-Refactor cycle: fail → pass → improve
Tests as Executable Specifications:
# Test describes what the code should do
def test_account_prevents_overdraft():
"""Account.withdraw() should raise InsufficientFundsError
when withdrawal amount exceeds balance."""
account = Account(balance=100)
with pytest.raises(InsufficientFundsError):
account.withdraw(150)
Martin Fowler (Testing Patterns)
"Tests should be FIRST: Fast, Isolated, Repeatable, Self-validating, Timely."
- Fast: Unit tests in milliseconds
- Isolated: No shared state between tests
- Repeatable: Same result every time
- Self-validating: Pass/fail, no manual inspection
- Timely: Written with (or before) production code
Test Doubles Taxonomy:
- Mock: Verifies interactions (assert method was called)
- Stub: Returns predetermined values
- Fake: Working implementation (in-memory database)
- Spy: Records calls for later inspection
Testing Pyramid Principle:
- Unit tests form the base (many, fast, cheap)
- Integration tests in the middle (fewer, slower)
- E2E tests at the top (fewest, slowest, expensive)
Michael Feathers (Legacy Code)
"Legacy code is code without tests."
Characterization Tests
When you don't know what the code should do, write tests that capture what it currently does:
# Characterization test for legacy code
RSpec.describe LegacyPriceCalculator do
it "calculates price with current behavior" do
calculator = LegacyPriceCalculator.new
# Document current behavior, even if unclear
expect(calculator.calculate(quantity: 5, item_code: "A")).to eq(47.50)
expect(calculator.calculate(quantity: 5, item_code: "B")).to eq(50.00)
end
end
Finding Seams: Identify injection points for tests in untestable code.
Cover and Modify: Add characterization tests, then refactor safely.
Rich Hickey (Simplicity)
"Simplicity is not easy, but it's essential."
In Testing (Simplicity)
- Use simple, immutable data structures in tests
- Avoid test complexity (if/else, loops in tests)
- Pure functions are trivially testable
;; Simple test with immutable data
(deftest test-cart-total
(let [cart {:items [{:price 10} {:price 20}]}]
(is (= 30 (calculate-total cart)))))
John Carmack (Performance)
"Measure, don't guess."
In Testing (Performance)
- Benchmark critical paths
- Performance regression tests
- Measure test execution time
import pytest
@pytest.mark.benchmark
def test_search_performance(benchmark):
large_dataset = generate_dataset(10000)
result = benchmark(search_function, large_dataset, "query")
assert result is not None
assert benchmark.stats.mean < 0.1 # Must complete in <100ms
Test Design Patterns
AAA Pattern (Arrange-Act-Assert)
The fundamental structure for readable tests:
def test_user_registration_sends_welcome_email():
# Arrange: Set up test data and dependencies
email_service = FakeEmailService()
user_service = UserService(email_service)
user_data = {"email": "user@example.com", "name": "Alice"}
# Act: Execute the behavior being tested
user = user_service.register(user_data)
# Assert: Verify the expected outcome
assert user.id is not None
assert email_service.sent_emails[0].recipient == "user@example.com"
assert "Welcome" in email_service.sent_emails[0].subject
Benefits:
- Clear structure: setup → action → verification
- Easy to read and understand
- Separates concerns within the test
Data Flow and Readability (Google Testing Blog, January 2025):
"Order your lines of code to match the data flow inside your method"
The AAA pattern naturally follows data flow:
- Arrange: Create data and dependencies
- Act: Data flows through the system under test
- Assert: Verify data transformations
Anti-Pattern: Jumbled Test
# Bad: Unclear flow, setup mixed with assertions
def test_order_total_with_discount():
assert order.total == 90 # What order?
order = Order(items=[Item(price=100)])
order.apply_discount(0.10) # Discount applied after assertion?
Good Pattern: Clear Data Flow
# Good: Data flows clearly through Arrange → Act → Assert
def test_order_total_with_discount():
# Arrange: Create data
order = Order(items=[Item(price=100)])
# Act: Transform data
order.apply_discount(0.10)
# Assert: Verify transformation
assert order.total == 90
Lines ordered to match data dependencies reduce cognitive load and improve test maintainability.
Test Builder Pattern
For complex object creation in tests:
class UserBuilder {
private name = "Test User";
private email = "test@example.com";
private age = 25;
private roles: string[] = [];
withName(name: string): UserBuilder {
this.name = name;
return this;
}
withEmail(email: string): UserBuilder {
this.email = email;
return this;
}
withAge(age: number): UserBuilder {
this.age = age;
return this;
}
withRoles(...roles: string[]): UserBuilder {
this.roles = roles;
return this;
}
build(): User {
return new User(this.name, this.email, this.age, this.roles);
}
}
// Usage in tests
test('admin users can delete posts', () => {
const admin = new UserBuilder()
.withRoles('admin')
.build();
expect(admin.canDelete(post)).toBe(true);
});
test('underage users cannot purchase alcohol', () => {
const minor = new UserBuilder()
.withAge(17)
.build();
expect(minor.canPurchaseAlcohol()).toBe(false);
});
Benefits:
- Readable test setup with fluent interface
- Default values for unimportant fields
- Reusable across tests
Test Doubles (Fowler Taxonomy)
Mock: Verifies Interactions
Use when you need to verify a method was called:
def test_successful_order_sends_confirmation_email():
email_service_mock = Mock()
order_service = OrderService(email_service_mock)
order_service.place_order(customer_id=123, items=[{"sku": "ABC"}])
email_service_mock.send_email.assert_called_once_with(
to="customer@example.com",
subject="Order Confirmation"
)
Stub: Returns Predetermined Values
Use when you need to control dependencies' return values:
RSpec.describe PaymentProcessor do
it "retries on temporary payment gateway failure" do
gateway_stub = double("PaymentGateway")
allow(gateway_stub).to receive(:charge)
.and_return(
{ success: false, error: "Timeout" }, # First call fails
{ success: true, transaction_id: "123" } # Second call succeeds
)
processor = PaymentProcessor.new(gateway_stub)
result = processor.process_payment(amount: 100)
expect(result.success).to be true
expect(gateway_stub).to have_received(:charge).twice
end
end
Fake: Working Implementation
Use for complex dependencies like databases:
class FakeUserRepository {
constructor() {
this.users = new Map();
this.nextId = 1;
}
save(user) {
const id = this.nextId++;
this.users.set(id, { ...user, id });
return { ...user, id };
}
findById(id) {
return this.users.get(id) || null;
}
findByEmail(email) {
return Array.from(this.users.values())
.find(u => u.email === email) || null;
}
}
// Usage in tests
test('user registration prevents duplicate emails', async () => {
const repo = new FakeUserRepository();
const service = new UserService(repo);
await service.register({ email: 'user@example.com', name: 'Alice' });
await expect(
service.register({ email: 'user@example.com', name: 'Bob' })
).rejects.toThrow('Email already registered');
});
Spy: Records Calls
Use when you need to verify calls after the fact:
class EmailServiceSpy:
def __init__(self):
self.sent_emails = []
def send_email(self, to, subject, body):
self.sent_emails.append({
'to': to,
'subject': subject,
'body': body
})
def test_order_confirmation_email_contains_order_details():
email_spy = EmailServiceSpy()
order_service = OrderService(email_spy)
order = order_service.place_order(
customer_id=123,
items=[{"sku": "ABC", "quantity": 2}]
)
assert len(email_spy.sent_emails) == 1
email = email_spy.sent_emails[0]
assert email['to'] == order.customer_email
assert "ABC" in email['body']
assert "quantity: 2" in email['body']
Test Double Selection Guide
Decision Tree:
- Do you need to verify a method was called? → Use Mock
- Do you need to control return values? → Use Stub
- Is the dependency complex (database, file system)? → Use Fake
- Do you need to inspect calls after execution? → Use Spy
Rule of Thumb: Mock external systems, not your own components. Test state, not interactions (Google standard).
Parameterized/Table-Driven Tests
Reduce duplication for multiple inputs with same logic:
import pytest
@pytest.mark.parametrize("input_text,expected_slug", [
("Hello World", "hello-world"),
("Hello World", "hello-world"), # Multiple spaces
("HELLO WORLD", "hello-world"), # Uppercase
("Hello, World!", "hello-world"), # Punctuation
("Café au Lait", "cafe-au-lait"), # Accents
(" Hello World ", "hello-world"), # Leading/trailing spaces
])
def test_slugify(input_text, expected_slug):
assert slugify(input_text) == expected_slug
Benefits:
- Clear table of inputs and expected outputs
- Easy to add new test cases
- Reduces code duplication
Test Organization and Readability
Sort Test Cases (Google Testing Blog, September 2025):
"Sorted lists help prevent bugs through improved readability and consistency"
Alphabetically sorted parameterized test cases make duplicates and conflicts immediately visible.
Example: Sorted Parameterized Tests
@pytest.mark.parametrize("input_text,expected_slug", [
(" Hello World ", "hello-world"), # Leading/trailing spaces (sorted)
("Café au Lait", "cafe-au-lait"), # Accents
("HELLO WORLD", "hello-world"), # Uppercase
("Hello World", "hello-world"), # Basic case
("Hello World", "hello-world"), # Multiple spaces
("Hello, World!", "hello-world"), # Punctuation
])
def test_slugify(input_text, expected_slug):
assert slugify(input_text) == expected_slug
Benefits of Sorting:
- Duplicates stand out immediately
- Easier to find specific test cases
- Consistent organization across test suite
- Reduces merge conflicts in version control
Warning: Only sort when order doesn't matter. Don't sort if test execution order is intentional (e.g., dependency loading).
Data Flow in Tests (Google Testing Blog, January 2025):
Tests should flow logically through setup, execution, and verification:
# Good: Clear flow from setup to assertion
def test_user_receives_welcome_email_after_registration():
# Setup: Create dependencies
email_service = FakeEmailService()
user_service = UserService(email_service)
# Execute: Perform action
user = user_service.register(email="new@example.com", name="Alice")
# Verify: Check outcomes
assert len(email_service.sent_emails) == 1
assert email_service.sent_emails[0]['to'] == "new@example.com"
assert "Welcome" in email_service.sent_emails[0]['subject']
Anti-Pattern: Jumbled Setup
# Bad: Setup scattered, unclear dependencies
def test_user_receives_welcome_email():
user = user_service.register(email="new@example.com", name="Alice") # Where does user_service come from?
email_service = FakeEmailService() # Created after use?
assert len(email_service.sent_emails) == 1 # How can this work?
Keep setup together, execution clear, and assertions at the end. Lines should be ordered to match data dependencies.
Testing by Level
Unit Testing
Characteristics:
- Speed: <100ms per test
- Isolation: No I/O, no network, no database
- Scope: Single function, class, or module
- Deterministic: Same inputs always produce same outputs
- Quantity: Hundreds to thousands
What to Test:
- Business logic and algorithms
- Input validation and edge cases
- Error handling and exceptions
- Data transformations
- Calculations and computations
Google Standard: "Test via public APIs"
Don't test private methods directly. Test behaviors through public interfaces.
# Good: Tests behavior through public API
def test_account_applies_interest():
account = SavingsAccount(balance=1000, interest_rate=0.05)
account.apply_monthly_interest()
assert account.balance == 1004.17 # 1000 * (1.05^(1/12))
# Bad: Tests private implementation detail
# def test_calculate_monthly_interest_rate():
# account = SavingsAccount(balance=1000, interest_rate=0.05)
# assert account._calculate_monthly_rate() == 0.004074
Functional Core, Imperative Shell for Unit Testing
Architectural Pattern for Maximizing Unit Test Coverage:
When business logic is tangled with I/O, you're forced to write slow integration tests. FCIS pattern maximizes fast unit test coverage:
Anti-Pattern (Requires Integration Test):
def send_expiry_reminders():
users = UserRepository.find_all() # Database call
for user in users:
if user.expires_at <= Date.today() + timedelta(days=7): # Logic mixed with I/O
EmailService.send(...) # Network call
Can't test the expiry logic without database and email service.
FCIS Pattern (Pure Unit Tests):
# Core: Pure, fast unit tests
def users_needing_reminder(users, cutoff_date):
return [u for u in users if u.expires_at <= cutoff_date and not u.reminded]
# Test the core (no mocks needed!)
def test_filters_users_expiring_before_cutoff():
users = [
User(expires_at=Date.today() + timedelta(days=5), reminded=False),
User(expires_at=Date.today() + timedelta(days=10), reminded=False)
]
result = users_needing_reminder(users, Date.today() + timedelta(days=7))
assert len(result) == 1
Testing Economics:
- Before FCIS: Integration test (seconds, database required)
- After FCIS: Unit test (milliseconds, pure function)
- Impact: Run thousands of core tests in time of one integration test
See writing-code skill for complete pattern guidance on separating decisions from effects.
Example: Testing Edge Cases
describe('divideNumbers', () => {
it('divides positive numbers', () => {
expect(divideNumbers(10, 2)).toBe(5);
});
it('divides negative numbers', () => {
expect(divideNumbers(-10, 2)).toBe(-5);
});
it('throws error when dividing by zero', () => {
expect(() => divideNumbers(10, 0)).toThrow('Cannot divide by zero');
});
it('handles floating point division', () => {
expect(divideNumbers(10, 3)).toBeCloseTo(3.333, 2);
});
});
Integration Testing
Characteristics:
- Speed: Seconds per test
- Isolation: Real dependencies (database, message queue, external services)
- Scope: Multiple components working together
- Setup: Requires test database, docker containers, or service mocks
- Quantity: Dozens to hundreds
What to Test:
- Component interactions and contracts
- Database queries and transactions
- API endpoints (request → response)
- Message queue publishers/consumers
- External service integrations
Example: API Integration Test
RSpec.describe "POST /api/orders" do
it "creates order and returns 201 with order details" do
customer = Customer.create!(email: "customer@example.com")
product = Product.create!(sku: "ABC", price: 25.00)
post "/api/orders", params: {
customer_id: customer.id,
items: [
{ sku: "ABC", quantity: 2 }
]
}
expect(response).to have_http_status(201)
expect(json_response['total']).to eq(50.00)
expect(Order.count).to eq(1)
expect(Order.last.customer_id).to eq(customer.id)
end
end
Example: Database Integration Test
describe('UserRepository', () => {
beforeEach(async () => {
await database.migrate.latest();
});
afterEach(async () => {
await database.migrate.rollback();
});
it('finds users by email with case insensitivity', async () => {
const repo = new UserRepository(database);
await repo.save({ email: 'Alice@Example.com', name: 'Alice' });
const user = await repo.findByEmail('alice@example.com');
expect(user).not.toBeNull();
expect(user.name).toBe('Alice');
});
});
E2E Testing
Characteristics:
- Speed: Seconds to minutes per test
- Isolation: Full stack (browser → backend → database)
- Scope: Complete user journeys
- Setup: Running application, test database, browser automation
- Quantity: Dozens (keep suite under 30 minutes)
What to Test:
- Critical user journeys (top 20% = 80% of business value)
- Authentication and authorization flows
- Checkout and payment processes
- Data submission workflows
- Cross-browser compatibility (if needed)
2025 Best Practice: Run e2e tests in parallel, keep total suite under 30 minutes.
Example: E2E User Journey
def test_user_completes_checkout_journey(browser):
# Navigate to product page
browser.visit("https://shop.example.com/products/laptop")
browser.click("Add to Cart")
# View cart
browser.click("Cart")
assert browser.find("Laptop").is_visible()
assert browser.find("$999.99").is_visible()
# Checkout
browser.click("Checkout")
browser.fill("email", "customer@example.com")
browser.fill("card_number", "4242424242424242")
browser.fill("expiry", "12/25")
browser.fill("cvc", "123")
browser.click("Place Order")
# Confirmation
assert browser.find("Order Confirmed").is_visible()
assert browser.find("#123456").is_visible() # Order number
Anti-Pattern: Testing business logic at e2e level
// Bad: Don't test edge cases in e2e tests
test('empty cart shows correct message', async () => {
await page.goto('/cart');
expect(await page.textContent('.cart-message')).toBe('Your cart is empty');
});
// Good: Test this at unit level instead
test('Cart.isEmpty returns true when no items', () => {
const cart = new Cart();
expect(cart.isEmpty()).toBe(true);
});
Property-Based Testing
When to Use:
- Input validation with many edge cases
- Parser correctness
- Mathematical properties (commutativity, associativity)
- Serialization/deserialization roundtrips
Example Using Hypothesis (Python):
from hypothesis import given
from hypothesis.strategies import text
@given(text())
def test_slugify_roundtrip_property(input_text):
"""Slugifying twice should produce the same result as slugifying once."""
assert slugify(slugify(input_text)) == slugify(input_text)
@given(text(), text())
def test_slugify_concatenation(text1, text2):
"""Slugifying concatenated strings should match concatenating slugs."""
combined = slugify(text1 + " " + text2)
separate = slugify(text1) + "-" + slugify(text2)
assert combined == separate
Benefits: Discovers edge cases you didn't think to test manually.
Anti-Patterns and Code Smells
1. Flaky Tests (Non-Deterministic)
Symptom: Tests pass sometimes, fail other times with no code changes.
Causes:
- Real system clocks
- Random number generators without seeds
- Network timing dependencies
- Shared state between tests
- Asynchronous code without proper waits
Fix:
# Bad: Real clock makes test non-deterministic
def test_token_expires_after_one_hour():
token = create_token()
time.sleep(3601) # Wait 1 hour + 1 second
assert is_expired(token) # May fail due to timing
# Good: Control time with test doubles
def test_token_expires_after_one_hour():
fake_clock = FakeClock(now=datetime(2025, 1, 1, 12, 0, 0))
token = create_token(clock=fake_clock)
fake_clock.advance(hours=1, seconds=1)
assert is_expired(token, clock=fake_clock)
2. Slow Tests (Wrong Level)
Symptom: Unit tests taking seconds instead of milliseconds.
Causes:
- Database queries in unit tests
- Network calls in unit tests
- File I/O in unit tests
- Testing at wrong level (e2e test for business logic)
Fix:
# Bad: Unit test with database call (slow)
RSpec.describe Order do
it "calculates total" do
order = Order.create!(customer_id: 123)
order.items.create!(sku: "ABC", price: 25, quantity: 2)
expect(order.total).to eq(50)
end
end
# Good: Pure unit test (fast)
RSpec.describe Order do
it "calculates total" do
order = Order.new
order.items = [
OrderItem.new(price: 25, quantity: 2)
]
expect(order.total).to eq(50)
end
end
3. Test Interdependencies
Symptom: Tests pass when run in one order, fail when run in different order.
Causes:
- Shared mutable state
- Tests depending on previous test setup
- Database state not cleaned between tests
- Global variables
Fix:
// Bad: Tests depend on shared state
let user;
beforeAll(() => {
user = createUser({ email: 'test@example.com' });
});
test('user can login', () => {
expect(login(user)).toBe(true);
user.loginCount++; // Mutates shared state
});
test('new users have zero logins', () => {
expect(user.loginCount).toBe(0); // Fails: depends on previous test
});
// Good: Each test isolated with own setup
test('user can login', () => {
const user = createUser({ email: 'test@example.com' });
expect(login(user)).toBe(true);
});
test('new users have zero logins', () => {
const user = createUser({ email: 'test@example.com' });
expect(user.loginCount).toBe(0);
});
4. Over-Mocking (Testing Implementation)
Symptom: Tests break when refactoring internal implementation, even though behavior unchanged.
Rule: Mock external systems, not your own components. Test state, not interactions.
Fix:
# Bad: Over-mocking internal components
def test_place_order():
inventory_mock = Mock()
payment_mock = Mock()
email_mock = Mock()
order_service = OrderService(inventory_mock, payment_mock, email_mock)
order_service.place_order(customer_id=123, items=[{"sku": "ABC"}])
inventory_mock.check_availability.assert_called_once()
inventory_mock.reserve_items.assert_called_once()
payment_mock.charge.assert_called_once()
email_mock.send_confirmation.assert_called_once()
# Tests implementation details, brittle
# Good: Test behavior with minimal mocking
def test_place_order():
fake_inventory = FakeInventory(items={"ABC": 10})
fake_payment = FakePaymentGateway()
email_spy = EmailServiceSpy()
order_service = OrderService(fake_inventory, fake_payment, email_spy)
order = order_service.place_order(
customer_id=123,
items=[{"sku": "ABC", "quantity": 2}]
)
# Test state and critical behavior
assert order.status == "confirmed"
assert fake_inventory.available("ABC") == 8
assert len(email_spy.sent_emails) == 1
5. Logic in Tests
Symptom: Tests contain conditionals, loops, or complex calculations.
Problem: Who tests the tests? Test logic can have bugs.
Fix:
// Bad: Logic in test
test('all users can view public posts', () => {
const users = [admin, moderator, guest];
for (const user of users) {
if (user.role !== 'banned') {
expect(user.canView(publicPost)).toBe(true);
}
}
});
// Good: Simple, explicit tests (or parameterized)
test('admin can view public posts', () => {
expect(admin.canView(publicPost)).toBe(true);
});
test('moderator can view public posts', () => {
expect(moderator.canView(publicPost)).toBe(true);
});
test('guest can view public posts', () => {
expect(guest.canView(publicPost)).toBe(true);
});
6. Unclear Test Names
Symptom: Test name like test_user_2 or test_edge_case.
Problem: When test fails, unclear what broke.
Fix:
# Bad: Unclear names
def test_withdraw_1
# ...
end
def test_withdraw_2
# ...
end
# Good: Behavioral names
def test_withdraw_deducts_amount_from_balance
# ...
end
def test_withdraw_raises_error_when_insufficient_funds
# ...
end
Test Quality Checklist
Beyond the TDD enforcement checklist, evaluate tests for:
Maintainability
- Test names describe behavior, not implementation
- Tests are isolated (no shared state)
- Setup is clear and minimal
- Assertions are simple and focused
- Test data is meaningful (not
foo,bar,test123)
Risk-Based Coverage
- Critical paths have multiple test cases
- Edge cases are covered
- Error paths are tested
- High-risk areas (security, payments) have thorough coverage
- Low-risk areas (UI text) have minimal coverage
Test Code Quality
- No duplication (use builders, factories, helpers)
- No magic numbers (use named constants)
- No complex logic (conditionals, loops)
- Test doubles are appropriate (mock external systems only)
- Tests run fast at appropriate level
Behavioral Naming
# Good naming pattern: test_<scenario>_<expected_behavior>
test_withdrawing_more_than_balance_raises_error()
test_valid_coupon_code_applies_discount()
test_expired_coupon_code_returns_error()
Clear Assertions
// Bad: Unclear assertion
expect(result).toBe(true);
// Good: Clear assertion with context
expect(user.canAccessAdminPanel()).toBe(true);
expect(order.status).toBe('completed');
Testing Legacy Code
Characterization Tests (Michael Feathers)
When dealing with code without tests, start by documenting current behavior:
def test_legacy_price_calculator_current_behavior():
"""
Characterization test for legacy calculator.
This documents CURRENT behavior, which may not be correct.
Once we understand it, we can refactor safely.
"""
calculator = LegacyPriceCalculator()
# Document what the code currently does
assert calculator.calculate(quantity=1, item="A") == 10.00
assert calculator.calculate(quantity=5, item="A") == 45.00 # Bulk discount?
assert calculator.calculate(quantity=1, item="B") == 15.00
assert calculator.calculate(quantity=1, item="X") == 0.00 # Returns 0 for unknown?
Finding Seams
Seam: A place where you can alter behavior without editing the code.
Example: Dependency Injection Seam
# Legacy code (hard to test)
class OrderProcessor
def process(order)
gateway = PaymentGateway.new(api_key: ENV['PAYMENT_KEY'])
result = gateway.charge(order.amount)
if result.success?
order.mark_paid
end
end
end
# Add seam with dependency injection (backward compatible)
class OrderProcessor
def initialize(gateway: nil)
@gateway = gateway || PaymentGateway.new(api_key: ENV['PAYMENT_KEY'])
end
def process(order)
result = @gateway.charge(order.amount)
if result.success?
order.mark_paid
end
end
end
# Now testable
RSpec.describe OrderProcessor do
it "marks order as paid when payment succeeds" do
fake_gateway = FakePaymentGateway.new(success: true)
processor = OrderProcessor.new(gateway: fake_gateway)
order = Order.new(amount: 100)
processor.process(order)
expect(order.paid?).to be true
end
end
Cover and Modify
Strategy for legacy code:
- Write characterization tests to capture current behavior
- Verify tests fail when you break the code (tests actually test something)
- Refactor safely with tests protecting against regressions
- Improve tests as you understand intended behavior
- Add new tests for new features
Approval Testing Pattern
For complex outputs (JSON, HTML, reports):
def test_invoice_generation_approval():
"""Approval test: Captures full output for human review."""
order = Order(
id=123,
customer="Alice",
items=[Item(sku="ABC", price=25, quantity=2)]
)
invoice_html = generate_invoice(order)
# First run: approve_file creates approved/invoice.html
# Subsequent runs: compare against approved version
approval.verify(invoice_html, name="invoice")
Benefits: Tests complex output without writing assertions for every detail.
Test Maintenance and Refactoring
When to Refactor Tests
Refactor tests when:
- Tests are duplicated (extract shared setup to builders/factories)
- Tests are fragile (break with irrelevant changes)
- Tests are unclear (rename for clarity, simplify setup)
- Tests are slow at wrong level (move to unit tests)
Keeping Tests Valuable as Code Evolves
Tests Enable Refactoring:
Good tests allow you to refactor production code confidently. If refactoring breaks tests that shouldn't break, tests are too coupled to implementation.
Test Behavior, Not Implementation:
// Bad: Tests implementation (breaks when refactoring)
test('UserService.findById calls repository.query with SELECT statement', () => {
const repo = mock(UserRepository);
const service = new UserService(repo);
service.findById(123);
expect(repo.query).toHaveBeenCalledWith('SELECT * FROM users WHERE id = ?', [123]);
});
// Good: Tests behavior (survives refactoring)
test('UserService.findById returns user with matching ID', async () => {
const repo = new FakeUserRepository();
await repo.save({ id: 123, name: 'Alice' });
const service = new UserService(repo);
const user = await service.findById(123);
expect(user.name).toBe('Alice');
});
Test Code is Production Code
Apply same standards:
- Clear naming
- No duplication
- Simple structure
- Easy to understand
Removing Obsolete Tests
Delete tests that:
- Test removed features
- Duplicate other tests
- Provide no value (tautological assertions)
Test Data Builder Pattern
For maintainable test data:
class OrderBuilder:
def __init__(self):
self.customer_id = 1
self.items = []
self.status = "pending"
def for_customer(self, customer_id):
self.customer_id = customer_id
return self
def with_item(self, sku, quantity=1, price=10.0):
self.items.append({"sku": sku, "quantity": quantity, "price": price})
return self
def confirmed(self):
self.status = "confirmed"
return self
def build(self):
return Order(
customer_id=self.customer_id,
items=self.items,
status=self.status
)
# Usage
def test_confirmed_orders_charge_customer():
order = (OrderBuilder()
.for_customer(123)
.with_item("ABC", quantity=2, price=25.0)
.confirmed()
.build())
payment_service.charge(order)
assert payment_service.charged_amount == 50.0
Workflow: Designing a Test Strategy
Step-by-step process for new features:
1. Understand Requirements
What is the feature? What are acceptance criteria? What are edge cases?
2. Risk Assessment
Identify high-risk areas:
- Security boundaries (authentication, authorization)
- Financial calculations (payments, refunds)
- Data integrity (database writes, transactions)
- External integrations (payment gateways, email services)
3. Choose Test Levels (Bottom-Up)
Start with Unit Tests (70%):
- Business logic
- Calculations
- Input validation
- Edge cases
Add Integration Tests (20%):
- Database interactions
- API contracts
- External service integrations
Add E2E Tests (10%):
- Critical user journeys only
- Top 20% of user flows
4. Coverage Planning (Risk-Based)
Prioritize by risk, not by percentage:
- High-risk: Multiple test cases, thorough edge case coverage
- Medium-risk: Happy path + critical errors
- Low-risk: Smoke test only (if any)
5. Design Test Cases
For each test:
- Clear name describing behavior
- Arrange-Act-Assert structure
- Appropriate test doubles
- Simple, focused assertions
6. CI/CD Integration
- Unit tests run on every commit
- Integration tests run on PR creation
- E2E tests run before merge (or on preview environment)
- Keep feedback loop under 10 minutes
Example: E-Commerce Checkout Feature
Feature: User can add items to cart and complete checkout.
Risk Assessment:
- High-risk: Payment processing, order creation
- Medium-risk: Cart calculations, inventory checks
- Low-risk: UI text, button labels
Test Strategy:
Unit Tests (70%):
# test_cart.py
def test_empty_cart_has_zero_total()
def test_cart_sums_item_prices()
def test_cart_applies_quantity_discounts()
def test_cart_rejects_invalid_coupon_codes()
def test_cart_applies_valid_coupon_discounts()
# test_order.py
def test_order_calculates_tax()
def test_order_validates_shipping_address()
def test_order_prevents_negative_quantities()
Integration Tests (20%):
# test_checkout_api.py
def test_post_checkout_creates_order_in_database()
def test_post_checkout_charges_payment_gateway()
def test_post_checkout_returns_400_for_invalid_payment()
def test_post_checkout_rolls_back_on_payment_failure()
E2E Tests (10%):
# test_checkout_journey.py
def test_user_completes_successful_checkout()
def test_user_sees_error_for_declined_payment()
Common Mistakes to Avoid
Mistake 1: 100% Coverage as Goal
Why Bad: Coverage percentage is a vanity metric. You can have 100% coverage with meaningless assertions.
Why Good: Risk-based coverage focuses on critical paths. 80% coverage on high-value code beats 100% coverage on trivial code.
Mistake 2: Testing Implementation Details
Why Bad: Tests break when refactoring, even though behavior unchanged. Creates maintenance burden.
Why Good: Testing behaviors through public APIs enables safe refactoring. Tests survive implementation changes.
Mistake 3: Inverted Pyramid (Heavy E2E)
Why Bad: Slow feedback (hours), expensive infrastructure ($10k/month), brittle tests, high maintenance.
Why Good: Testing pyramid (70/20/10) provides fast feedback (minutes), low cost ($100/month), stable tests.
Mistake 4: No Test Isolation
Why Bad: Tests that depend on each other fail in unpredictable ways. Debugging is nightmare. Can't run tests in parallel.
Why Good: Isolated tests with complete setup can run in any order, in parallel, and failures are clear.
Mistake 5: Accepting Flaky Tests
Why Bad: Flaky tests destroy trust. Teams start ignoring test failures. Defeats entire purpose of testing.
Why Good: Zero tolerance for flaky tests. Fix immediately or delete. Deterministic tests provide reliable feedback.
Mistake 6: Writing Tests After Production Code
Why Bad: Code becomes hard to test. Tests are an afterthought. Untestable designs emerge.
Why Good: Test-first (TDD) forces testable design. Tests drive good architecture. Prevents untestable code.
Common Rationalizations
| Excuse | Reality | Remedy |
|---|---|---|
| "We don't have time to test" | Bugs in production cost 100x more to fix. Manual testing takes longer than automated tests. | Start with critical path unit tests. Add tests incrementally. |
| "Tests slow us down" | Bad tests slow you down. Good tests speed you up by catching regressions early. | Follow testing pyramid. Keep unit tests under 100ms. |
| "100% coverage means quality" | Coverage ≠ quality. Can have 100% coverage with meaningless assertions like expect(result).toBeTruthy(). |
Risk-based testing. Focus on critical paths and edge cases. |
| "We'll add tests later" | Later never comes. Technical debt compounds. Untestable code grows. | Use TDD. Write tests first. Make testability non-negotiable. |
| "Code is too complex to test" | Untestable code is poorly designed code. Complexity is design smell. | Refactor for testability. Extract dependencies. Break down complex functions. |
| "Integration tests are enough" | Integration tests are slow and don't catch all edge cases. Debugging integration test failures is hard. | Add fast unit tests for business logic and edge cases. |
| "Mocking is too hard" | Mocking internal components is hard (and wrong). Mocking external systems is straightforward. | Mock only external systems. Use fakes for complex dependencies. |
Quick Reference Tables
Table 1: Test Type Characteristics
| Test Type | Speed | Isolation | Coverage Scope | Cost | Feedback Loop | Quantity |
|---|---|---|---|---|---|---|
| Unit | <100ms | High (no I/O) | Single function/class | $ | Seconds | Hundreds/thousands |
| Integration | Seconds | Medium (real dependencies) | Multiple components | $$ | Minutes | Dozens/hundreds |
| E2E | Minutes | Low (full stack) | Complete user journey | $$$ | 30+ minutes | Dozens |
| Property-Based | <1s | High | Input space coverage | $ | Seconds | Hundreds of generated cases |
Table 2: Test Double Selection Guide
| Scenario | Test Double | Rationale | Example |
|---|---|---|---|
| Verify method was called | Mock | Need to assert on interactions | Email service sent notification |
| Control return values | Stub | Need predetermined responses | API returns success, then failure |
| Complex dependency | Fake | Need working in-memory implementation | In-memory database for tests |
| Inspect calls after execution | Spy | Need to verify calls and arguments | Logger recorded error messages |
Table 3: Anti-Pattern Recognition
| Anti-Pattern | Symptom | Root Cause | Fix | Priority |
|---|---|---|---|---|
| Flaky Tests | Passes sometimes, fails randomly | Real clock, network timing, shared state | Control time, isolate tests, use fakes | Critical |
| Slow Tests | Unit tests take seconds | I/O, database, network in unit tests | Remove dependencies, test pure logic | High |
| Test Interdependencies | Fails in different order | Shared mutable state | Isolate tests, complete setup per test | High |
| Over-Mocking | Breaks when refactoring internals | Mocking own components | Mock external systems only | Medium |
| Logic in Tests | Tests have if/loops | Complex test setup | Simplify tests, use parameterized tests | Medium |
| Unclear Names | Can't tell what broke | Generic test names | Behavioral naming: test_<scenario>_<behavior> |
Low |
Table 4: Test Level Selection Matrix
| Feature Characteristic | Unit | Integration | E2E | Rationale |
|---|---|---|---|---|
| Business logic | ✓ | - | - | Fast, isolated, many edge cases |
| Calculations | ✓ | - | - | Pure functions, no dependencies |
| Database queries | - | ✓ | - | Need real database for SQL semantics |
| API contracts | - | ✓ | - | Verify request/response structure |
| Critical user journey | - | - | ✓ | Full stack, business-critical path |
| Edge cases | ✓ | - | - | Too slow to test at higher levels |
| Error handling | ✓ | ✓ | - | Unit for logic, integration for failure modes |
Table 5: Testing Legacy Code Strategy
| Situation | Technique | Steps | Outcome |
|---|---|---|---|
| No tests, unclear behavior | Characterization Tests | Write tests documenting current behavior, verify tests fail when code breaks, refactor with test protection | Safe refactoring baseline |
| Untestable code | Find Seams | Identify injection points (constructor, parameters), extract dependencies, inject test doubles | Testable architecture |
| Complex output | Approval Testing | Capture output to approved file, human review and approve, future runs compare against approved | Test complex outputs easily |
| High-risk change | Cover and Modify | Add characterization tests, make change, verify tests still pass, improve tests | Safe modification of legacy code |
Integration with Other Skills
tdd-enforcement (Tactical Execution)
Relationship: Complementary
- tdd-enforcement: WORKFLOW for test-first development (red-green-refactor cycle)
- software-testing-strategy: STRATEGY for test design and planning
When to Use Each:
- Use
software-testing-strategywhen designing testing approach for a feature - Use
tdd-enforcementwhen actively writing code test-first
Example Flow:
- New feature: "Add user registration"
- Load
software-testing-strategyto design test strategy:- Unit tests: Password validation, email format validation
- Integration tests: Database user creation, unique email constraint
- E2E tests: Registration form submission journey
- Switch to
tdd-enforcementfor test-first implementation:- Red: Write failing test for password validation
- Green: Implement password validation
- Refactor: Extract validation logic
- Repeat for each component
systematic-code-review (Evaluation)
Relationship: Testing criteria in reviews
systematic-code-reviewevaluates test quality in PRs- References anti-patterns from
software-testing-strategy
Example: Code Review Integration
During code review (Step 5: Evaluate Tests), reviewer references this skill:
[test/order_test.py:45-60]
**issue (blocking, tests)**: This test exhibits the Flaky Test anti-pattern.
The test uses `time.sleep(1)` to wait for async operation, which creates non-deterministic behavior. Per software-testing-strategy Iron Law #2, tests must be deterministic.
**Suggestion:** Use a FakeClock or await the async operation properly with timeouts.
refactoring-to-patterns
Relationship: Test patterns support refactoring
- Test Data Builder is Builder pattern applied to tests
- Strategy pattern for test doubles (Mock, Stub, Fake, Spy)
Example: Refactoring with Tests
When refactoring introduces patterns, tests adapt:
# Before: Simple function
def calculate_price(quantity, item_type):
if item_type == "book":
return quantity * 10 * 0.9 # 10% discount
elif item_type == "electronics":
return quantity * 100 * 0.95 # 5% discount
# Test before
def test_book_price():
assert calculate_price(2, "book") == 18
# After: Strategy pattern
class BookPricing:
def calculate(self, quantity):
return quantity * 10 * 0.9
# Test after (tests behavior through interface)
def test_book_pricing_applies_discount():
pricing = BookPricing()
assert pricing.calculate(quantity=2) == 18
writing-code (Architecture: Decisions vs Effects)
Relationship: Architectural foundation for testability strategy
- Separating decisions from effects (FCIS pattern) maximizes unit test coverage (70% of pyramid)
- Pure decision logic enables fast, deterministic tests without mocks
- Effect layer requires fewer, lighter integration tests
How They Work Together:
- Design: Use writing-code principles to separate decisions from effects
- Strategy: Apply software-testing-strategy to plan test distribution (70/20/10)
- Execution: Use tdd-enforcement for test-first workflow
Example Flow:
Feature: User expiry notifications
- writing-code: Separate
users_needing_reminder()(decisions) from database/email (effects) - Strategy: 70% unit tests for decision logic, 20% integration for effects, 10% e2e for workflow
- TDD: Red-green-refactor cycle for decision functions
When to reference: When architecting features, reviewing code for testability, or struggling with heavy mocking requirements.
Key Insight (Google Testing Blog, October 2025):
"Mixing database calls, network requests, and other external interactions directly with your core logic can lead to code that's difficult to test."
Separating decisions from effects (FCIS pattern) solves this by enabling the testing pyramid economics: more logic in pure decisions = more fast, cheap unit tests.
Key Takeaways
- Follow testing pyramid: 70% unit, 20% integration, 10% e2e for optimal economics and feedback speed
- Test behavior, not implementation: Tests should survive refactoring of internal details
- Fast feedback: Unit tests <100ms, integration tests in seconds, full e2e suite under 30 minutes
- Risk-based coverage beats percentage targets: Focus on critical paths and high-risk areas
- Tests are production code: Apply same quality standards to test code
- Test at the appropriate level: Don't use e2e for business logic, don't mock everything in unit tests
- Mock external systems, not your own code: Test state, not interactions (Google standard)
- Tests enable refactoring: Good tests provide confidence to change production code
- Flaky tests destroy trust: Fix immediately or delete. Zero tolerance for non-deterministic tests.
- Legacy code is code without tests: Use characterization tests, find seams, cover and modify