| name | legacy-code-safety |
| description | Use when modifying, removing, or refactoring code that lacks test coverage. Emphasizes the danger of untested changes and the RGR workflow to add characterization tests before modifications. |
| allowed-tools | Read, Write, Edit, Bash, Grep, Glob |
Legacy Code Safety
"Changing untested code is like performing surgery blindfolded."
DANGER: The Risk of Modifying Untested Code
Modifying code without tests is one of the HIGHEST-RISK activities in software development.
When you change untested code:
- You CANNOT know what behaviors you're breaking
- You have NO safety net to catch mistakes
- You're relying on hope instead of evidence
- You're creating technical debt on top of technical debt
In dynamically-typed languages (Python, JavaScript, Ruby, PHP), this risk is EXPONENTIALLY higher:
- No compiler to catch type errors
- No static analysis to verify correctness
- Tests are your ONLY safety mechanism
- Runtime is when you discover your mistakes
The Golden Rule
NEVER modify untested code without adding tests first.
If code lacks tests:
- STOP
- Add characterization tests
- Verify tests pass
- THEN make your changes
The RGR Workflow for Legacy Code
This is NOT standard TDD. This is characterization testing - capturing current behavior before changing it.
Standard TDD (New Features)
- RED: Write failing test for NEW behavior
- GREEN: Implement to make test pass
- REFACTOR: Clean up implementation
Legacy Code RGR (Existing Code)
- RED: Write test that SHOULD pass (captures current behavior)
- GREEN: Verify it passes (confirms you captured reality correctly)
- REFACTOR: NOW you can safely modify with a safety net
The Critical Difference
TDD: You know what the behavior SHOULD be, write test for ideal
Legacy RGR: You DON'T know all behaviors, capture what EXISTS
Why "RED" for Passing Tests?
The "RED" step means "you're in the red zone" - working with dangerous untested code. Even though you expect the test to pass, you're at risk until you have that passing test as proof.
Characterization Testing Process
Step 1: Identify All Behaviors
Before writing tests, understand what the code currently does:
// Legacy function - no tests
function calculateShipping(order) {
let cost = 9.99
if (order.total > 100) {
cost = 0
}
if (order.weight > 50) {
cost += 25
}
if (order.destination === 'HI' || order.destination === 'AK') {
cost += 15
}
return cost
}
Questions to answer:
- What inputs does it accept?
- What outputs does it produce?
- What are the edge cases?
- Are there side effects?
- What happens with invalid input?
Step 2: Write Tests for Current Behavior
Document what EXISTS, not what should exist:
describe('calculateShipping (characterization)', () => {
// Capture base case
test('standard shipping', () => {
const order = { total: 50, weight: 10, destination: 'CA' }
expect(calculateShipping(order)).toBe(9.99)
})
// Capture free shipping threshold
test('free shipping over $100', () => {
const order = { total: 150, weight: 10, destination: 'CA' }
expect(calculateShipping(order)).toBe(0)
})
// Capture weight surcharge
test('heavy package surcharge', () => {
const order = { total: 50, weight: 75, destination: 'CA' }
expect(calculateShipping(order)).toBe(34.99) // 9.99 + 25
})
// Capture regional surcharge
test('Hawaii surcharge', () => {
const order = { total: 50, weight: 10, destination: 'HI' }
expect(calculateShipping(order)).toBe(24.99) // 9.99 + 15
})
// Discovered edge case: free shipping + weight
test('free shipping with heavy package', () => {
const order = { total: 150, weight: 75, destination: 'CA' }
expect(calculateShipping(order)).toBe(25) // 0 + 25 (is this right?)
})
// Discovered edge case: all conditions
test('all surcharges with free shipping threshold', () => {
const order = { total: 150, weight: 75, destination: 'HI' }
expect(calculateShipping(order)).toBe(40) // 0 + 25 + 15
})
})
Step 3: Verify Tests Pass
These tests MUST pass immediately:
npm test
If they don't pass:
- You misunderstood the current behavior
- Fix the test, not the code
- The test should document reality
Step 4: NOW You Can Refactor
With tests in place, you have a safety net:
// NOW we can safely refactor
function calculateShipping(order: Order): number {
const BASE_COST = 9.99
const FREE_SHIPPING_THRESHOLD = 100
const HEAVY_PACKAGE_WEIGHT = 50
const HEAVY_PACKAGE_SURCHARGE = 25
const REMOTE_DESTINATIONS = ['HI', 'AK']
const REMOTE_SURCHARGE = 15
let cost = order.total > FREE_SHIPPING_THRESHOLD ? 0 : BASE_COST
if (order.weight > HEAVY_PACKAGE_WEIGHT) {
cost += HEAVY_PACKAGE_SURCHARGE
}
if (REMOTE_DESTINATIONS.includes(order.destination)) {
cost += REMOTE_SURCHARGE
}
return cost
}
Run tests again:
npm test # All tests still pass
You've improved the code WITHOUT breaking existing behavior.
When This Skill Applies
Use this skill when:
- Deleting code that has no tests
- Modifying functions without test coverage
- Refactoring untested modules
- Changing behavior in legacy systems
- Test coverage is below 80% for code you're touching
- You inherit code from previous developers
- You're working in a codebase with poor test discipline
Even if the change seems "simple", if there are no tests, use this workflow.
Language-Specific Risk Levels
HIGH RISK: Dynamic/Weakly-Typed Languages
Python, JavaScript, Ruby, PHP:
- NO compile-time type checking
- NO static analysis of type correctness
- Tests are your ONLY safety net
- Runtime is when errors appear
- User discovers bugs, not compiler
Extra caution required:
- Write MORE tests
- Test edge cases thoroughly
- Check for None/null/undefined
- Verify types explicitly
- Add type hints/annotations when possible
MEDIUM RISK: Static/Strongly-Typed Languages
Go, Rust, TypeScript (strict mode), Java:
- Type system catches some errors
- Compiler validates correctness
- Still need tests for BEHAVIOR
- Types don't test business logic
Type safety helps, but:
- Types verify structure, not logic
- Tests still required for correctness
- Don't rely solely on type system
The Strangler Fig Pattern
For large legacy systems, don't try to test everything at once:
What Is It?
Named after the strangler fig vine that grows around a tree:
- Vine grows alongside tree
- Eventually replaces tree entirely
- Original tree can be removed
Applied to code:
- Wrap legacy code with new interface
- Add tests to the wrapper
- Gradually move logic to new code
- Eventually remove legacy code
Implementation
Phase 1: Wrap
// Legacy code (untested, scary)
function legacyCalculatePrice(item) {
// ... 500 lines of spaghetti ...
}
// New wrapper (tested)
function calculatePrice(item: Item): number {
// Add validation and tests here
validateItem(item)
// Call legacy for now
return legacyCalculatePrice(item)
}
Phase 2: Test the Boundary
describe('calculatePrice wrapper', () => {
test('validates input', () => {
expect(() => calculatePrice(null)).toThrow()
})
test('handles normal items', () => {
const item = { id: 1, price: 10 }
expect(calculatePrice(item)).toBe(10)
})
// Characterization tests for legacy behavior
test('matches legacy for discounts', () => {
const item = { id: 1, price: 10, discount: 0.2 }
expect(calculatePrice(item)).toBe(8)
})
})
Phase 3: Replace Internals
function calculatePrice(item: Item): number {
validateItem(item)
// New implementation (tested, clean)
const basePrice = item.price
const discount = item.discount ?? 0
return basePrice * (1 - discount)
// Legacy code removed!
}
Phase 4: Expand
Now that one function is safe, repeat for next function.
Benefits
- Incremental progress
- Always have working system
- Tests added gradually
- Risk minimized
- Can stop anytime
When to Use
- Large untested codebase
- Can't stop for months to add tests
- Need to deliver features while improving
- High-risk production system
Pre-Modification Checklist
Before touching ANY untested code:
- Identified all entry points (who calls this code?)
- Identified all exit points (what does this code call?)
- Understood current behavior (what does it do now?)
- Written characterization tests for current behavior
- All characterization tests pass
- Documented any surprising behaviors found
- Considered using Strangler Fig for large changes
ONLY AFTER ALL CHECKS PASS:
- Make your intended changes
- Run tests (verify behavior preserved or intentionally changed)
- Add tests for NEW behavior if applicable
- Document breaking changes if any
Red Flags
Stop immediately if:
- "I'll just make a quick change, no need for tests"
- "It's a small change, what could go wrong?"
- "I don't have time to write tests"
- "The code is too complex to test"
- "I'll add tests later" (you won't)
These are rationalizations for dangerous behavior.
Common Objections (and Responses)
"But tests will take longer than the change!"
Response: The initial change is fast. Finding and fixing the bugs you introduced takes much longer. Tests save time.
"The code is too complex to test!"
Response: If it's too complex to test, it's too complex to change safely. Simplify first, or use Strangler Fig pattern.
"We need to ship this feature NOW!"
Response: Shipping a broken feature is slower than shipping a tested feature late. Technical debt compounds.
"This code has worked fine without tests for years!"
Response: Until now. The moment you change it, that safety is gone. Past stability doesn't predict future stability.
"I understand the code, I won't break it!"
Response: Confidence is not a substitute for evidence. Even experts make mistakes.
Success Metrics
How to know you've done this right:
Green flags:
- All characterization tests pass before changes
- All tests still pass after changes
- New behavior has new tests
- Test coverage increased
- Code is more readable
- Confidence is high
Red flags:
- Tests fail after changes (and you don't know why)
- Skipped edge cases "to save time"
- Changed behavior without updating tests
- Test coverage decreased or stagnant
- More confused than before
Integration with Other Skills
Use these related skills:
- test-driven-development: For new behavior, use TDD after characterization tests exist
- refactoring: With tests in place, refactor safely
- boy-scout-rule: While adding tests, improve code quality
- proof-of-work: Document test results as evidence
- debugging: When characterization tests reveal unexpected behavior
Tools and Techniques
Code Coverage Tools
Use coverage tools to identify untested code:
# JavaScript/TypeScript
npm test -- --coverage
# Python
pytest --cov=mypackage
# Ruby
bundle exec rspec --require spec_helper
# Go
go test -cover ./...
Target coverage:
- Critical paths: 100%
- Business logic: 90%+
- Overall codebase: 80%+
Mutation Testing
Verify tests actually catch bugs:
# JavaScript
npm install -D stryker
# Python
pip install mutmut
Mutation testing changes your code slightly and verifies tests fail. If tests still pass after mutation, they're not effective.
Approval Testing
For complex outputs (HTML, JSON, reports):
test('generates invoice', () => {
const invoice = generateInvoice(order)
expect(invoice).toMatchSnapshot()
})
First run captures current output. Future runs verify it hasn't changed.
Real-World Example
Scenario
Legacy payment processor, 2000 lines, no tests, needs to add new payment method.
Step 1: Identify Scope
Don't test everything - test what you'll touch:
// Will modify this
function processPayment(payment, method) {
// ... many lines ...
}
// Will NOT modify these
function logTransaction(transaction) { }
function sendEmail(user, content) { }
Step 2: Add Characterization Tests
describe('processPayment (characterization)', () => {
test('processes credit card payment', () => {
const payment = { amount: 100, currency: 'USD' }
const method = { type: 'credit_card', number: '4111...' }
const result = processPayment(payment, method)
expect(result.success).toBe(true)
expect(result.transactionId).toBeDefined()
expect(result.amount).toBe(100)
})
// More tests for existing payment methods...
})
Step 3: Tests Pass
npm test
# ✅ All tests passing
Step 4: Add New Feature
function processPayment(payment, method) {
// Existing code preserved
if (method.type === 'credit_card') {
return processCreditCard(payment, method)
}
// NEW: Add cryptocurrency support
if (method.type === 'crypto') {
return processCrypto(payment, method)
}
throw new Error(`Unknown payment method: ${method.type}`)
}
Step 5: Add Tests for New Behavior
test('processes cryptocurrency payment', () => {
const payment = { amount: 100, currency: 'USD' }
const method = { type: 'crypto', address: '0x123...' }
const result = processPayment(payment, method)
expect(result.success).toBe(true)
expect(result.transactionId).toBeDefined()
})
Step 6: All Tests Pass
npm test
# ✅ All tests passing (old + new)
Result
- New feature added safely
- Existing behavior preserved
- Tests ensure no regression
- Future changes safer
Remember
- Untested code is dangerous code - especially in dynamic languages
- Characterize before changing - Capture current behavior first
- RGR workflow for legacy - Different from TDD, specific to untested code
- Strangler Fig for large systems - Gradual improvement beats big bang
- Tests are proof - Confidence without evidence is just hope
- No shortcuts - "Quick changes" create slow bugs
Legacy code is not an excuse for reckless changes. It's a reason for EXTRA caution.