| name | Localizing Variables |
| description | Declare variables in smallest possible scope, initialize close to first use, minimize span and live time |
| when_to_use | When writing any code with variables. When variables are declared at top of function but used later. When related statements are scattered. When variable scope is larger than necessary. When you see long variable live times or large span between references. When can't find where variable is initialized. When variable has stale or unexpected value. When forgot to reset counter or accumulator. When initialization errors occur. When variables declared far from first use. When window of vulnerability is large. When must scroll to see variable declaration and usage. When reviewing pull requests with wide variable scope. When refactoring functions with many local variables. |
| version | 1.0.0 |
| languages | all |
Localizing Variables
Overview
The Principle of Proximity: Keep related actions together. Declare variables in the smallest scope possible, initialize them close to where they're first used, and keep all references to a variable close together.
Core principle: Minimize the window of vulnerability. The smaller the scope and the closer the references, the less can go wrong and the easier code is to understand.
Goal: Reduce what you must keep in mind at any one time.
When to Use
Apply to every variable you declare:
- When declaring variables
- When initializing variables
- When reviewing code with scattered variable usage
- When refactoring to improve clarity
Warning signs:
- Variables declared at top of function, used at bottom
- All variables initialized together, far from first use
- Large gap between variable declaration and use
- Must scroll to see variable declaration and usage together
- Variables have function/class scope when could be more local
- Can't see all uses of variable on one screen
Key Concepts
Scope
How widely visible a variable is:
- Block scope - visible only within
{}or indented block (smallest) - Loop scope - visible only within loop
- Function scope - visible throughout function
- Class scope - visible to all methods in class
- Module scope - visible throughout file
- Global scope - visible everywhere (largest, avoid)
Rule: Start with smallest scope. Expand only if necessary.
Span
Distance between successive references to a variable:
a = 0 # First reference
b = 0 # 1 line between references to a
c = 0 # 2 lines between references to a
a = b + c # Second reference
# Span of a: 2 lines
Goal: Minimize span. Keep references close together.
Live Time
Total statements between first and last reference:
recordIndex = 0 # Line 2 - first reference
# ... 24 lines of other code ...
recordIndex += 1 # Line 28 - last reference
# Live time: 28 - 2 + 1 = 27 statements
Goal: Minimize live time. Reduce window of vulnerability.
The Principle of Proximity
Keep related actions together:
❌ Bad (declarations far from use):
def process_data():
# All declarations at top
index = 0
total = 0
done = False
result = []
# 20 lines later...
while index < count:
index += 1
# 30 lines later...
while not done:
if total > threshold:
done = True
# 40 lines later...
result.append(final_value)
return result
Live times: index=25, total=35, done=35, result=40. Average: 34 lines.
✅ Good (declare close to use):
def process_data():
# Declare index right before loop that uses it
index = 0
while index < count:
index += 1
# Declare total and done right before loop that uses them
total = 0
done = False
while not done:
if total > threshold:
done = True
# Declare result right before use
result = []
result.append(final_value)
return result
Live times: index=3, total=5, done=5, result=2. Average: 4 lines.
Improvement: 34 → 4 average live time (8.5x better)
Aggressive Scope Minimization
Technique 1: Declare at Point of First Use
Languages like C++, Java, Python, JavaScript allow this:
❌ Bad:
def calculate_report():
total = 0 # Declared at top
count = 0
average = 0.0
# 10 lines later...
total = sum(values)
count = len(values)
average = total / count if count > 0 else 0.0
✅ Good:
def calculate_report():
# 10 lines of other work...
# Declare right before use
total = sum(values)
count = len(values)
average = total / count if count > 0 else 0.0
Technique 2: Use Block Scope
In languages supporting block scope, use it:
# Process old data - variables scoped to this block
{
old_data = get_old_data()
old_total = sum(old_data)
print_summary(old_data, old_total)
} # old_data, old_total die here
# Process new data - fresh variables, no collision
{
new_data = get_new_data()
new_total = sum(new_data)
print_summary(new_data, new_total)
} # new_data, new_total die here
Technique 3: Initialize at Declaration
❌ Bad:
# Declare and initialize separately
user_count: int
user_count = 0
active_users: list
active_users = []
✅ Good:
# Initialize when declaring
user_count = 0
active_users = []
Technique 4: Use Loop-Scoped Variables
Many languages support declaring loop variables in the loop:
# Variable i exists ONLY within this loop
for i in range(count):
process(i)
# i doesn't exist here
# Another loop can use i without conflict
for i in range(other_count):
process_other(i)
Technique 5: Group Related Statements
Keep statements working with same variables together:
❌ Bad (scattered):
old_data = get_old_data()
new_data = get_new_data()
old_total = sum(old_data)
new_total = sum(new_data)
print_old_summary(old_data, old_total)
print_new_summary(new_data, new_total)
Must track 6 variables simultaneously
✅ Good (grouped):
# Group 1: Old data (track 2 variables)
old_data = get_old_data()
old_total = sum(old_data)
print_old_summary(old_data, old_total)
# Group 2: New data (track 2 variables)
new_data = get_new_data()
new_total = sum(new_data)
print_new_summary(new_data, new_total)
Track only 2 variables at a time
Minimize Global/Class Variables
Globals have enormous scope, span, and live time - avoid them.
❌ Bad:
total = 0 # Global
def add_to_total(value):
global total
total += value # Total lives forever, visible everywhere
def get_total():
global total
return total
✅ Good:
class Counter:
def __init__(self):
self._total = 0 # Private, encapsulated
def add(self, value):
self._total += value # Scoped to class
def get_total(self):
return self._total
Even better (eliminate state if possible):
def calculate_total(values):
return sum(values) # No state, no scope issues
Measuring Improvement
Calculate Span
Count lines between successive references:
value = 10 # Reference 1
line_a() # 1 line between
line_b() # 2 lines between
result = value * 2 # Reference 2
# Span: 2 lines
Target: Average span < 5 lines
Calculate Live Time
Count lines from first to last reference (inclusive):
count = 0 # Line 5 - first reference
# ... code ...
count += 1 # Line 42 - last reference
# Live time: 42 - 5 + 1 = 38 lines
Target: Average live time < 10 lines
Global variables: Infinite live time (another reason to avoid)
Common Mistakes
❌ All variables at top (C-style):
def process():
# Declare everything at top
i = 0
j = 0
total = 0
result = []
temp = None
# Use i 20 lines later
for i in range(10):
...
✅ Declare where used:
def process():
# Use variables close to declaration
for i in range(10): # i scoped to loop
...
total = 0 # Declared right before use
for item in items:
total += item
❌ Wide scope when narrow would work:
def calculate():
result = 0 # Function scope
if condition_a:
result = calculate_a() # Could be block-scoped
print(result)
# result accessible here but not used
if condition_b:
result = calculate_b() # Reusing same variable
print(result)
✅ Narrow scope:
def calculate():
if condition_a:
result = calculate_a() # Block scope
print(result)
# result doesn't exist here
if condition_b:
result = calculate_b() # Fresh variable, no collision
print(result)
❌ Long live time:
index = 0 # Line 2
# ... 50 lines of unrelated code ...
while index < count: # Line 52 - finally used
index += 1
# Live time: 51 lines
✅ Short live time:
# ... 50 lines of other code ...
index = 0 # Line 52 - right before use
while index < count:
index += 1
# Live time: 2 lines
Practical Guidelines
1. Initialize at Declaration
Languages like C++, Java, Python, JavaScript support this:
# ✅ Declare and initialize together
user_count = 0
active_users = get_active_users()
total_revenue = calculate_revenue(orders)
2. Loop Variables in Loop Declaration
# ✅ Loop variable scoped to loop
for user in users:
process(user) # user exists only here
for item in items:
handle(item) # item exists only here
3. Initialize Before Loop, Not at Function Start
❌ Bad:
def process_records():
index = 0 # Line 2
# 30 lines of other work...
# Finally use index
while index < record_count: # Line 32
index += 1
✅ Good:
def process_records():
# 30 lines of other work...
# Initialize right before loop
index = 0
while index < record_count:
index += 1
Why: When you modify code and add outer loop, initialization is correctly placed for re-initialization on each pass.
4. Extract Related Statements Into Routines
Long routines create large scope. Break into smaller routines:
# ✅ Each routine has small scope
def process_old_data():
old_data = get_old_data() # Scoped to this routine only
old_total = sum(old_data)
return old_total
def process_new_data():
new_data = get_new_data() # Fresh variable, no collision
new_total = sum(new_data)
return new_total
Variables automatically die when routine exits.
5. Prefer Most Restricted Visibility
Hierarchy (most restricted to least):
- Block/loop local (if language supports)
- Function local
- Private instance variable
- Protected instance variable
- Public instance variable
- Module-level
- Global
Start at #1, move down only if necessary.
Convenience vs Intellectual Manageability
Two philosophies:
Convenience Philosophy
"Make variables global so they're convenient to access anywhere. Don't fool around with parameter lists."
Problem: Easy to write, hard to read/maintain. Any routine can modify any variable. Must understand entire program to modify one part.
Intellectual Manageability Philosophy
"Keep variables as local as possible. Hide information. Minimize what you must think about at once."
Benefit: Harder to write (must think about scope), easier to read/maintain. Can understand one routine without knowing all others.
Code Complete's recommendation: Favor intellectual manageability. Code is read 10x more than written.
Example Transformation
❌ Before (wide scope, long live time):
def summarize_data():
# All variables at top with function scope
old_data = None
num_old = 0
total_old = 0
new_data = None
num_new = 0
total_new = 0
old_data = get_old_data() # Line 8
num_old = len(old_data)
total_old = sum(old_data)
print_summary(old_data, total_old, num_old) # Line 11
save_summary(total_old, num_old)
new_data = get_new_data() # Line 14
num_new = len(new_data)
total_new = sum(new_data)
print_summary(new_data, total_new, num_new) # Line 17
save_summary(total_new, num_new)
Must track 6 variables throughout entire function. Live times: old_data=4, new_data=4, etc.
✅ After (narrow scope, short live time):
def summarize_data():
# Group 1: Old data (variables live only 4 lines)
old_data = get_old_data()
num_old = len(old_data)
total_old = sum(old_data)
print_summary(old_data, total_old, num_old)
save_summary(total_old, num_old)
# old_data, num_old, total_old mentally "die" here
# Group 2: New data (fresh variables, 4 lines)
new_data = get_new_data()
num_new = len(new_data)
total_new = sum(new_data)
print_summary(new_data, total_new, num_new)
save_summary(total_new, num_new)
Track 3 variables at a time. Same live times, but mental load reduced.
✅ Even better (extract to routines):
def summarize_data():
process_old_data() # Variables scoped inside
process_new_data() # Variables scoped inside
def process_old_data():
# Variables live only in this routine (5 lines)
old_data = get_old_data()
num_old = len(old_data)
total_old = sum(old_data)
print_summary(old_data, total_old, num_old)
save_summary(total_old, num_old)
# Variables die when routine exits
Track 3 variables maximum. Variables automatically cleaned up.
Quick Reference
| Situation | Technique | Example |
|---|---|---|
| Loop variable | Declare in loop | for i in range(n): |
| Temporary calculation | Inline or immediate use | total = sum(values) right before print(total) |
| Used in one block | Declare in that block | if-block variable stays in if-block |
| Used across function | Function-local only if necessary | Don't make it class/global |
| Shared across methods | Private instance variable | Not public unless necessary |
| Truly global | Access routine instead | Wrap in getter/setter |
Common Patterns
Pattern 1: Loop Counters
# ✅ Counter scoped to loop
for i in range(len(items)):
process(items[i])
# i doesn't exist here - good
# Can reuse i in another loop without collision
for i in range(len(others)):
process(others[i])
Pattern 2: Calculation Results
# ✅ Calculate right before use
def generate_report():
# Other work...
# Calculate only when needed, use immediately
total_revenue = sum(order.total for order in orders)
print(f"Total Revenue: ${total_revenue}")
# Different calculation later
active_user_count = len([u for u in users if u.is_active])
print(f"Active Users: {active_user_count}")
Pattern 3: Temporary Values
# ✅ Temporary lives only 2-3 lines
def swap_values(arr, i, j):
temp = arr[i] # Temporary variable
arr[i] = arr[j]
arr[j] = temp # temp used and done (live time: 3 lines)
Pattern 4: Iteration State
# ✅ State variables near the loop they control
def process_until_done():
# Other work...
# Declare state right before loop
done = False
attempts = 0
while not done and attempts < max_attempts:
done = try_process()
attempts += 1
Measuring Your Code
Calculate average span and live time for a function:
def example():
a = 0 # Line 2, ref 1
b = 0 # Line 3
c = 0 # Line 4
a = b + c # Line 5, ref 2 of a
d = a * 2 # Line 6, ref 3 of a
# Span of a: (5-2=3) + (6-5=1) = 4, average 2
# Live time of a: 6-2+1 = 5 lines
Good metrics:
- Average span: < 5 lines
- Average live time: < 10 lines
If higher: Consider localizing more aggressively.
Benefits
1. Reduces Window of Vulnerability
Shorter live time = fewer lines where variable could be incorrectly modified:
# Live time = 50 lines
value = 0 # Line 1
# ... 48 lines where value could be accidentally changed ...
return value # Line 50
vs.
# Live time = 2 lines
value = calculate() # Line 49
return value # Line 50 - less can go wrong
2. Easier to Understand
Seeing declaration and usage together aids comprehension:
# ✅ See both on one screen
count = len(items)
print(f"Processing {count} items")
# ❌ Must scroll to see declaration
# ... Line 1: count = 0
# ... 50 lines later...
# ... Line 51: print(f"Processing {count} items") # What's count?
3. Reduces Initialization Errors
Variables initialized close to use are less likely to have stale values:
# ✅ Initialized fresh each loop iteration
for batch in batches:
count = 0 # Reset for each batch
for item in batch:
count += 1
vs.
# ❌ Might forget to reset
count = 0 # Top of function
for batch in batches:
# Forgot to reset count - accumulates across batches!
for item in batch:
count += 1
4. Easier Refactoring
Short live time makes extracting to separate routine easier:
# Related statements with short-lived variables
# are easy to extract into their own routine
old_data = get_old_data() # Lines 10-13
old_total = sum(old_data)
print_summary(old_data, old_total)
# → Extract to process_old_data() routine
Quick Checklist
For each variable, ask:
- Is this declared in smallest possible scope?
- Could it be loop-scoped instead of function-scoped?
- Could it be function-local instead of class-level?
- Is it initialized close to first use?
- Are all references to it close together?
- Could I extract this section into a routine to reduce scope further?
- If class/global variable: could it be passed as parameter instead?
If any answer is "yes, could be smaller" → localize it.
Real-World Impact
From Code Complete:
- Research shows shorter live times correlate with fewer errors
- Proximity aids comprehension
- Local scope prevents unintended side effects
- Baseline test: agent declared all variables at top (live time ~15-25 lines)
- Better practice: average live time < 10 lines
Key insight: The more you can hide, the less you must keep in mind. The less in mind, the fewer errors.
Integration with Other Skills
For initialization: See patterns in skills/designing-before-coding for thinking about data initialization early in design
For naming: See skills/naming-variables - short-lived local variables can have shorter names; longer-lived variables need more descriptive names