| name | performance-optimization-for-sims |
| description | Profiling, LOD systems, time-slicing, spatial partitioning for sim scaling |
Performance Optimization for Simulations
When to use this skill: When simulations run below target frame rate (typically 60 FPS for PC, 30 FPS for mobile), especially with large agent counts (100+ units), complex AI, physics calculations, or proximity queries. Critical for RTS games, crowd simulations, ecosystem models, traffic systems, and any scenario requiring 1000+ active entities.
What this skill provides: Systematic methodology for performance optimization using profiling-driven decisions, spatial partitioning patterns, level-of-detail (LOD) systems, time-slicing, caching strategies, data-oriented design, and selective multithreading. Focuses on achieving 60 FPS at scale while maintaining gameplay quality.
Core Concepts
The Optimization Hierarchy (Critical Order)
ALWAYS optimize in this order - each level provides 10-100× improvement:
PROFILE FIRST (0.5-1 hour investment)
- Identify actual bottleneck with profiler
- Measure baseline performance
- Set target frame time budgets
- Never guess - 80% of time is usually in 20% of code
Algorithmic Optimizations (10-100× improvement)
- Fix O(n²) → O(n) or O(n log n)
- Spatial partitioning for proximity queries
- Replace brute-force with smart algorithms
- Biggest wins, do these FIRST
Level of Detail (LOD) (2-10× improvement)
- Reduce computation for distant/unimportant entities
- Smooth transitions (no popping)
- Priority-based update frequencies
- Behavior LOD + visual LOD
Time-Slicing (2-5× improvement)
- Spread work across multiple frames
- Frame time budgets per system
- Priority queues for important work
- Amortized expensive operations
Caching (2-10× improvement)
- Avoid redundant calculations
- LRU eviction + TTL
- Proper invalidation
- Bounded memory usage
Data-Oriented Design (1.5-3× improvement)
- Cache-friendly memory layouts
- Struct of Arrays (SoA) vs Array of Structs (AoS)
- Minimize pointer chasing
- Batch operations on contiguous data
Multithreading (1.5-4× improvement)
- ONLY if still needed after above
- Job systems for data parallelism
- Avoid locks and race conditions
- Complexity cost is high
Example: RTS with 1000 units at 10 FPS → 60 FPS
- Profile: Vision checks are 80% of frame time
- Spatial partitioning: O(n²) → O(n) = 50× faster → 40 FPS
- LOD: Distant units update less = 1.5× faster → 60 FPS
- Done in 30 minutes vs 2 hours of trial-and-error
Profiling Methodology
Three-step profiling process:
Capture Baseline (before optimization)
- Total frame time
- Time per major system (AI, physics, rendering, pathfinding)
- CPU vs GPU bound
- Memory allocations per frame
- Cache misses (if profiler supports)
Identify Bottleneck (80/20 rule)
- Sort functions by time spent
- Focus on top 3-5 functions (usually 80% of time)
- Understand WHY they're slow (algorithm, data layout, cache misses)
Validate Improvement (after each optimization)
- Measure same metrics
- Calculate speedup ratio
- Check for regressions (new bottlenecks)
- Iterate until target met
Profiling Tools:
- Python: cProfile, line_profiler, memory_profiler, py-spy
- C++: VTune, perf, Instruments (Mac), Very Sleepy
- Unity: Unity Profiler, Deep Profile mode
- Unreal: Unreal Insights, stat commands
- Browser: Chrome DevTools Performance tab
Example Profiling Output:
Total frame time: 100ms (10 FPS)
Function Time % of Frame
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
update_vision_checks() 80ms 80% ← BOTTLENECK
update_ai() 10ms 10%
update_pathfinding() 5ms 5%
update_physics() 3ms 3%
render() 2ms 2%
Diagnosis: O(n²) vision checks (1000 units × 1000 = 1M checks/frame)
Solution: Spatial partitioning → O(n) checks
Spatial Partitioning
Problem: Proximity queries are O(n²) when checking every entity against every other
- 100 entities = 10,000 checks
- 1,000 entities = 1,000,000 checks (death)
- 10,000 entities = 100,000,000 checks (impossible)
Solution: Divide space into regions, only check entities in nearby regions
Spatial Hash Grid (simplest, fastest for uniform distribution)
- Divide world into fixed-size cells (e.g., 50×50 units)
- Hash entity position to cell(s)
- Query: Check only entities in neighboring cells
- Complexity: O(n) to build, O(1) average query
- Best for: Mostly uniform entity distribution
Quadtree (adaptive, good for clustered entities)
- Recursively subdivide space into 4 quadrants
- Split when cell exceeds threshold (e.g., 10 entities)
- Query: Descend tree, check overlapping nodes
- Complexity: O(n log n) to build, O(log n) average query
- Best for: Entities clustered in areas
Octree (3D version of quadtree)
- Recursively subdivide 3D space into 8 octants
- Same benefits as quadtree for 3D worlds
- Best for: 3D flight sims, space games, underwater
Decision Framework:
Spatial Partitioning Choice:
├─ 2D WORLD with UNIFORM DISTRIBUTION?
│ └─ Use Spatial Hash Grid (simplest, fastest)
│
├─ 2D WORLD with CLUSTERED ENTITIES?
│ └─ Use Quadtree (adapts to density)
│
├─ 3D WORLD?
│ └─ Use Octree (3D quadtree)
│
└─ VERY LARGE WORLD (multiple km²)?
└─ Use Hierarchical Grid (multiple grids at different scales)
Performance Impact:
- 1000 units: O(n²) = 1,000,000 checks → O(n) = 1,000 checks = 1000× faster
- Typical speedup: 50-100× in practice (accounting for grid overhead)
Level of Detail (LOD)
Concept: Reduce computation for entities that don't need full precision
Distance-Based LOD Levels:
LOD 0 (0-50 units from camera): Full detail
- Full AI decision-making (10 Hz)
- Precise pathfinding
- Detailed animations
- All visual effects
LOD 1 (50-100 units): Reduced detail
- Simplified AI (5 Hz)
- Coarse pathfinding (waypoints only)
- Simplified animations
- Reduced effects
LOD 2 (100-200 units): Minimal detail
- Basic AI (1 Hz)
- Straight-line movement
- Static pose or simple animation
- No effects
LOD 3 (200+ units): Culled or dormant
- State update only (0.2 Hz)
- No pathfinding
- Billboards or invisible
- No physics
Importance-Based LOD (better than distance alone):
def calculate_lod_level(entity, camera, player):
# Multiple factors determine importance
distance = entity.distance_to(camera)
is_player_unit = entity.team == player.team
is_in_combat = entity.in_combat
is_selected = entity in player.selection
# Important entities always get high LOD
if is_selected:
return 0 # Always full detail
if is_player_unit and is_in_combat:
return 0 # Player's units in combat = critical
# Distance-based for others
if distance < 50:
return 0
elif distance < 100:
return 1
elif distance < 200:
return 2
else:
return 3
Smooth LOD Transitions (avoid popping):
Hysteresis: Different thresholds for upgrading vs downgrading
- Upgrade LOD at 90 units
- Downgrade LOD at 110 units
- 20-unit buffer prevents thrashing
Time delay: Wait N seconds before downgrading LOD
- Prevents rapid flicker at boundary
Blend animations: Cross-fade between LOD levels
- 0.5-1 second blend
Behavior LOD Examples:
| System | LOD 0 (Full) | LOD 1 (Reduced) | LOD 2 (Minimal) | LOD 3 (Dormant) |
|---|---|---|---|---|
| AI | Behavior tree 10 Hz | Simple FSM 5 Hz | Follow path 1 Hz | State only 0.2 Hz |
| Pathfinding | Full A* | Hierarchical | Straight line | None |
| Vision | 360° scan 10 Hz | Forward cone 5 Hz | None | None |
| Physics | Full collision | Bounding box | None | None |
| Animation | Full skeleton | 5 bones | Static pose | None |
| Audio | 3D positioned | 2D ambient | None | None |
Performance Impact:
- 1000 units: 100% at LOD 0 vs 20% at LOD 0 + 80% at LOD 1-3 = 3-5× faster
Time-Slicing
Concept: Spread expensive operations across multiple frames to stay within frame budget
Frame Time Budget (60 FPS = 16.67ms per frame):
Frame Budget (16.67ms total):
├─ Rendering: 6ms (40%)
├─ AI: 4ms (24%)
├─ Physics: 3ms (18%)
├─ Pathfinding: 2ms (12%)
└─ Other: 1.67ms (10%)
Time-Slicing Pattern 1: Fixed Budget Per Frame
class TimeSlicedSystem:
def __init__(self, budget_ms=2.0):
self.budget = budget_ms
self.pending_work = []
def add_work(self, work_item, priority=0):
# Priority queue: higher priority = processed first
heapq.heappush(self.pending_work, (-priority, work_item))
def update(self, dt):
start_time = time.time()
processed = 0
while self.pending_work and (time.time() - start_time) < self.budget:
priority, work_item = heapq.heappop(self.pending_work)
work_item.execute()
processed += 1
return processed
# Usage: Pathfinding
pathfinding_system = TimeSlicedSystem(budget_ms=2.0)
for unit in units_needing_paths:
priority = calculate_priority(unit) # Player units = high priority
pathfinding_system.add_work(PathfindRequest(unit), priority)
# Each frame: process as many as fit in 2ms budget
paths_found = pathfinding_system.update(dt)
Time-Slicing Pattern 2: Amortized Updates
class AmortizedUpdateManager:
def __init__(self, entities, updates_per_frame=200):
self.entities = entities
self.updates_per_frame = updates_per_frame
self.current_index = 0
def update(self, dt):
# Update N entities per frame
for i in range(self.updates_per_frame):
entity = self.entities[self.current_index]
entity.expensive_update(dt)
self.current_index = (self.current_index + 1) % len(self.entities)
# All entities updated every N frames
# 1000 entities / 200 per frame = every 5 frames = 12 Hz at 60 FPS
# Priority-based amortization
def update_with_priority(entities, frame_count):
for i, entity in enumerate(entities):
# Distance-based update frequency
distance = entity.distance_to_camera()
if distance < 50:
entity.update() # Every frame (60 Hz)
elif distance < 100 and frame_count % 2 == 0:
entity.update() # Every 2 frames (30 Hz)
elif distance < 200 and frame_count % 5 == 0:
entity.update() # Every 5 frames (12 Hz)
elif frame_count % 30 == 0:
entity.update() # Every 30 frames (2 Hz)
Time-Slicing Pattern 3: Incremental Processing
class IncrementalPathfinder:
"""Find path over multiple frames instead of blocking"""
def __init__(self, max_nodes_per_frame=100):
self.max_nodes = max_nodes_per_frame
self.open_set = []
self.closed_set = set()
self.current_request = None
def start_pathfind(self, start, goal):
self.current_request = PathRequest(start, goal)
heapq.heappush(self.open_set, (0, start))
return self.current_request
def step(self):
"""Process up to max_nodes this frame, return True if done"""
if not self.current_request:
return True
nodes_processed = 0
while self.open_set and nodes_processed < self.max_nodes:
current = heapq.heappop(self.open_set)
if current == self.current_request.goal:
self.current_request.path = reconstruct_path(current)
self.current_request.complete = True
return True
# Expand neighbors...
nodes_processed += 1
return False # Not done yet, continue next frame
# Usage
pathfinder = IncrementalPathfinder(max_nodes_per_frame=100)
request = pathfinder.start_pathfind(unit.pos, target.pos)
# Each frame
while not request.complete:
pathfinder.step() # Process 100 nodes, spread over multiple frames
Performance Impact:
- 1000 expensive updates: 1000/frame → 200/frame = 5× faster
- Pathfinding: Blocking 50ms → 2ms budget = stays at 60 FPS
Caching Strategies
When to Cache:
- Expensive calculations used repeatedly (pathfinding, line-of-sight)
- Results that change infrequently (static paths, terrain visibility)
- Deterministic results (same input = same output)
Cache Design Pattern:
class PerformanceCache:
def __init__(self, max_size=10000, ttl_seconds=60.0):
self.cache = {} # key -> CacheEntry
self.max_size = max_size
self.ttl = ttl_seconds
self.access_times = {} # LRU tracking
self.insert_times = {} # TTL tracking
def get(self, key):
current_time = time.time()
if key not in self.cache:
return None
# Check TTL (time-to-live)
if current_time - self.insert_times[key] > self.ttl:
del self.cache[key]
del self.access_times[key]
del self.insert_times[key]
return None
# Update LRU
self.access_times[key] = current_time
return self.cache[key]
def put(self, key, value):
current_time = time.time()
# Evict if full (LRU eviction)
if len(self.cache) >= self.max_size:
# Find least recently used
lru_key = min(self.access_times, key=self.access_times.get)
del self.cache[lru_key]
del self.access_times[lru_key]
del self.insert_times[lru_key]
self.cache[key] = value
self.access_times[key] = current_time
self.insert_times[key] = current_time
def invalidate(self, key):
"""Explicit invalidation when data changes"""
if key in self.cache:
del self.cache[key]
del self.access_times[key]
del self.insert_times[key]
def invalidate_region(self, x, y, radius):
"""Invalidate all cache entries in region (e.g., terrain changed)"""
keys_to_remove = []
for key in self.cache:
if self._key_in_region(key, x, y, radius):
keys_to_remove.append(key)
for key in keys_to_remove:
self.invalidate(key)
# Usage: Path caching
path_cache = PerformanceCache(max_size=5000, ttl_seconds=30.0)
def get_or_calculate_path(start, goal):
# Quantize to grid for cache key (allow slight position variance)
key = (round(start.x), round(start.y), round(goal.x), round(goal.y))
cached = path_cache.get(key)
if cached:
return cached # Cache hit!
# Cache miss - calculate
path = expensive_pathfinding(start, goal)
path_cache.put(key, path)
return path
# Invalidate when terrain changes
def on_building_placed(x, y):
path_cache.invalidate_region(x, y, radius=100)
Cache Invalidation Strategies:
Time-To-Live (TTL): Expire after N seconds
- Good for: Dynamic environments (traffic, weather)
- Example: Path cache with 30 second TTL
Event-Based: Invalidate on specific events
- Good for: Known change triggers (building placed, obstacle moved)
- Example: Invalidate paths when wall built
Hybrid: TTL + event-based
- Good for: Most scenarios
- Example: 60 second TTL OR invalidate on terrain change
Performance Impact:
- Pathfinding with 60% cache hit rate: 40% of requests calculate = 2.5× faster
- Line-of-sight with 80% cache hit rate: 20% of requests calculate = 5× faster
Data-Oriented Design (DOD)
Concept: Organize data for cache-friendly access patterns
Array of Structs (AoS) - Traditional OOP approach:
class Unit:
def __init__(self):
self.x = 0.0
self.y = 0.0
self.health = 100
self.damage = 10
# ... 20 more fields ...
units = [Unit() for _ in range(1000)]
# Update positions (cache-unfriendly)
for unit in units:
unit.x += unit.velocity_x * dt # Load entire Unit struct for each unit
unit.y += unit.velocity_y * dt # Only using 2 fields, wasting cache
Struct of Arrays (SoA) - DOD approach:
class UnitSystem:
def __init__(self, count):
# Separate arrays for each component
self.positions_x = [0.0] * count
self.positions_y = [0.0] * count
self.velocities_x = [0.0] * count
self.velocities_y = [0.0] * count
self.health = [100] * count
self.damage = [10] * count
# ... more arrays ...
units = UnitSystem(1000)
# Update positions (cache-friendly)
for i in range(len(units.positions_x)):
units.positions_x[i] += units.velocities_x[i] * dt # Sequential memory access
units.positions_y[i] += units.velocities_y[i] * dt # Perfect for CPU cache
Why SoA is Faster:
- CPU cache lines are 64 bytes
- AoS: Load 1-2 units per cache line (if Unit is 32-64 bytes)
- SoA: Load 8-16 floats per cache line (4 bytes each)
- 4-8× better cache utilization = 1.5-3× faster in practice
When to Use SoA:
- Batch operations on many entities (position updates, damage calculations)
- Systems that only need 1-2 fields from entity
- Performance-critical inner loops
When AoS is Okay:
- Small entity counts (< 100)
- Operations needing many fields
- Prototyping (DOD is optimization, not default)
ECS Architecture (combines SoA + component composition):
# Components (pure data)
class Position:
x: float
y: float
class Velocity:
x: float
y: float
class Health:
current: int
max: int
# Systems (pure logic)
class MovementSystem:
def update(self, positions, velocities, dt):
# Batch process all entities with Position + Velocity
for i in range(len(positions)):
positions[i].x += velocities[i].x * dt
positions[i].y += velocities[i].y * dt
class CombatSystem:
def update(self, positions, health, attacks):
# Only process entities with Position + Health + Attack
# ...
# Entity is just an ID
entities = [Entity(id=i) for i in range(1000)]
Performance Impact:
- Cache-friendly data layout: 1.5-3× faster for batch operations
- ECS architecture: Enables efficient multithreading (no shared mutable state)
Multithreading (Use Sparingly)
When to Multithread:
- ✅ After all other optimizations (if still needed)
- ✅ Embarrassingly parallel work (no dependencies)
- ✅ Long-running tasks (benefit outweighs overhead)
- ✅ Native code (C++, Rust) - avoids GIL
When NOT to Multithread:
- ❌ Python CPU-bound code (GIL limits to 1 core)
- ❌ Before trying simpler optimizations
- ❌ Lots of shared mutable state (locking overhead)
- ❌ Small tasks (thread overhead > savings)
Job System Pattern (best practice):
from concurrent.futures import ThreadPoolExecutor
import threading
class JobSystem:
def __init__(self, num_workers=4):
self.executor = ThreadPoolExecutor(max_workers=num_workers)
def submit_batch(self, jobs):
"""Submit list of independent jobs, return futures"""
futures = [self.executor.submit(job.execute) for job in jobs]
return futures
def wait_all(self, futures):
"""Wait for all jobs to complete"""
results = [future.result() for future in futures]
return results
# Good: Parallel pathfinding (independent tasks)
job_system = JobSystem(num_workers=4)
path_jobs = [PathfindJob(unit.pos, unit.target) for unit in units_needing_paths]
futures = job_system.submit_batch(path_jobs)
# Do other work while pathfinding runs...
# Collect results
paths = job_system.wait_all(futures)
Data Parallelism Pattern (no shared mutable state):
def update_positions_parallel(positions, velocities, dt, num_workers=4):
"""Update positions in parallel batches"""
def update_batch(start_idx, end_idx):
# Each worker gets exclusive slice (no locks needed)
for i in range(start_idx, end_idx):
positions[i].x += velocities[i].x * dt
positions[i].y += velocities[i].y * dt
# Split work into batches
batch_size = len(positions) // num_workers
futures = []
for worker_id in range(num_workers):
start = worker_id * batch_size
end = start + batch_size if worker_id < num_workers - 1 else len(positions)
future = executor.submit(update_batch, start, end)
futures.append(future)
# Wait for all workers
for future in futures:
future.result()
Common Multithreading Pitfalls:
Race Conditions (shared mutable state)
# BAD: Multiple threads modifying same list for unit in units: threading.Thread(target=unit.update, args=(all_units,)).start() # Each thread reads/writes all_units = data race! # GOOD: Read-only shared data for unit in units: # units is read-only for all threads # Each unit only modifies itself (exclusive ownership) threading.Thread(target=unit.update, args=(units,)).start()False Sharing (cache line contention)
# BAD: Adjacent array elements on same cache line shared_counters = [0] * 8 # 8 threads updating 8 counters # Thread 0 updates counter[0], Thread 1 updates counter[1] # Both on same 64-byte cache line = cache thrashing! # GOOD: Pad to separate cache lines class PaddedCounter: value: int padding: [int] * 15 # Force to own cache line shared_counters = [PaddedCounter() for _ in range(8)]Excessive Locking (defeats parallelism)
# BAD: Single lock for everything lock = threading.Lock() def update_unit(unit): with lock: # Only 1 thread can work at a time! unit.update() # GOOD: Lock-free or fine-grained locking def update_unit(unit): unit.update() # Each unit independent, no lock needed
Performance Impact:
- 4 cores: Ideal speedup = 4×, realistic = 2-3× (overhead, Amdahl's law)
- Python: Minimal (GIL), use multiprocessing or native extensions
- C++/Rust: Good (2-3× on 4 cores for parallelizable work)
Decision Frameworks
Framework 1: Systematic Optimization Process
Use this process EVERY time performance is inadequate:
Step 1: PROFILE (mandatory, do first)
├─ Capture baseline metrics
├─ Identify top 3-5 bottlenecks (80% of time)
└─ Understand WHY slow (algorithm, data, cache)
Step 2: ALGORITHMIC (10-100× gains)
├─ Is bottleneck O(n²) or worse?
│ ├─ Proximity queries? → Spatial partitioning
│ ├─ Pathfinding? → Hierarchical, flow fields, or caching
│ └─ Sorting? → Better algorithm or less frequent
├─ Is bottleneck doing redundant work?
│ └─ Add caching with LRU + TTL
└─ Measure improvement, re-profile
Step 3: LOD (2-10× gains)
├─ Can distant entities use less detail?
│ ├─ Distance-based LOD levels (4 levels)
│ ├─ Importance weighting (player units > NPC)
│ └─ Smooth transitions (hysteresis, blending)
└─ Measure improvement, re-profile
Step 4: TIME-SLICING (2-5× gains)
├─ Can work spread across multiple frames?
│ ├─ Set frame budget per system (2-4ms typical)
│ ├─ Priority queue (important work first)
│ └─ Amortized updates (N entities per frame)
└─ Measure improvement, re-profile
Step 5: DATA-ORIENTED DESIGN (1.5-3× gains)
├─ Is bottleneck cache-unfriendly?
│ ├─ Convert AoS → SoA for batch operations
│ ├─ Group hot data together
│ └─ Minimize pointer chasing
└─ Measure improvement, re-profile
Step 6: MULTITHREADING (1.5-4× gains, high complexity)
├─ Still below target after above?
│ ├─ Identify embarrassingly parallel work
│ ├─ Job system for independent tasks
│ ├─ Data parallelism (no shared mutable state)
│ └─ Avoid locks (lock-free or per-entity ownership)
└─ Measure improvement, re-profile
Step 7: VALIDATE
├─ Met target frame rate? → Done!
├─ Still slow? → Return to Step 1, find new bottleneck
└─ Regression? → Revert and try different approach
Example Application (1000-unit RTS at 10 FPS):
- Profile: Vision checks are 80% (80ms/100ms frame)
- Algorithmic: Add spatial hash grid → 40 FPS (15ms vision checks)
- LOD: Distant units update at 5 Hz → 55 FPS (11ms vision)
- Time-slicing: 2ms pathfinding budget → 60 FPS ✅ Done
- (Skip DOD and multithreading - already at target)
Framework 2: Choosing Spatial Partitioning
START: What's my proximity query scenario?
├─ 2D WORLD with UNIFORM ENTITY DISTRIBUTION?
│ └─ Use SPATIAL HASH GRID
│ - Cell size = 2× query radius (e.g., vision range 50 → cells 100×100)
│ - O(n) build, O(1) query
│ - Simplest to implement
│ - Example: RTS units on open battlefield
│
├─ 2D WORLD with CLUSTERED ENTITIES?
│ └─ Use QUADTREE
│ - Split threshold = 10-20 entities per node
│ - Max depth = 8-10 levels
│ - O(n log n) build, O(log n) query
│ - Example: City simulation (dense downtown, sparse suburbs)
│
├─ 3D WORLD?
│ └─ Use OCTREE
│ - Same as quadtree, but 8 children per node
│ - Example: Space game, underwater sim
│
├─ VERY LARGE WORLD (> 10 km²)?
│ └─ Use HIERARCHICAL GRID
│ - Coarse grid (1km cells) + fine grid (50m cells) per coarse cell
│ - Example: MMO world, open-world game
│
└─ ENTITIES MOSTLY STATIONARY?
└─ Use STATIC QUADTREE/OCTREE
- Build once, query many times
- Example: Building placement, static obstacles
Implementation Complexity:
- Spatial Hash Grid: 1-2 hours (simple)
- Quadtree: 3-5 hours (moderate)
- Octree: 4-6 hours (moderate)
- Hierarchical Grid: 6-10 hours (complex)
Performance Characteristics:
| Method | Build Time | Query Time | Memory | Best For |
|---|---|---|---|---|
| Hash Grid | O(n) | O(1) avg | Low | Uniform distribution |
| Quadtree | O(n log n) | O(log n) avg | Medium | Clustered entities |
| Octree | O(n log n) | O(log n) avg | Medium | 3D worlds |
| Hierarchical | O(n) | O(1) avg | Higher | Massive worlds |
Framework 3: LOD Level Assignment
For each entity, assign LOD level based on:
├─ IMPORTANCE (highest priority)
│ ├─ Player-controlled? → LOD 0 (always full detail)
│ ├─ Player's team AND in combat? → LOD 0
│ ├─ Selected units? → LOD 0
│ ├─ Quest-critical NPCs? → LOD 0
│ └─ Otherwise, use distance-based...
│
├─ DISTANCE FROM CAMERA (secondary)
│ ├─ 0-50 units → LOD 0 (full detail)
│ │ - Update: 60 Hz (every frame)
│ │ - AI: Full behavior tree
│ │ - Pathfinding: Precise A*
│ │ - Animation: Full skeleton
│ │
│ ├─ 50-100 units → LOD 1 (reduced)
│ │ - Update: 30 Hz (every 2 frames)
│ │ - AI: Simplified FSM
│ │ - Pathfinding: Hierarchical
│ │ - Animation: 10 bones
│ │
│ ├─ 100-200 units → LOD 2 (minimal)
│ │ - Update: 12 Hz (every 5 frames)
│ │ - AI: Basic scripted
│ │ - Pathfinding: Waypoints
│ │ - Animation: Static pose
│ │
│ └─ 200+ units → LOD 3 (culled)
│ - Update: 2 Hz (every 30 frames)
│ - AI: State only (no decisions)
│ - Pathfinding: None
│ - Animation: None (invisible or billboard)
│
└─ SCREEN SIZE (tertiary)
├─ Occluded or < 5 pixels? → LOD 3 (culled)
└─ Small on screen? → Bump LOD down 1 level
Hysteresis to Prevent LOD Thrashing:
# Without hysteresis (bad - flickers)
lod = 0 if distance < 100 else 1
# Entity at 99-101 units: LOD flip-flops every frame!
# With hysteresis (good - stable)
if distance < 90:
lod = 0 # Upgrade at 90
elif distance > 110:
lod = 1 # Downgrade at 110
# else: keep current LOD
# 20-unit buffer prevents thrashing
Framework 4: When to Use Multithreading
Should I multithread this system?
├─ ALREADY optimized algorithmic/LOD/caching?
│ └─ NO → Do those FIRST (10-100× gains vs 2-4× for threading)
│
├─ WORK IS EMBARRASSINGLY PARALLEL?
│ ├─ Independent tasks (pathfinding requests)? → YES, good candidate
│ ├─ Lots of shared mutable state? → NO, locking kills performance
│ └─ Need results immediately? → NO, adds latency
│
├─ TASK DURATION > 1ms?
│ ├─ YES → Threading overhead is small % of work
│ └─ NO → Overhead dominates, not worth it
│
├─ PYTHON or NATIVE CODE?
│ ├─ Python → Use multiprocessing (avoid GIL) or native extensions
│ └─ C++/Rust → ThreadPool or job system works well
│
├─ COMPLEXITY COST JUSTIFIED?
│ ├─ Can maintain code with debugging difficulty? → Consider it
│ └─ Team inexperienced with threading? → Avoid (bugs are costly)
│
└─ EXPECTED SPEEDUP > 1.5×?
├─ 4 cores: Realistic = 2-3× (not 4× due to overhead)
├─ Worth complexity? → Your call
└─ Not worth it? → Try other optimizations first
Threading Decision Tree Example:
Scenario: Pathfinding for 100 units
├─ Already using caching? YES (60% hit rate)
├─ Work is parallel? YES (each path independent)
├─ Task duration? 5ms per path (good for threading)
├─ Language? Python (GIL problem)
│ └─ Solution: Use multiprocessing or native pathfinding library
├─ Complexity justified? 100 paths × 5ms = 500ms → 60ms with 8 workers
│ └─ YES, worth it (8× speedup)
│
Decision: Use multiprocessing.Pool with 8 workers
Framework 5: Frame Time Budget Allocation
60 FPS = 16.67ms per frame, 30 FPS = 33.33ms per frame
Budget Template (adjust based on game type):
60 FPS Frame Budget (16.67ms total):
├─ Rendering: 6.0ms (40%)
│ ├─ Culling: 1.0ms
│ ├─ Draw calls: 4.0ms
│ └─ Post-processing: 1.0ms
│
├─ AI: 3.5ms (24%)
│ ├─ Behavior trees: 2.0ms
│ ├─ Sensors/perception: 1.0ms
│ └─ Decision-making: 0.5ms
│
├─ Physics: 3.0ms (18%)
│ ├─ Broad-phase: 0.5ms
│ ├─ Narrow-phase: 1.5ms
│ └─ Constraint solving: 1.0ms
│
├─ Pathfinding: 2.0ms (12%)
│ ├─ New paths: 1.5ms
│ └─ Path following: 0.5ms
│
├─ Gameplay: 1.0ms (6%)
│ ├─ Economy updates: 0.3ms
│ ├─ Event processing: 0.4ms
│ └─ UI updates: 0.3ms
│
└─ Buffer: 1.17ms (7%)
└─ Unexpected spikes, GC, etc.
Budget by Game Type:
| Game Type | Rendering | AI | Physics | Pathfinding | Gameplay |
|---|---|---|---|---|---|
| RTS | 30% | 30% | 10% | 20% | 10% |
| FPS | 50% | 15% | 20% | 5% | 10% |
| City Builder | 35% | 20% | 5% | 15% | 25% |
| Physics Sim | 30% | 5% | 50% | 5% | 10% |
| Turn-Based | 60% | 15% | 5% | 10% | 10% |
Enforcement Pattern:
class FrameBudgetMonitor:
def __init__(self):
self.budgets = {
'rendering': 6.0,
'ai': 3.5,
'physics': 3.0,
'pathfinding': 2.0,
'gameplay': 1.0
}
self.measurements = {key: [] for key in self.budgets}
def measure(self, system_name, func):
start = time.perf_counter()
result = func()
elapsed_ms = (time.perf_counter() - start) * 1000
self.measurements[system_name].append(elapsed_ms)
# Alert if over budget
if elapsed_ms > self.budgets[system_name]:
print(f"⚠️ {system_name} over budget: {elapsed_ms:.2f}ms / {self.budgets[system_name]:.2f}ms")
return result
def report(self):
print("Frame Time Budget Report:")
for system, budget in self.budgets.items():
avg = sum(self.measurements[system]) / len(self.measurements[system])
pct = (avg / budget) * 100
print(f" {system}: {avg:.2f}ms / {budget:.2f}ms ({pct:.0f}%)")
# Usage
monitor = FrameBudgetMonitor()
def game_loop():
monitor.measure('ai', lambda: update_ai(units))
monitor.measure('physics', lambda: update_physics(world))
monitor.measure('pathfinding', lambda: update_pathfinding(units))
monitor.measure('rendering', lambda: render_scene(camera))
if frame_count % 300 == 0: # Every 5 seconds
monitor.report()
Implementation Patterns
Pattern 1: Spatial Hash Grid for Proximity Queries
Problem: Checking every unit against every other unit for vision/attack is O(n²)
- 1000 units = 1,000,000 checks per frame = death
Solution: Spatial hash grid divides world into cells, only check nearby cells
import math
from collections import defaultdict
class SpatialHashGrid:
"""
Spatial partitioning using hash grid for O(1) average query time.
Best for: Uniform entity distribution, 2D worlds
Cell size rule: 2× maximum query radius
"""
def __init__(self, cell_size=100):
self.cell_size = cell_size
self.grid = defaultdict(list) # (cell_x, cell_y) -> [entities]
def _hash(self, x, y):
"""Convert world position to cell coordinates"""
cell_x = int(math.floor(x / self.cell_size))
cell_y = int(math.floor(y / self.cell_size))
return (cell_x, cell_y)
def clear(self):
"""Clear all entities (call at start of frame)"""
self.grid.clear()
def insert(self, entity):
"""Insert entity into grid"""
cell = self._hash(entity.x, entity.y)
self.grid[cell].append(entity)
def query_radius(self, x, y, radius):
"""
Find all entities within radius of (x, y).
Returns: List of entities in range
Complexity: O(k) where k = entities in nearby cells (typically 10-50)
"""
# Calculate which cells to check
min_cell_x = int(math.floor((x - radius) / self.cell_size))
max_cell_x = int(math.floor((x + radius) / self.cell_size))
min_cell_y = int(math.floor((y - radius) / self.cell_size))
max_cell_y = int(math.floor((y + radius) / self.cell_size))
candidates = []
# Check all cells in range
for cell_x in range(min_cell_x, max_cell_x + 1):
for cell_y in range(min_cell_y, max_cell_y + 1):
cell = (cell_x, cell_y)
candidates.extend(self.grid.get(cell, []))
# Filter by exact distance (candidates may be outside radius)
results = []
radius_sq = radius * radius
for entity in candidates:
dx = entity.x - x
dy = entity.y - y
dist_sq = dx * dx + dy * dy
if dist_sq <= radius_sq:
results.append(entity)
return results
def query_rect(self, min_x, min_y, max_x, max_y):
"""Find all entities in rectangular region"""
min_cell_x = int(math.floor(min_x / self.cell_size))
max_cell_x = int(math.floor(max_x / self.cell_size))
min_cell_y = int(math.floor(min_y / self.cell_size))
max_cell_y = int(math.floor(max_y / self.cell_size))
results = []
for cell_x in range(min_cell_x, max_cell_x + 1):
for cell_y in range(min_cell_y, max_cell_y + 1):
cell = (cell_x, cell_y)
results.extend(self.grid.get(cell, []))
return results
# Usage Example
class Unit:
def __init__(self, x, y, team):
self.x = x
self.y = y
self.team = team
self.vision_range = 50
self.attack_range = 20
def game_loop():
units = [Unit(random() * 1000, random() * 1000, random_team())
for _ in range(1000)]
# Cell size = 2× max query radius (vision range)
spatial_grid = SpatialHashGrid(cell_size=100)
while running:
# Rebuild grid each frame (units move)
spatial_grid.clear()
for unit in units:
spatial_grid.insert(unit)
# Update units
for unit in units:
# OLD (O(n²)): Check all 1000 units = 1,000,000 checks
# enemies = [u for u in units if u.team != unit.team and distance(u, unit) < vision_range]
# NEW (O(k)): Check ~10-50 units in nearby cells
nearby = spatial_grid.query_radius(unit.x, unit.y, unit.vision_range)
enemies = [u for u in nearby if u.team != unit.team]
# Attack enemies in range
for enemy in enemies:
dist_sq = (unit.x - enemy.x)**2 + (unit.y - enemy.y)**2
if dist_sq <= unit.attack_range**2:
enemy.health -= unit.damage
# Performance: O(n²) → O(n)
# 1000 units: 1,000,000 checks → ~30,000 checks (nearby cells only)
# Speedup: ~30-50× for vision/attack queries
Pattern 2: Quadtree for Clustered Entities
When to use: Entities cluster in specific areas (cities, battlefields) with sparse regions
class Quadtree:
"""
Adaptive spatial partitioning for clustered entity distributions.
Best for: Non-uniform distribution, entities cluster in areas
Automatically subdivides dense regions
"""
class Node:
def __init__(self, x, y, width, height, max_entities=10, max_depth=8):
self.x = x
self.y = y
self.width = width
self.height = height
self.max_entities = max_entities
self.max_depth = max_depth
self.entities = []
self.children = None # [NW, NE, SW, SE] when subdivided
def is_leaf(self):
return self.children is None
def contains(self, entity):
"""Check if entity is within this node's bounds"""
return (self.x <= entity.x < self.x + self.width and
self.y <= entity.y < self.y + self.height)
def subdivide(self):
"""Split into 4 quadrants"""
hw = self.width / 2 # half width
hh = self.height / 2 # half height
# Create 4 children: NW, NE, SW, SE
self.children = [
Quadtree.Node(self.x, self.y, hw, hh,
self.max_entities, self.max_depth - 1), # NW
Quadtree.Node(self.x + hw, self.y, hw, hh,
self.max_entities, self.max_depth - 1), # NE
Quadtree.Node(self.x, self.y + hh, hw, hh,
self.max_entities, self.max_depth - 1), # SW
Quadtree.Node(self.x + hw, self.y + hh, hw, hh,
self.max_entities, self.max_depth - 1), # SE
]
# Move entities to children
for entity in self.entities:
for child in self.children:
if child.contains(entity):
child.insert(entity)
break
self.entities.clear()
def insert(self, entity):
"""Insert entity into quadtree"""
if not self.contains(entity):
return False
if self.is_leaf():
self.entities.append(entity)
# Subdivide if over capacity and can go deeper
if len(self.entities) > self.max_entities and self.max_depth > 0:
self.subdivide()
else:
# Insert into appropriate child
for child in self.children:
if child.insert(entity):
break
return True
def query_radius(self, x, y, radius, results):
"""Find entities within radius of (x, y)"""
# Check if search circle intersects this node
closest_x = max(self.x, min(x, self.x + self.width))
closest_y = max(self.y, min(y, self.y + self.height))
dx = x - closest_x
dy = y - closest_y
dist_sq = dx * dx + dy * dy
if dist_sq > radius * radius:
return # No intersection
if self.is_leaf():
# Check entities in this leaf
radius_sq = radius * radius
for entity in self.entities:
dx = entity.x - x
dy = entity.y - y
if dx * dx + dy * dy <= radius_sq:
results.append(entity)
else:
# Recurse into children
for child in self.children:
child.query_radius(x, y, radius, results)
def __init__(self, world_width, world_height, max_entities=10, max_depth=8):
self.root = Quadtree.Node(0, 0, world_width, world_height,
max_entities, max_depth)
def insert(self, entity):
self.root.insert(entity)
def query_radius(self, x, y, radius):
results = []
self.root.query_radius(x, y, radius, results)
return results
# Usage
quadtree = Quadtree(world_width=1000, world_height=1000,
max_entities=10, max_depth=8)
# Insert entities
for unit in units:
quadtree.insert(unit)
# Query
enemies_nearby = quadtree.query_radius(player.x, player.y, vision_range=50)
# Performance: O(log n) average query
# Adapts to entity distribution automatically
Pattern 3: Distance-Based LOD System
Problem: All entities update at full frequency, wasting CPU on distant entities
Solution: Update frequency based on distance from camera/player
class LODSystem:
"""
Level-of-detail system with smooth transitions and importance weighting.
LOD 0: Full detail (near camera, important entities)
LOD 1: Reduced detail (medium distance)
LOD 2: Minimal detail (far distance)
LOD 3: Dormant (very far, culled)
"""
# LOD configuration
LOD_LEVELS = [
{
'name': 'LOD_0_FULL',
'distance_min': 0,
'distance_max': 50,
'update_hz': 60, # Every frame
'ai_enabled': True,
'pathfinding': 'full', # Precise A*
'animation': 'full', # Full skeleton
'physics': 'full' # Full collision
},
{
'name': 'LOD_1_REDUCED',
'distance_min': 50,
'distance_max': 100,
'update_hz': 30, # Every 2 frames
'ai_enabled': True,
'pathfinding': 'hierarchical',
'animation': 'reduced', # 10 bones
'physics': 'bbox' # Bounding box only
},
{
'name': 'LOD_2_MINIMAL',
'distance_min': 100,
'distance_max': 200,
'update_hz': 12, # Every 5 frames
'ai_enabled': False, # Scripted only
'pathfinding': 'waypoints',
'animation': 'static', # Static pose
'physics': 'none'
},
{
'name': 'LOD_3_CULLED',
'distance_min': 200,
'distance_max': float('inf'),
'update_hz': 2, # Every 30 frames
'ai_enabled': False,
'pathfinding': 'none',
'animation': 'none',
'physics': 'none'
}
]
def __init__(self, camera, player):
self.camera = camera
self.player = player
self.frame_count = 0
# Hysteresis to prevent LOD thrashing
self.hysteresis = 20 # Units of distance buffer
def calculate_lod(self, entity):
"""
Calculate LOD level for entity based on importance and distance.
Priority:
1. Importance (player-controlled, in combat, selected)
2. Distance from camera
3. Screen size
"""
# Important entities always get highest LOD
if self._is_important(entity):
return 0
# Distance-based LOD
distance = self._distance_to_camera(entity)
# Current LOD (for hysteresis)
current_lod = getattr(entity, 'lod_level', 0)
# Determine LOD level with hysteresis
for i, lod in enumerate(self.LOD_LEVELS):
if i < current_lod:
# Upgrading (closer): Use min distance
if distance <= lod['distance_max'] - self.hysteresis:
return i
else:
# Downgrading (farther): Use max distance
if distance <= lod['distance_max'] + self.hysteresis:
return i
return len(self.LOD_LEVELS) - 1
def _is_important(self, entity):
"""Check if entity is important (always highest LOD)"""
return (entity.player_controlled or
entity.selected or
(entity.team == self.player.team and entity.in_combat))
def _distance_to_camera(self, entity):
dx = entity.x - self.camera.x
dy = entity.y - self.camera.y
return math.sqrt(dx * dx + dy * dy)
def should_update(self, entity):
"""Check if entity should update this frame"""
lod_level = entity.lod_level
lod_config = self.LOD_LEVELS[lod_level]
update_hz = lod_config['update_hz']
if update_hz >= 60:
return True # Every frame
# Calculate frame interval
frame_interval = 60 // update_hz # 60 FPS baseline
# Offset by entity ID to spread updates across frames
return (self.frame_count + entity.id) % frame_interval == 0
def update(self, entities):
"""Update LOD levels and entities"""
self.frame_count += 1
# Update LOD levels (cheap, do every frame)
for entity in entities:
entity.lod_level = self.calculate_lod(entity)
# Update entities based on LOD (expensive, time-sliced)
for entity in entities:
if self.should_update(entity):
lod_config = self.LOD_LEVELS[entity.lod_level]
self._update_entity(entity, lod_config)
def _update_entity(self, entity, lod_config):
"""Update entity according to LOD configuration"""
if lod_config['ai_enabled']:
entity.update_ai()
if lod_config['pathfinding'] == 'full':
entity.update_pathfinding_full()
elif lod_config['pathfinding'] == 'hierarchical':
entity.update_pathfinding_hierarchical()
elif lod_config['pathfinding'] == 'waypoints':
entity.update_pathfinding_waypoints()
if lod_config['animation'] != 'none':
entity.update_animation(lod_config['animation'])
if lod_config['physics'] == 'full':
entity.update_physics_full()
elif lod_config['physics'] == 'bbox':
entity.update_physics_bbox()
# Usage
lod_system = LODSystem(camera, player)
def game_loop():
lod_system.update(units)
# Only entities that should_update() this frame were updated
# Performance: 1000 units all at LOD 0 → mixed LOD levels
# Typical distribution: 100 LOD0 + 300 LOD1 + 400 LOD2 + 200 LOD3
# Effective updates: 100 + 150 + 80 + 7 = 337 updates/frame
# Speedup: 1000 → 337 = 3× faster
Pattern 4: Time-Sliced Pathfinding with Priority Queue
Problem: 100 path requests × 5ms each = 500ms frame time (2 FPS)
Solution: Process paths over multiple frames with priority (player units first)
import heapq
import time
from enum import Enum
class PathPriority(Enum):
"""Priority levels for pathfinding requests"""
CRITICAL = 0 # Player-controlled, combat
HIGH = 1 # Player's units
NORMAL = 2 # Visible units
LOW = 3 # Off-screen units
class PathRequest:
def __init__(self, entity, start, goal, priority):
self.entity = entity
self.start = start
self.goal = goal
self.priority = priority
self.path = None
self.complete = False
self.timestamp = time.time()
class TimeSlicedPathfinder:
"""
Pathfinding system with frame time budget and priority queue.
Features:
- 2ms frame budget (stays at 60 FPS)
- Priority queue (important requests first)
- Incremental pathfinding (spread work over frames)
- Request timeout (abandon old requests)
"""
def __init__(self, budget_ms=2.0, timeout_seconds=5.0):
self.budget = budget_ms / 1000.0 # Convert to seconds
self.timeout = timeout_seconds
self.pending = [] # Priority queue: (priority, request)
self.active_request = None
self.pathfinder = AStarPathfinder() # Your pathfinding implementation
# Statistics
self.stats = {
'requests_submitted': 0,
'requests_completed': 0,
'requests_timeout': 0,
'avg_time_to_completion': 0
}
def submit_request(self, entity, start, goal, priority=PathPriority.NORMAL):
"""Submit pathfinding request with priority"""
request = PathRequest(entity, start, goal, priority)
heapq.heappush(self.pending, (priority.value, request))
self.stats['requests_submitted'] += 1
return request
def update(self, dt):
"""
Process pathfinding requests within frame budget.
Returns: Number of paths completed this frame
"""
start_time = time.perf_counter()
completed = 0
while time.perf_counter() - start_time < self.budget:
# Get next request
if not self.active_request:
if not self.pending:
break # No more work
priority, request = heapq.heappop(self.pending)
# Check timeout
if time.time() - request.timestamp > self.timeout:
self.stats['requests_timeout'] += 1
continue
self.active_request = request
self.pathfinder.start(request.start, request.goal)
# Process active request incrementally
# (process up to 100 nodes this frame)
done = self.pathfinder.step(max_nodes=100)
if done:
# Request complete
self.active_request.path = self.pathfinder.get_path()
self.active_request.complete = True
self.active_request.entity.path = self.active_request.path
time_to_complete = time.time() - self.active_request.timestamp
self._update_avg_time(time_to_complete)
self.stats['requests_completed'] += 1
self.active_request = None
completed += 1
return completed
def _update_avg_time(self, time_to_complete):
"""Update moving average of completion time"""
alpha = 0.1 # Smoothing factor
current_avg = self.stats['avg_time_to_completion']
self.stats['avg_time_to_completion'] = (
alpha * time_to_complete + (1 - alpha) * current_avg
)
def get_stats(self):
"""Get performance statistics"""
pending_count = len(self.pending) + (1 if self.active_request else 0)
return {
**self.stats,
'pending_requests': pending_count,
'completion_rate': (
self.stats['requests_completed'] / max(1, self.stats['requests_submitted'])
)
}
# Usage
pathfinder = TimeSlicedPathfinder(budget_ms=2.0)
def game_loop():
# Submit pathfinding requests
for unit in units_needing_paths:
# Determine priority
if unit.player_controlled:
priority = PathPriority.CRITICAL
elif unit.team == player.team:
priority = PathPriority.HIGH
elif unit.visible:
priority = PathPriority.NORMAL
else:
priority = PathPriority.LOW
pathfinder.submit_request(unit, unit.pos, unit.target, priority)
# Process paths (stays within 2ms budget)
paths_completed = pathfinder.update(dt)
# Every 5 seconds, print stats
if frame_count % 300 == 0:
stats = pathfinder.get_stats()
print(f"Pathfinding: {stats['requests_completed']} complete, "
f"{stats['pending_requests']} pending, "
f"avg time: {stats['avg_time_to_completion']:.2f}s")
# Performance:
# Without time-slicing: 100 paths × 5ms = 500ms frame (2 FPS)
# With time-slicing: 2ms budget per frame = 60 FPS maintained
# Paths complete over multiple frames, but high-priority paths finish first
Pattern 5: LRU Cache with TTL for Pathfinding
Problem: Recalculating same paths repeatedly wastes CPU
Solution: Cache paths with LRU eviction and time-to-live
import time
from collections import OrderedDict
class PathCache:
"""
LRU cache with TTL for pathfinding results.
Features:
- LRU eviction (least recently used)
- TTL expiration (paths become stale)
- Region invalidation (terrain changes)
- Bounded memory (max size)
"""
def __init__(self, max_size=5000, ttl_seconds=30.0):
self.cache = OrderedDict() # Maintains insertion order for LRU
self.max_size = max_size
self.ttl = ttl_seconds
self.insert_times = {}
# Statistics
self.stats = {
'hits': 0,
'misses': 0,
'evictions': 0,
'expirations': 0,
'invalidations': 0
}
def _make_key(self, start, goal):
"""Create cache key from start/goal positions"""
# Quantize to grid (allows position variance within cell)
# Cell size = 5 units (units within 5 units share same path)
return (
round(start[0] / 5) * 5,
round(start[1] / 5) * 5,
round(goal[0] / 5) * 5,
round(goal[1] / 5) * 5
)
def get(self, start, goal):
"""
Get cached path if available and not expired.
Returns: Path if cached and valid, None otherwise
"""
key = self._make_key(start, goal)
current_time = time.time()
if key not in self.cache:
self.stats['misses'] += 1
return None
# Check TTL
if current_time - self.insert_times[key] > self.ttl:
# Expired
del self.cache[key]
del self.insert_times[key]
self.stats['expirations'] += 1
self.stats['misses'] += 1
return None
# Cache hit - move to end (most recently used)
self.cache.move_to_end(key)
self.stats['hits'] += 1
return self.cache[key]
def put(self, start, goal, path):
"""Store path in cache"""
key = self._make_key(start, goal)
current_time = time.time()
# Evict if at capacity (LRU)
if len(self.cache) >= self.max_size and key not in self.cache:
# Remove oldest (first item in OrderedDict)
oldest_key = next(iter(self.cache))
del self.cache[oldest_key]
del self.insert_times[oldest_key]
self.stats['evictions'] += 1
# Store path
self.cache[key] = path
self.insert_times[key] = current_time
# Move to end (most recently used)
self.cache.move_to_end(key)
def invalidate_region(self, x, y, radius):
"""
Invalidate all cached paths in region.
Call when terrain changes (building placed, wall destroyed, etc.)
"""
radius_sq = radius * radius
keys_to_remove = []
for key in self.cache:
start_x, start_y, goal_x, goal_y = key
# Check if start or goal in affected region
dx_start = start_x - x
dy_start = start_y - y
dx_goal = goal_x - x
dy_goal = goal_y - y
if (dx_start * dx_start + dy_start * dy_start <= radius_sq or
dx_goal * dx_goal + dy_goal * dy_goal <= radius_sq):
keys_to_remove.append(key)
for key in keys_to_remove:
del self.cache[key]
del self.insert_times[key]
self.stats['invalidations'] += 1
def get_hit_rate(self):
"""Calculate cache hit rate"""
total = self.stats['hits'] + self.stats['misses']
if total == 0:
return 0.0
return self.stats['hits'] / total
def get_stats(self):
"""Get cache statistics"""
return {
**self.stats,
'size': len(self.cache),
'hit_rate': self.get_hit_rate()
}
# Usage
path_cache = PathCache(max_size=5000, ttl_seconds=30.0)
def find_path(start, goal):
# Try cache first
cached_path = path_cache.get(start, goal)
if cached_path:
return cached_path # Cache hit!
# Cache miss - calculate path
path = expensive_pathfinding(start, goal)
path_cache.put(start, goal, path)
return path
# Invalidate when terrain changes
def on_building_placed(building):
# Invalidate paths near building
path_cache.invalidate_region(building.x, building.y, radius=100)
# Print stats periodically
def print_cache_stats():
stats = path_cache.get_stats()
print(f"Path Cache: {stats['size']}/{path_cache.max_size} entries, "
f"hit rate: {stats['hit_rate']:.1%}, "
f"{stats['hits']} hits, {stats['misses']} misses")
# Performance:
# 60% hit rate: Only 40% of requests calculate = 2.5× faster
# 80% hit rate: Only 20% of requests calculate = 5× faster
Pattern 6: Job System for Parallel Work
When to use: Native code (C++/Rust) with embarrassingly parallel work
#include <vector>
#include <thread>
#include <queue>
#include <mutex>
#include <condition_variable>
#include <functional>
/**
* Job system for data-parallel work.
*
* Features:
* - Worker thread pool
* - Lock-free job submission (mostly)
* - Wait-for-completion
* - No shared mutable state (data parallelism)
*/
class JobSystem {
public:
using Job = std::function<void()>;
JobSystem(int num_workers = std::thread::hardware_concurrency()) {
workers.reserve(num_workers);
for (int i = 0; i < num_workers; ++i) {
workers.emplace_back([this]() { this->worker_loop(); });
}
}
~JobSystem() {
{
std::unique_lock<std::mutex> lock(queue_mutex);
shutdown = true;
}
queue_cv.notify_all();
for (auto& worker : workers) {
worker.join();
}
}
// Submit single job
void submit(Job job) {
{
std::unique_lock<std::mutex> lock(queue_mutex);
job_queue.push(std::move(job));
}
queue_cv.notify_one();
}
// Submit batch of jobs and wait for all to complete
void submit_batch_and_wait(const std::vector<Job>& jobs) {
std::atomic<int> remaining{static_cast<int>(jobs.size())};
std::mutex wait_mutex;
std::condition_variable wait_cv;
for (const auto& job : jobs) {
submit([&, job]() {
job();
if (--remaining == 0) {
wait_cv.notify_one();
}
});
}
// Wait for all jobs to complete
std::unique_lock<std::mutex> lock(wait_mutex);
wait_cv.wait(lock, [&]() { return remaining == 0; });
}
private:
void worker_loop() {
while (true) {
Job job;
{
std::unique_lock<std::mutex> lock(queue_mutex);
queue_cv.wait(lock, [this]() {
return !job_queue.empty() || shutdown;
});
if (shutdown && job_queue.empty()) {
return;
}
job = std::move(job_queue.front());
job_queue.pop();
}
job();
}
}
std::vector<std::thread> workers;
std::queue<Job> job_queue;
std::mutex queue_mutex;
std::condition_variable queue_cv;
bool shutdown = false;
};
// Usage Example: Parallel position updates
struct Unit {
float x, y;
float vx, vy;
void update(float dt) {
x += vx * dt;
y += vy * dt;
}
};
void update_units_parallel(std::vector<Unit>& units, float dt, JobSystem& job_system) {
const int num_workers = 8;
const int batch_size = units.size() / num_workers;
std::vector<JobSystem::Job> jobs;
for (int worker_id = 0; worker_id < num_workers; ++worker_id) {
int start = worker_id * batch_size;
int end = (worker_id == num_workers - 1) ? units.size() : start + batch_size;
jobs.push_back([&units, dt, start, end]() {
// Each worker updates exclusive slice (no locks needed)
for (int i = start; i < end; ++i) {
units[i].update(dt);
}
});
}
job_system.submit_batch_and_wait(jobs);
}
// Performance: 4 cores = 2-3× speedup (accounting for overhead)
Common Pitfalls
Pitfall 1: Premature Optimization (Most Common!)
Symptoms:
- Jumping to complex solutions (multithreading) before measuring bottleneck
- Micro-optimizing (sqrt → squared distance) without profiling
- Optimizing code that's 1% of frame time
Why it fails:
- You optimize the wrong thing (80% of time elsewhere)
- Complex solutions add bugs without benefit
- Time wasted that could go to real bottleneck
Example:
# BAD: Premature micro-optimization
# Replaced sqrt with squared distance (saves 0.1ms)
# But vision checks are only 1% of frame time!
dist_sq = dx*dx + dy*dy
if dist_sq < range_sq: # Micro-optimization
# ...
# GOOD: Profile first, found pathfinding is 80% of frame time
# Added path caching (saves 40ms!)
cached_path = path_cache.get(start, goal)
if cached_path:
return cached_path
Solution:
- ✅ Profile FIRST - measure where time is actually spent
- ✅ Focus on top bottleneck (80/20 rule)
- ✅ Measure improvement - validate optimization helped
- ✅ Repeat - find next bottleneck
Quote: "Premature optimization is the root of all evil" - Donald Knuth
Pitfall 2: LOD Popping (Visual Artifacts)
Symptoms:
- Units suddenly appear/disappear at LOD boundaries
- Animation quality jumps (smooth → jerky)
- Players notice "fake" LOD transitions
Why it fails:
- No hysteresis: Entity at 99-101 units flip-flops between LOD 0/1 every frame
- Instant transitions: LOD 0 → LOD 3 in one frame (jarring)
- Distance-only: Ignores importance (player's units should always be high detail)
Example:
# BAD: No hysteresis (causes popping)
if distance < 100:
lod = 0
else:
lod = 1
# Entity at 99.5 units: LOD 0
# Entity moves to 100.5 units: LOD 1
# Entity moves to 99.5 units: LOD 0 (flicker!)
# GOOD: Hysteresis + importance + blend
if is_important(entity):
lod = 0 # Always full detail for player units
elif distance < 90:
lod = 0 # Upgrade at 90
elif distance > 110:
lod = 1 # Downgrade at 110
# else: keep current LOD
# 20-unit buffer prevents thrashing
# Blend between LOD levels over 0.5 seconds
blend_factor = (time.time() - lod_transition_start) / 0.5
Solution:
- ✅ Hysteresis - different thresholds for upgrade (90) vs downgrade (110)
- ✅ Importance weighting - player units, selected units always high LOD
- ✅ Blend transitions - cross-fade over 0.5-1 second
- ✅ Time delay - wait N seconds before downgrading LOD
Pitfall 3: Thread Contention and Race Conditions
Symptoms:
- Crashes with "list modified during iteration"
- Nondeterministic behavior (works sometimes)
- Slower with multithreading than without (due to locking)
Why it fails:
- Multiple threads read/write shared mutable state (data race)
- Excessive locking serializes code (defeats parallelism)
- False sharing - adjacent data on same cache line thrashes
Example:
# BAD: Race condition (shared mutable list)
def update_unit_threaded(unit, all_units):
# Thread 1 reads all_units
# Thread 2 modifies all_units (adds/removes unit)
# Thread 1 crashes: "list changed during iteration"
for other in all_units:
if collides(unit, other):
all_units.remove(other) # RACE!
# BAD: Excessive locking (serialized)
lock = threading.Lock()
def update_unit(unit):
with lock: # Only 1 thread works at a time!
unit.update()
# GOOD: Data parallelism (no shared mutable state)
def update_units_parallel(units, num_workers=4):
batch_size = len(units) // num_workers
def update_batch(start, end):
# Exclusive ownership - no locks needed
for i in range(start, end):
units[i].update() # Only modifies units[i]
with ThreadPoolExecutor(max_workers=num_workers) as executor:
futures = []
for worker_id in range(num_workers):
start = worker_id * batch_size
end = start + batch_size if worker_id < num_workers - 1 else len(units)
futures.append(executor.submit(update_batch, start, end))
# Wait for all
for future in futures:
future.result()
Solution:
- ✅ Avoid shared mutable state - each thread owns exclusive data
- ✅ Read-only sharing - threads can read shared data if no writes
- ✅ Message passing - communicate via queues instead of shared memory
- ✅ Lock-free algorithms - atomic operations, compare-and-swap
- ✅ Test with thread sanitizer - detects data races
Pitfall 4: Cache Invalidation Bugs
Symptoms:
- Units walk through walls (stale paths cached)
- Memory leak (cache grows unbounded)
- Crashes after long play sessions (out of memory)
Why it fails:
- No invalidation: Cache never updates when terrain changes
- No TTL: Old paths stay forever, become invalid
- No eviction: Cache grows until memory exhausted
Example:
# BAD: No invalidation, no TTL, unbounded growth
cache = {}
def get_path(start, goal):
key = (start, goal)
if key in cache:
return cache[key] # May be stale!
path = pathfind(start, goal)
cache[key] = path # Cache grows forever!
return path
# Building placed, but cached paths not invalidated
def place_building(x, y):
buildings.append(Building(x, y))
# BUG: Paths through this area still cached!
# GOOD: LRU + TTL + invalidation
cache = PathCache(max_size=5000, ttl_seconds=30.0)
def get_path(start, goal):
cached = cache.get(start, goal)
if cached:
return cached
path = pathfind(start, goal)
cache.put(start, goal, path)
return path
def place_building(x, y):
buildings.append(Building(x, y))
cache.invalidate_region(x, y, radius=100) # Clear affected paths
Solution:
- ✅ TTL (time-to-live) - expire entries after N seconds
- ✅ Event-based invalidation - clear cache when terrain changes
- ✅ LRU eviction - remove least recently used when full
- ✅ Bounded size - set max_size to prevent unbounded growth
Pitfall 5: Forgetting to Rebuild Spatial Grid
Symptoms:
- Units see enemies that are no longer there
- Collision detection misses fast-moving objects
- Query results are stale (from previous frame)
Why it fails:
- Entities move every frame, but grid not rebuilt
- Grid contains stale positions
Example:
# BAD: Grid built once, never updated
spatial_grid = SpatialHashGrid(cell_size=100)
for unit in units:
spatial_grid.insert(unit)
def game_loop():
# Units move
for unit in units:
unit.x += unit.vx * dt
unit.y += unit.vy * dt
# Query stale grid (positions from frame 0!)
enemies = spatial_grid.query_radius(player.x, player.y, 50)
# GOOD: Rebuild grid every frame
def game_loop():
# Move units
for unit in units:
unit.x += unit.vx * dt
unit.y += unit.vy * dt
# Rebuild spatial grid (fast: O(n))
spatial_grid.clear()
for unit in units:
spatial_grid.insert(unit)
# Query with current positions
enemies = spatial_grid.query_radius(player.x, player.y, 50)
Solution:
- ✅ Rebuild every frame - spatial_grid.clear() + insert all entities
- ✅ Or use dynamic structure - quadtree with update() method
- ✅ Profile rebuild cost - should be < 1ms for 1000 entities
Pitfall 6: Optimization Without Validation
Symptoms:
- "Optimized" code runs slower
- New bottleneck created elsewhere
- Unsure if optimization helped
Why it fails:
- No before/after measurements
- Optimization moved bottleneck to different system
- Assumptions about cost were wrong
Example:
# BAD: No measurement
def optimize_pathfinding():
# Made some changes...
# Hope it's faster?
pass
# GOOD: Measure before and after
def optimize_pathfinding():
# Measure baseline
start = time.perf_counter()
for i in range(100):
path = pathfind(start, goal)
baseline_ms = (time.perf_counter() - start) * 1000
print(f"Baseline: {baseline_ms:.2f}ms for 100 paths")
# Apply optimization...
add_path_caching()
# Measure improvement
start = time.perf_counter()
for i in range(100):
path = pathfind(start, goal)
optimized_ms = (time.perf_counter() - start) * 1000
print(f"Optimized: {optimized_ms:.2f}ms for 100 paths")
speedup = baseline_ms / optimized_ms
print(f"Speedup: {speedup:.1f}×")
# Baseline: 500ms for 100 paths
# Optimized: 200ms for 100 paths
# Speedup: 2.5×
Solution:
- ✅ Measure baseline before optimization
- ✅ Measure improvement after optimization
- ✅ Calculate speedup - validate it helped
- ✅ Re-profile - check for new bottlenecks
- ✅ Regression test - ensure gameplay still works
Pitfall 7: Ignoring Amdahl's Law (Diminishing Returns)
Concept: Speedup limited by serial portion of code
Amdahl's Law: Speedup = 1 / ((1 - P) + P/N)
- P = portion that can be parallelized (e.g., 0.75 = 75%)
- N = number of cores (e.g., 4)
Example:
- 75% of code parallelizable, 4 cores
- Speedup = 1 / ((1 - 0.75) + 0.75/4) = 1 / (0.25 + 0.1875) = 2.29×
- Not 4×! Serial portion limits speedup
Why it matters:
- Multithreading has diminishing returns
- Focus on parallelizing largest portions first
- Some tasks can't be parallelized (Amdahl's law ceiling)
Solution:
- ✅ Parallelize largest bottleneck first (maximize P)
- ✅ Set realistic expectations (2-3× on 4 cores, not 4×)
- ✅ Measure actual speedup - compare to theoretical maximum
Pitfall 8: Sorting Every Frame (Expensive!)
Symptoms:
- 3-5ms spent sorting units by distance
- Sorting is top function in profiler
Why it fails:
- O(n log n) sort is expensive for large N
- Entity distances change slowly (don't need exact sort every frame)
Example:
# BAD: Full sort every frame
def update():
# O(n log n) = 1000 × log(1000) ≈ 10,000 operations
units_sorted = sorted(units, key=lambda u: distance_to_camera(u))
# Update closest units
for unit in units_sorted[:100]:
unit.update()
# GOOD: Sort every N frames, or use approximate sort
def update():
# Re-sort every 10 frames only
if frame_count % 10 == 0:
global units_sorted
units_sorted = sorted(units, key=lambda u: distance_to_camera(u))
# Use slightly stale sort (good enough!)
for unit in units_sorted[:100]:
unit.update()
# BETTER: Use spatial partitioning (no sorting needed!)
def update():
# Query entities near camera (already sorted by distance)
nearby_units = spatial_grid.query_radius(camera.x, camera.y, radius=200)
# Update nearby units
for unit in nearby_units:
unit.update()
Solution:
- ✅ Sort less frequently - every 5-10 frames is fine
- ✅ Approximate sort - bucketing instead of exact sort
- ✅ Spatial queries - avoid sorting entirely (use grid/quadtree)
Real-World Examples
Example 1: Unity DOTS (Data-Oriented Technology Stack)
What it is: Unity's high-performance ECS (Entity Component System) architecture
Key optimizations:
Struct of Arrays (SoA) - Components stored in contiguous arrays
- Traditional:
List<GameObject>with components scattered in memory - DOTS:
NativeArray<Position>,NativeArray<Velocity>- cache-friendly - Result: 1.5-3× faster for batch operations
- Traditional:
Job System - Data parallelism across CPU cores
- Each job processes exclusive slice of entities
- No locks (data ownership model)
- Result: 2-4× speedup on 4-8 core CPUs
Burst Compiler - LLVM-based code generation
- Generates SIMD instructions (AVX2, SSE)
- Removes bounds checks, optimizes math
- Result: 2-10× faster than standard C#
Performance: 10,000 entities at 60 FPS (vs 1,000 in traditional Unity)
When to use:
- ✅ 1000+ entities needing updates
- ✅ Batch operations (position updates, physics, AI)
- ✅ Performance-critical simulations
When NOT to use:
- ❌ Small entity counts (< 100)
- ❌ Gameplay prototyping (ECS is complex)
- ❌ Unique entities with lots of one-off logic
Example 2: Supreme Commander (RTS with 1000+ Units)
Challenge: Support 1000+ units in RTS battles at 30-60 FPS
Optimizations:
Flow Fields for Pathfinding
- Pre-compute direction field from goal
- Each unit follows field (O(1) per unit)
- Alternative to A* per unit (O(n log n) each)
- Result: 100× faster pathfinding for groups
LOD for Unit AI
- LOD 0 (< 50 units from camera): Full behavior tree
- LOD 1 (50-100 units): Simplified FSM
- LOD 2 (100+ units): Scripted behavior
- Result: 3-5× fewer AI updates per frame
Spatial Partitioning for Weapons
- Grid-based broad-phase for weapon targeting
- Only check units in weapon range cells
- Result: O(n²) → O(n) for combat calculations
Time-Sliced Sim
- Economy updates: Every 10 frames
- Unit production: Every 5 frames
- Visual effects: Based on distance LOD
- Result: Consistent frame rate under load
Performance: 1000 units at 30 FPS, 500 units at 60 FPS
Lessons:
- Flow fields > A* for large unit groups
- LOD critical for maintaining frame rate at scale
- Spatial partitioning is non-negotiable for 1000+ units
Example 3: Total War (20,000+ Soldiers in Battles)
Challenge: Render and simulate 20,000 individual soldiers at 30-60 FPS
Optimizations:
Hierarchical LOD
- LOD 0 (< 20m): Full skeleton, detailed model
- LOD 1 (20-50m): Reduced skeleton, simpler model
- LOD 2 (50-100m): Impostor (textured quad)
- LOD 3 (100m+): Single pixel or culled
- Result: 10× fewer vertices rendered
Formation-Based AI
- Units in formation share single pathfinding result
- Individual units offset from formation center
- Result: 100× fewer pathfinding calculations
Batched Rendering
- Instanced rendering for identical soldiers
- 1 draw call for 100 soldiers (vs 100 draw calls)
- Result: 10× fewer draw calls
Simplified Physics
- Full physics for nearby units (< 20m)
- Ragdolls for deaths near camera
- Simplified collision for distant units
- Result: 5× fewer physics calculations
Performance: 20,000 units at 30-60 FPS (depending on settings)
Lessons:
- Visual LOD as important as simulation LOD
- Formation-based AI avoids redundant pathfinding
- Instanced rendering critical for large unit counts
Example 4: Cities Skylines (Traffic Simulation)
Challenge: Simulate 10,000+ vehicles with realistic traffic at 30 FPS
Optimizations:
Hierarchical Pathfinding
- Highway network → arterial roads → local streets
- Pre-compute high-level paths, refine locally
- Result: 20× faster pathfinding for long routes
Path Caching
- Common routes cached (home → work, work → home)
- 60-80% cache hit rate
- Result: 2.5-5× fewer pathfinding calculations
Dynamic Cost Adjustment
- Road segments track vehicle density
- Congested roads have higher pathfinding cost
- Vehicles reroute around congestion
- Result: Emergent traffic patterns
Despawn Distant Vehicles
- Vehicles > 500m from camera despawned
- Statistics tracked, respawn when relevant
- Result: Effective vehicle count reduced 50%
Performance: 10,000 active vehicles at 30 FPS
Lessons:
- Hierarchical pathfinding essential for city-scale maps
- Path caching provides huge wins (60%+ hit rate common)
- Despawning off-screen entities maintains performance
Example 5: Factorio (Mega-Factory Optimization)
Challenge: Simulate 100,000+ entities (belts, inserters, assemblers) at 60 FPS
Optimizations:
Update Skipping
- Idle machines don't update (no input/output)
- Active set typically 10-20% of total entities
- Result: 5-10× fewer updates per tick
Chunk-Based Simulation
- World divided into 32×32 tile chunks
- Inactive chunks (no player nearby) update less often
- Result: Effective world size reduced 80%
Belt Optimization
- Items on belts compressed into contiguous arrays
- Lane-based updates (not per-item)
- Result: 10× faster belt simulation
Electrical Network Caching
- Power grid solved once, cached until topology changes
- Only recalculate when grid modified
- Result: 100× fewer electrical calculations
Performance: 60 FPS with 100,000+ entities (in optimized factories)
Lessons:
- Update skipping (sleeping entities) provides huge wins
- Chunk-based simulation scales to massive worlds
- Cache static calculations (power grid, fluid networks)
Cross-References
Within Bravos/Simulation-Tactics
This skill applies to ALL other simulation skills:
- traffic-and-pathfinding ← Optimize pathfinding with caching, time-slicing
- ai-and-agent-simulation ← LOD for AI, time-sliced behavior trees
- physics-simulation-patterns ← Spatial partitioning for collision, broad-phase
- ecosystem-simulation ← LOD for distant populations, time-sliced updates
- weather-and-time ← Particle budgets, LOD for effects
- economic-simulation-patterns ← Time-slicing for economy updates
Related skills in this skillpack:
- spatial-partitioning (planned) - Deep dive into quadtrees, octrees, grids
- ecs-architecture (planned) - Data-oriented design, component systems
External Skillpacks
Yzmir/Performance-Optimization (if exists):
- Profiling tools and methodology
- Memory optimization (pooling, allocators)
- Cache optimization (data layouts)
Yzmir/Algorithms-and-Data-Structures (if exists):
- Spatial data structures (quadtree, k-d tree, BVH)
- Priority queues (for time-slicing)
- LRU cache implementation
Axiom/Game-Engine-Patterns (if exists):
- Update loop patterns
- Frame time management
- Object pooling
Testing Checklist
Use this checklist to verify optimization is complete and correct:
1. Profiling
- Captured baseline performance (frame time, FPS)
- Identified top 3-5 bottlenecks (80% of time)
- Understood WHY each bottleneck is slow (algorithm, data, cache)
- Documented baseline metrics for comparison
2. Algorithmic Optimization
- Checked for O(n²) algorithms (proximity queries, collisions)
- Applied spatial partitioning where appropriate (grid, quadtree)
- Validated spatial queries return correct results
- Measured improvement (should be 10-100×)
3. Level of Detail (LOD)
- Defined LOD levels (typically 4: full, reduced, minimal, culled)
- Implemented distance-based LOD assignment
- Added importance weighting (player units, selected units)
- Implemented hysteresis to prevent LOD thrashing
- Verified no visual popping artifacts
- Measured improvement (should be 2-10×)
4. Time-Slicing
- Set frame time budget per system (e.g., 2ms for pathfinding)
- Implemented priority queue (important work first)
- Verified budget is respected (doesn't exceed limit)
- Checked that high-priority work completes quickly
- Measured improvement (should be 2-5×)
5. Caching
- Identified redundant calculations to cache
- Implemented cache with LRU eviction
- Added TTL (time-to-live) expiration
- Implemented invalidation triggers (terrain changes, etc.)
- Verified cache hit rate (aim for 60-80%)
- Checked no stale data bugs (units walking through walls)
- Measured improvement (should be 2-10×)
6. Data-Oriented Design (if applicable)
- Identified batch operations on many entities
- Converted AoS → SoA for hot data
- Verified memory layout is cache-friendly
- Measured improvement (should be 1.5-3×)
7. Multithreading (if needed)
- Verified all simpler optimizations done first
- Identified embarrassingly parallel work
- Implemented job system or data parallelism
- Verified no race conditions (test with thread sanitizer)
- Checked performance gain justifies complexity
- Measured improvement (should be 1.5-4×)
8. Validation
- Met target frame rate (60 FPS or 30 FPS)
- Verified no gameplay regressions (units behave correctly)
- Checked no visual artifacts (LOD popping, etc.)
- Tested at target entity count (e.g., 1000 units)
- Tested edge cases (10,000 units, worst-case scenarios)
- Documented final performance metrics
- Calculated total speedup (baseline → optimized)
9. Before/After Comparison
| Metric | Before | After | Improvement |
|---|---|---|---|
| Frame Time | ___ms | ___ms | ___× faster |
| FPS | ___ | ___ | ___ |
| Bottleneck System Time | ___ms | ___ms | ___× faster |
| Entity Count (target FPS) | ___ | ___ | ___× more |
| Memory Usage | ___MB | ___MB | ___ |
10. Regression Tests
- Units still path correctly (no walking through walls)
- AI behavior unchanged (same decisions)
- Combat calculations correct (same damage)
- No crashes or exceptions
- No memory leaks (long play session test)
- Deterministic results (same input → same output)
Remember:
- Profile FIRST - measure before guessing
- Algorithmic optimization provides biggest wins (10-100×)
- LOD and time-slicing are essential for 1000+ entities
- Multithreading is LAST resort - complexity cost is high
- Validate improvement - measure before/after, check for regressions
Success criteria: Target frame rate achieved (60 FPS) with desired entity count (1000+) and no gameplay compromises.