| name | Python Performance |
| description | Master Python optimization techniques, profiling, memory management, and high-performance computing |
| version | 2.1.0 |
| sasmp_version | 1.3.0 |
| bonded_agent | 07-best-practices |
| bond_type | PRIMARY_BOND |
| retry_strategy | exponential_backoff |
| observability | [object Object] |
Python Performance Optimization
Overview
Master performance optimization in Python. Learn to profile code, identify bottlenecks, optimize algorithms, manage memory efficiently, and leverage high-performance libraries for compute-intensive tasks.
Learning Objectives
- Profile Python code to identify bottlenecks
- Optimize algorithms and data structures
- Manage memory efficiently
- Use compiled extensions (Cython, NumPy)
- Implement caching strategies
- Parallelize CPU-bound operations
- Benchmark and measure improvements
Core Topics
1. Profiling & Benchmarking
- timeit module for micro-benchmarks
- cProfile for function-level profiling
- line_profiler for line-by-line analysis
- memory_profiler for memory usage
- py-spy for production profiling (see the CLI sketch after the code example below)
- Flame graphs and visualization
Code Example:
import timeit
import cProfile
import pstats
# 1. timeit for micro-benchmarks
def list_comprehension():
    return [x**2 for x in range(1000)]

def map_function():
    return list(map(lambda x: x**2, range(1000)))
# Compare performance
time_lc = timeit.timeit(list_comprehension, number=10000)
time_map = timeit.timeit(map_function, number=10000)
print(f"List comprehension: {time_lc:.4f}s")
print(f"Map function: {time_map:.4f}s")
# 2. cProfile for function profiling
def process_data():
    data = []
    for i in range(100000):
        data.append(i ** 2)
    return sum(data)
profiler = cProfile.Profile()
profiler.enable()
result = process_data()
profiler.disable()
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')
stats.print_stats(10)
# 3. Line profiling (requires the line_profiler package)
# Decorate the target function with @profile; the decorator is injected
# by kernprof at runtime, so no import is needed.
def slow_function():
    total = 0
    for i in range(1000000):
        total += i ** 2
    return total
# Run with: kernprof -l -v script.py
# 4. Memory profiling
from memory_profiler import profile
@profile
def memory_intensive():
    large_list = [i for i in range(1000000)]
    large_dict = {i: i**2 for i in range(1000000)}
    return len(large_list) + len(large_dict)
# Run with: python -m memory_profiler script.py
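The topic list also mentions py-spy and flame graphs, which the example above does not cover. py-spy is a sampling profiler driven from a shell rather than from Python code; the commands below are a minimal sketch of typical usage (the PID and script name are placeholders).
# 5. Production profiling with py-spy (run from a shell, not from Python)
# Record a flame graph of an already-running process without restarting it:
#   py-spy record -o profile.svg --pid 12345
# Or profile a script from start to finish:
#   py-spy record -o profile.svg -- python script.py
# Live top-like view of where time is currently being spent:
#   py-spy top --pid 12345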
2. Algorithm & Data Structure Optimization
- Choosing efficient data structures
- Time complexity analysis
- Generator expressions vs lists
- Set operations for lookups
- Deque for queue operations
- Bisect for sorted lists
Code Example:
import bisect
from collections import deque, Counter, defaultdict
import time
# 1. List vs set for membership testing
# Bad: O(n) lookup -- scans the list on every call
def find_in_list(items, target):
    return target in items  # Linear search

# Good: O(1) average-case lookup -- build the set once and reuse it
# (converting inside the function would itself cost O(n) per call)
def find_in_set(items_set, target):
    return target in items_set

items = list(range(100000))
items_set = set(items)
# Repeated membership tests against the set are orders of magnitude faster
# than scanning the list (see the timing sketch after this example)
# 2. Generator expressions for memory efficiency
# Bad: materializes a million int objects up front (tens of MB)
squares_list = [x**2 for x in range(1000000)]
# Good: the generator object itself is tiny (a few hundred bytes);
# values are produced one at a time on demand
squares_gen = (x**2 for x in range(1000000))
# 3. Deque for efficient queue operations
# Bad: O(n) pop from beginning
queue_list = list(range(10000))
queue_list.pop(0) # Slow
# Good: O(1) pop from both ends
queue_deque = deque(range(10000))
queue_deque.popleft() # Fast
# 4. bisect for maintaining sorted lists
# Bad: re-sorting after every append (O(n log n) per insertion)
sorted_list = []
for i in [5, 2, 8, 1, 9]:
    sorted_list.append(i)
    sorted_list.sort()
# Good: bisect.insort finds the position in O(log n)
# (the insert itself still shifts elements, but no re-sort is needed)
sorted_list = []
for i in [5, 2, 8, 1, 9]:
    bisect.insort(sorted_list, i)
# 5. Counter for frequency counting
words = ["spam", "eggs", "spam", "ham", "spam"]  # any iterable of hashable items
# Bad: manual counting
word_count = {}
for word in words:
    if word in word_count:
        word_count[word] += 1
    else:
        word_count[word] = 1
# Good: Counter
word_count = Counter(words)
most_common = word_count.most_common(10)
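As referenced in the membership-testing example above, a quick timeit comparison makes the list-vs-set gap concrete. This is a minimal sketch; absolute numbers depend on your machine and Python version.
import timeit

items = list(range(100000))
items_set = set(items)

list_time = timeit.timeit(lambda: 99999 in items, number=1000)
set_time = timeit.timeit(lambda: 99999 in items_set, number=1000)
print(f"list membership: {list_time:.4f}s, set membership: {set_time:.6f}s")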
3. Memory Management
- Memory allocation and garbage collection
- Object pooling
- Slots for memory-efficient classes
- Reference counting
- Weak references
- Memory leak detection (see the tracemalloc sketch after the code example below)
Code Example:
import gc
import sys
from weakref import WeakValueDictionary
# 1. __slots__ for memory-efficient classes
# Regular class: every instance carries a per-instance __dict__
class RegularPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y

# Slotted class: attributes live in fixed slots, no per-instance __dict__
class SlottedPoint:
    __slots__ = ['x', 'y']

    def __init__(self, x, y):
        self.x = x
        self.y = y

# Exact sizes vary by Python version, and sys.getsizeof does not count the
# regular instance's __dict__, so the real savings are larger than the raw
# numbers suggest
print(sys.getsizeof(RegularPoint(1, 2)))
print(sys.getsizeof(SlottedPoint(1, 2)))
# 2. Object pooling for expensive-to-create objects
class ObjectPool:
    def __init__(self, factory, max_size=10):
        self.factory = factory
        self.max_size = max_size
        self.pool = []

    def acquire(self):
        if self.pool:
            return self.pool.pop()
        return self.factory()

    def release(self, obj):
        if len(self.pool) < self.max_size:
            self.pool.append(obj)

# Usage (DatabaseConnection stands in for any expensive-to-create object)
db_pool = ObjectPool(lambda: DatabaseConnection(), max_size=5)
conn = db_pool.acquire()
# ... use the connection ...
db_pool.release(conn)
# 3. Weak references to prevent a cache from keeping objects alive
# (values must be weak-referenceable; entries disappear once the value
# has no other strong references)
class Cache:
    def __init__(self):
        self._cache = WeakValueDictionary()

    def get(self, key):
        return self._cache.get(key)

    def set(self, key, value):
        self._cache[key] = value
# 4. Explicit garbage collection between large batches
# (large_data and process_batch are placeholders for your own pipeline)
def process_large_dataset(large_data):
    for batch in large_data:
        process_batch(batch)
        # Force collection of cyclic garbage before the next batch
        gc.collect()
# 5. Context managers for deterministic resource cleanup
# (allocate_resource is a placeholder for your own allocation call)
class ManagedResource:
    def __enter__(self):
        self.resource = allocate_resource()
        return self.resource

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.resource.cleanup()
        return False
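The topic list above also calls out memory-leak detection, which the example does not cover. A minimal sketch using the standard-library tracemalloc module; leaky_function is a placeholder for whatever code path you suspect.
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()
leaky_function()  # placeholder: the code you suspect of leaking
after = tracemalloc.take_snapshot()

# Show the source lines responsible for the largest allocation growth
for stat in after.compare_to(before, 'lineno')[:10]:
    print(stat)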
4. High-Performance Computing
- NumPy vectorization
- Numba JIT compilation
- Cython for C extensions
- Multiprocessing for parallelism
- Concurrent.futures
- Performance comparison (see the benchmark sketch after the code example below)
Code Example:
import numpy as np
from numba import jit
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor
# 1. NumPy vectorization
# Bad: pure-Python loop (interpreter overhead on every iteration)
def python_sum(n):
    total = 0
    for i in range(n):
        total += i ** 2
    return total

# Good: vectorized NumPy (the loop runs in compiled C code)
def numpy_sum(n):
    arr = np.arange(n)
    return np.sum(arr ** 2)

# For n around a million, the NumPy version is typically one to two
# orders of magnitude faster; exact timings depend on your hardware
# 2. Numba JIT compilation
@jit(nopython=True)  # compile to native machine code
def fast_function(n):
    total = 0
    for i in range(n):
        total += i ** 2
    return total

# The first call pays the compilation cost; subsequent calls typically run
# orders of magnitude faster than the pure-Python loop
# 3. Multiprocessing for CPU-bound tasks
def cpu_intensive_task(bounds):
    start, stop = bounds
    return sum(i * i for i in range(start, stop))

# Single process
result = cpu_intensive_task((0, 10000000))

# Multiple processes: split the range into chunks and combine the partial sums
# (on Windows/macOS, wrap this in an `if __name__ == "__main__":` guard)
with ProcessPoolExecutor(max_workers=4) as executor:
    chunks = [(0, 2500000), (2500000, 5000000),
              (5000000, 7500000), (7500000, 10000000)]
    total = sum(executor.map(cpu_intensive_task, chunks))
# Close to a 4x speedup on 4 cores for CPU-bound work like this; gains
# depend on chunk size relative to process start-up overhead
# 4. Caching for expensive computations
from functools import lru_cache
@lru_cache(maxsize=128)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
# Without the cache the naive recursion takes exponential time
# (fibonacci(100) would effectively never finish); with it, near-instant
# 5. Memory views for zero-copy slicing
# (data must support the buffer protocol: bytes, bytearray, array.array, ...)
def process_array(data):
    # Bad: slicing copies the underlying bytes
    subset = data[1000:2000]
    # Good: a zero-copy view onto the same buffer
    view = memoryview(data)[1000:2000]
    return view
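The topic list for this section also calls for a direct performance comparison. Below is a minimal timeit harness over the three implementations defined above (python_sum, numpy_sum, fast_function), assuming NumPy and Numba are installed; absolute numbers vary by machine.
import timeit

n = 1_000_000
fast_function(10)  # trigger Numba compilation once, outside the timing

for name, fn in [("pure Python", python_sum),
                 ("NumPy", numpy_sum),
                 ("Numba", fast_function)]:
    elapsed = timeit.timeit(lambda: fn(n), number=10)
    print(f"{name}: {elapsed:.4f}s for 10 runs")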
Hands-On Practice
Project 1: Performance Profiler
Build a comprehensive profiling tool.
Requirements:
- CPU profiling with cProfile
- Memory profiling
- Line-by-line analysis
- Visualization (flame graphs)
- HTML report generation
- Bottleneck identification
Key Skills: Profiling tools, visualization, analysis
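A possible starting point for this project is a small decorator that wraps cProfile and reports the top hotspots of a single call. This is only a sketch of one building block, not the full tool; example_workload is an arbitrary stand-in.
import cProfile
import functools
import pstats

def profiled(top=10):
    """Decorator that profiles one call and prints the top hotspots."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            profiler = cProfile.Profile()
            profiler.enable()
            try:
                return func(*args, **kwargs)
            finally:
                profiler.disable()
                stats = pstats.Stats(profiler).sort_stats('cumulative')
                stats.print_stats(top)
        return wrapper
    return decorator

@profiled(top=5)
def example_workload():
    return sum(i ** 2 for i in range(200000))

example_workload()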
Project 2: Data Processing Pipeline
Optimize a data processing pipeline.
Requirements:
- Load large CSV files (1GB+)
- Transform and clean data
- Aggregate statistics
- Compare Python/NumPy/Pandas approaches
- Measure memory usage
- Optimize to <2GB RAM
Key Skills: NumPy, memory optimization, benchmarking
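One way to stay within the memory budget is to stream the file in chunks instead of loading it all at once. A minimal sketch using pandas' chunked CSV reader; the file name and the "value" column are placeholders for your own data.
import pandas as pd

total = 0.0
row_count = 0
# Stream the file in 100k-row chunks rather than loading it whole
for chunk in pd.read_csv("large_file.csv", chunksize=100_000):
    chunk = chunk.dropna(subset=["value"])   # clean
    total += chunk["value"].sum()            # aggregate
    row_count += len(chunk)

print(f"mean value over {row_count} rows: {total / row_count:.4f}")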
Project 3: Parallel Computing
Implement parallel algorithms.
Requirements:
- Matrix multiplication
- Image processing
- Monte Carlo simulation
- Compare threading/multiprocessing/asyncio
- Measure speedup
- Handle shared state
Key Skills: Parallelism, performance measurement
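For the Monte Carlo requirement, a minimal sketch that estimates pi across worker processes; the worker count and sample counts are arbitrary choices for illustration.
import random
from concurrent.futures import ProcessPoolExecutor

def count_hits(samples):
    """Count random points that land inside the unit quarter-circle."""
    hits = 0
    for _ in range(samples):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

if __name__ == "__main__":
    workers, samples_per_worker = 4, 1_000_000
    with ProcessPoolExecutor(max_workers=workers) as executor:
        hits = sum(executor.map(count_hits, [samples_per_worker] * workers))
    pi_estimate = 4 * hits / (workers * samples_per_worker)
    print(f"pi is approximately {pi_estimate:.5f}")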
Assessment Criteria
- Profile code to identify bottlenecks
- Choose appropriate data structures
- Optimize algorithms for time complexity
- Manage memory efficiently
- Use vectorization where applicable
- Implement effective caching
- Parallelize CPU-bound operations
Resources
Official Documentation
- Python Performance Tips - Official tips
- NumPy Docs - NumPy documentation
- Numba Docs - JIT compilation
Learning Platforms
- High Performance Python - O'Reilly book
- Python Performance - Real Python guide
- Optimizing Python - PyCon talks
Tools
- cProfile - CPU profiling
- memory_profiler - Memory profiling
- py-spy - Sampling profiler
- Scalene - CPU/GPU/memory profiler
Next Steps
After mastering Python performance, explore:
- Cython - C extensions for Python
- PyPy - Alternative Python interpreter
- Dask - Parallel computing library
- CUDA - GPU programming with Python