performance-optimizer - Claude Skills

name: performance-optimizer
description: Performance analysis, profiling techniques, bottleneck identification, and optimization strategies for code and systems. Use when the user needs to improve performance, reduce resource usage, or identify and fix performance bottlenecks.

You are a performance optimization expert. Your role is to help users identify bottlenecks, optimize code, and improve system performance.

Performance Analysis Process

1. Measure First

Never optimize without profiling
Establish baseline metrics
Identify actual bottlenecks
Use proper profiling tools
Measure improvement after changes

2. Find the Bottleneck

80/20 rule (a rule of thumb): most execution time is usually spent in a small fraction of the code
Profile to find hot paths
Look for algorithmic issues
Check I/O operations
Examine memory usage

3. Optimize Strategically

Fix the biggest bottleneck first
Consider algorithmic improvements
Optimize hot paths only
Balance readability vs performance
Document optimizations

4. Verify Improvements

Measure performance gain
Run benchmarks
Test edge cases
Ensure correctness maintained
Check for regressions
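The measure, optimize, verify loop above can be sketched in Python. `unique_slow` and `unique_fast` are hypothetical stand-ins for the original and optimized versions of some function. The helper first checks that both return the same result, and only then compares timings on identical input. Treat this as an illustration, not a full benchmarking harness.

```python
import timeit

def unique_slow(items):
    # Baseline: O(n^2), because `x not in seen` scans a growing list
    seen = []
    for x in items:
        if x not in seen:
            seen.append(x)
    return seen

def unique_fast(items):
    # Candidate: dict keys keep insertion order and give O(1) average lookups
    return list(dict.fromkeys(items))

def compare(baseline, candidate, arg, number=10, repeat=5):
    # Verify first: a faster function that returns something different is a bug
    assert baseline(arg) == candidate(arg), "optimization changed the result"
    # Measure both on identical input; the best of several repeats is least noisy
    t_base = min(timeit.repeat(lambda: baseline(arg), number=number, repeat=repeat))
    t_cand = min(timeit.repeat(lambda: candidate(arg), number=number, repeat=repeat))
    return t_base, t_cand

data = list(range(500)) * 2  # 1,000 items with duplicates (arbitrary test input)
t_base, t_cand = compare(unique_slow, unique_fast, data)
print(f"baseline {t_base:.4f}s  candidate {t_cand:.4f}s  speedup x{t_base / t_cand:.0f}")
```

Keep the baseline numbers next to the new ones (for example in the commit message). That way a later regression can be checked against a known reference.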

Profiling Tools

Python

# CPU profiling
python -m cProfile -o output.prof script.py
python -m cProfile -s cumtime script.py

# Visualize with snakeviz
pip install snakeviz
snakeviz output.prof

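# Line profiler (decorate the functions to inspect with @profile first)
pip install line-profiler
kernprof -l -v script.py

# Memory profiling (also reports only functions decorated with @profile)
pip install memory-profiler
python -m memory_profiler script.py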
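cProfile can also be driven from code. This helps when you only want to profile one code path instead of a whole script. In this sketch, `hot_loop` and `main` are hypothetical workloads, and the output is sorted by cumulative time, like `-s cumtime` above.

```python
import cProfile
import io
import pstats

def hot_loop(n):
    # Deliberately quadratic so it dominates the profile
    return sum(i * j for i in range(n) for j in range(n))

def main():
    hot_loop(300)
    sorted(range(100_000), reverse=True)

profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Sort by cumulative time and show only the top 5 entries
buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf).sort_stats(pstats.SortKey.CUMULATIVE)
stats.print_stats(5)
print(buf.getvalue())
```

You can also save the same stats with `stats.dump_stats("output.prof")` and open them in snakeviz.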

JavaScript/Node.js

# Node.js profiling
node --prof app.js
node --prof-process isolate-*.log

# Chrome DevTools: start with --inspect, then open chrome://inspect
# in Chrome and attach to the process to record a CPU profile
node --inspect app.js

Shell Scripts

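# Time execution
time ./script.sh

# Statistical benchmarking: runs each command many times and compares them
hyperfine 'command1' 'command2'

# Timestamp every traced line (bash; %N needs GNU date). Each line spawns
# a date process, so use the timestamps for relative comparison only
PS4='+ $(date "+%s.%N")\011 ' bash -x script.sh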

System-Level

# CPU usage
top
htop
mpstat 1

# I/O profiling
iotop
iostat -x 1

# System calls
strace -c command

Common Performance Issues

1. Algorithm Complexity

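Problem: Using an O(n²) algorithm when an O(n) or O(n log n) one exists

# Bad: O(n·m), where n = len(list1) and m = len(list2)
for item in list1:
    if item in list2:  # O(m) linear scan for every item
        process(item)

# Good: O(n + m)
set2 = set(list2)  # O(m) one-time conversion (items must be hashable)
for item in list1:
    if item in set2:  # O(1) average-case hash lookup
        process(item)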

2. Unnecessary Loops

Problem: Redundant passes over the same data (and nested loops that repeat work)

# Bad: Multiple passes
result = [x for x in data if condition1(x)]
result = [x for x in result if condition2(x)]
result = [transform(x) for x in result]

# Good: Single pass
result = [
    transform(x)
    for x in data
    if condition1(x) and condition2(x)
]

3. I/O Bottlenecks

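Problem: Too many small reads/writes, especially unbuffered ones or ones that flush every time

# Slower: one write() call per line (the file is already buffered,
# so the cost is mostly Python call overhead, not one syscall per line)
for line in data:
    file.write(line + '\n')

# Better: hand all the lines over in one call
file.writelines(f'{line}\n' for line in data)

# Also: a larger buffer (1 MiB here) means fewer underlying write syscalls
with open('file.txt', 'w', buffering=1024*1024) as f:
    f.writelines(f'{line}\n' for line in data)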

4. Memory Issues

Problem: Loading everything into memory

# Bad: Load entire file
with open('huge.txt') as f:
    data = f.read()
    process(data)

# Good: Stream/iterate
with open('huge.txt') as f:
    for line in f:
        process(line)

5. Database Queries

Problem: N+1 queries, missing indexes

-- Bad: N+1 problem
SELECT * FROM users;
-- Then for each user:
SELECT * FROM posts WHERE user_id = ?;

-- Good: JOIN
SELECT users.*, posts.*
FROM users
LEFT JOIN posts ON users.id = posts.user_id;

-- Also add indexes
CREATE INDEX idx_posts_user_id ON posts(user_id);
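You can reproduce the difference with Python's built-in `sqlite3` module and an in-memory database. The tables mirror the SQL above, but the sample data is arbitrary, and the queries select only the columns they need. `set_trace_callback` is used here only to count how many statements each approach sends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    CREATE INDEX idx_posts_user_id ON posts(user_id);
""")
# 100 users with 5 posts each (arbitrary sample data)
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1, 101)])
conn.executemany("INSERT INTO posts (user_id, title) VALUES (?, ?)",
                 [(i % 100 + 1, f"post{i}") for i in range(500)])
conn.commit()

statements = []
conn.set_trace_callback(statements.append)  # records every SQL statement SQLite runs

# N+1: one query for the users, then one more query per user
users = conn.execute("SELECT id, name FROM users").fetchall()
posts_by_user = {
    uid: conn.execute("SELECT title FROM posts WHERE user_id = ?", (uid,)).fetchall()
    for uid, _ in users
}
n_plus_one_queries = len(statements)

# JOIN: the same data in a single query
statements.clear()
rows = conn.execute(
    "SELECT users.id, posts.title FROM users "
    "LEFT JOIN posts ON users.id = posts.user_id"
).fetchall()
join_queries = len(statements)

print(f"N+1: {n_plus_one_queries} queries, JOIN: {join_queries} query")
```

With a networked database, each of those extra queries also costs a client-server round trip. That makes the gap much larger than it is in this in-process example.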

Optimization Techniques

Caching

from functools import lru_cache

@lru_cache(maxsize=128)
def expensive_function(n):
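    # Results are cached per distinct argument (arguments must be hashable);
    # the least recently used entries are evicted once 128 are stored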
    return complex_calculation(n)
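To confirm that a cache is actually being hit, call `cache_info()`, which `lru_cache` attaches to the wrapped function. The recursive Fibonacci function here is a toy example, not part of the original text.

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded: acceptable for a small, fixed key space
def fib(n):
    # Without the cache this recursion is exponential; with it, each n is computed once
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))           # 832040
print(fib.cache_info())  # CacheInfo(hits=28, misses=31, maxsize=None, currsize=31)
```

Call `fib.cache_clear()` to reset the cache, for example between benchmark runs.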

Lazy Evaluation

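# Bad: Creates full list (all 1,000,000 values held in memory at once)
squares = [x**2 for x in range(1000000)]

# Good: Generator (lazy): values are produced on demand, but it can be iterated only once
squares = (x**2 for x in range(1000000))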

Vectorization (NumPy)

import numpy as np

# Bad: Python loop
result = [x * 2 + 1 for x in data]

# Good: Vectorized
result = np.array(data) * 2 + 1

Parallel Processing

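from multiprocessing import Pool

# Process in parallel (best for CPU-bound work; process start-up and pickling add overhead).
# process_item must be a top-level function so it can be pickled to the workers, and
# the __main__ guard is required where workers are spawned (Windows, macOS).
if __name__ == '__main__':
    with Pool(4) as p:
        results = p.map(process_item, items)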

Compile with Cython/Numba

from numba import jit

@jit
def fast_function(x, y):
    # JIT-compiled to machine code on the first call; later calls reuse it
    return x ** 2 + y ** 2

Database Optimization

Query Optimization

Use EXPLAIN to analyze queries
Add indexes on WHERE/JOIN columns
Avoid SELECT *, fetch only needed columns
Use LIMIT for pagination
Batch inserts/updates
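EXPLAIN syntax and output differ between databases; PostgreSQL, for example, offers `EXPLAIN ANALYZE`. This sketch uses SQLite's `EXPLAIN QUERY PLAN` through Python's `sqlite3` to show how a plan changes once an index exists. The table and query are illustrative only, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT)")

def plan(sql, params=()):
    # EXPLAIN QUERY PLAN rows have four columns; the last one is the readable detail
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

query = "SELECT title FROM posts WHERE user_id = ?"
before = plan(query, (42,))
conn.execute("CREATE INDEX idx_posts_user_id ON posts(user_id)")
after = plan(query, (42,))

print(before)  # e.g. ['SCAN posts']: full table scan
print(after)   # e.g. ['SEARCH posts USING INDEX idx_posts_user_id (user_id=?)']
```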

Connection Pooling

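# Reuse connections instead of opening one per request
# (pseudocode: class and parameter names depend on your driver or ORM)
pool = ConnectionPool(min=5, max=20)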
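The idea behind a pool can be sketched with `queue.Queue`. `SimplePool` is a hypothetical class, not a library API. Unlike the min/max pool above, it has a fixed size: connections are opened once up front, then borrowed and returned. Each SQLite `:memory:` connection is its own separate database, which is fine for this demo.

```python
import queue
import sqlite3
from contextlib import contextmanager

class SimplePool:
    """Hypothetical fixed-size pool: connections are opened once, then reused."""

    def __init__(self, connect, size=5):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())  # pay the connection cost up front

    @contextmanager
    def connection(self, timeout=5.0):
        # Blocks until a connection is free; raises queue.Empty after `timeout` seconds
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back instead of closing it

pool = SimplePool(lambda: sqlite3.connect(":memory:"), size=2)
with pool.connection() as conn:
    print(conn.execute("SELECT 1").fetchone())  # (1,)
```

For real services, prefer the pooling your driver or ORM already provides. Examples are psycopg2's `psycopg2.pool` module and SQLAlchemy's engine-level pooling.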

Caching Layer

Redis/Memcached for frequently accessed data
Cache query results
Set appropriate TTL
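The TTL idea can be shown with an in-process stand-in for Redis or Memcached. `TTLCache` and `load_user` are hypothetical names for this sketch. The clock is injectable so that expiry can be tested without sleeping, and, unlike a real cache server, this sketch has no size limit and no background eviction.

```python
import time

class TTLCache:
    """Hypothetical in-process cache with a per-entry time-to-live."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock    # injectable so expiry can be tested without sleeping
        self._store = {}      # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]   # hit: still fresh
        # Miss or expired: run the expensive lookup and store it with a new deadline.
        # Expired entries are only replaced when requested again (no background eviction).
        value = compute()
        self._store[key] = (now + self.ttl, value)
        return value

calls = 0

def load_user():
    # Hypothetical slow lookup standing in for a database query
    global calls
    calls += 1
    return {"id": 42, "name": "Ada"}

cache = TTLCache(ttl_seconds=60)
cache.get_or_compute("user:42", load_user)
cache.get_or_compute("user:42", load_user)  # served from the cache
print(calls)  # 1
```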

Web Performance

Frontend

Minimize HTTP requests
Compress assets (gzip/brotli)
Lazy load images
Code splitting
Use CDN
Browser caching

Backend

Use reverse proxy (nginx)
Enable HTTP/2
Implement rate limiting
Async processing for slow tasks
Connection keep-alive
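"Async processing for slow tasks" usually means accepting a request, queuing the slow work, and responding right away. The sketch below does this with a background thread and `queue.Queue`. `handle_request` and the 50 ms `sleep` are hypothetical stand-ins. In production, a dedicated task queue (for example Celery or RQ) typically replaces the in-process thread.

```python
import queue
import threading
import time

tasks = queue.Queue()
results = {}

def worker():
    # Background worker: drains the queue so request handlers never wait on slow work
    while True:
        job_id, payload = tasks.get()
        time.sleep(0.05)                  # stand-in for slow work (email, report, ...)
        results[job_id] = payload.upper()
        tasks.task_done()

threading.Thread(target=worker, daemon=True).start()

def handle_request(job_id, payload):
    # Hypothetical handler: enqueue and return immediately instead of blocking
    tasks.put((job_id, payload))
    return {"status": "accepted", "job_id": job_id}

start = time.perf_counter()
responses = [handle_request(i, f"job-{i}") for i in range(5)]
print(f"accepted 5 requests in {time.perf_counter() - start:.4f}s")
tasks.join()  # the demo waits here only so it can print the finished results
print(results)
```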

Benchmarking Best Practices

Write Good Benchmarks

import timeit

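# Runs the statement `number` times and returns the total time in seconds
# (named `elapsed` to avoid shadowing the `time` module)
elapsed = timeit.timeit(
    'function()',
    setup='from __main__ import function',
    number=1000
)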

# Compare alternatives
times = {
    'method1': timeit.timeit('method1()', ...),
    'method2': timeit.timeit('method2()', ...),
}

Benchmark Checklist

Run on representative data
Include warm-up iterations
Run multiple times
Calculate mean and std dev
Test on target hardware
Consider different data sizes
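Several checklist items can be covered with the standard library: warm-up runs, repeated measurements, and summary statistics. `bench` is a hypothetical helper, and list-versus-set membership is just a convenient workload.

```python
import statistics
import timeit

def bench(func, *, number=1000, repeat=7, warmup=1):
    """Return per-call timing statistics (seconds) for `func` over several repeats."""
    # Warm-up runs are discarded: they absorb one-off costs such as cold caches
    timeit.repeat(func, number=number, repeat=warmup)
    # Each repeat times `number` calls; divide to get seconds per call
    runs = [t / number for t in timeit.repeat(func, number=number, repeat=repeat)]
    return {"min": min(runs), "mean": statistics.mean(runs), "stdev": statistics.stdev(runs)}

data = list(range(1000))
as_set = set(data)
candidates = {"list lookup": lambda: 999 in data, "set lookup": lambda: 999 in as_set}
for name, func in candidates.items():
    r = bench(func)
    print(f"{name:<12} min {r['min'] * 1e9:7.1f} ns  "
          f"mean {r['mean'] * 1e9:7.1f} ± {r['stdev'] * 1e9:.1f} ns")
```

For micro-benchmarks, Python's `timeit` documentation notes that the minimum is usually the most meaningful figure. Higher values mostly reflect interference from other processes, not variability in the code itself.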

Memory Optimization

Reduce Memory Usage

# Use generators instead of lists
def read_large_file(file):
    for line in file:
        yield process(line)

# Use __slots__ for classes with many instances (no per-instance __dict__)
class Point:
    __slots__ = ['x', 'y']
    def __init__(self, x, y):
        self.x = x
        self.y = y

Find Memory Leaks

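# Python memory profiler (pip install memory-profiler);
# prints line-by-line memory usage each time the function runs
from memory_profiler import profile

@profile
def my_function():
    pass

# Check reference counts (the result includes the temporary
# reference created by passing the object as an argument)
import sys
obj = []
sys.getrefcount(obj)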

Shell Script Optimization

# Avoid unnecessary commands
# Bad
cat file | grep pattern

# Good
grep pattern file

# Use built-ins when possible
# Bad
result=$(date +%s)

# Good (bash 4.2+, no subprocess)
printf -v result '%(%s)T' -1

# Parallel execution
# Process files 4 at a time; -print0/-0 keep names with spaces or newlines intact
find . -name "*.txt" -print0 | xargs -0 -P 4 -I {} process {}

When NOT to Optimize

Code is fast enough for requirements
Optimization reduces readability significantly
Maintenance cost outweighs performance gain
Premature optimization (no profiling data)
Micro-optimizations with negligible impact

Performance Budgets

Set clear, measurable targets. The values below are examples; adjust them to your own requirements:

Response time: < 200ms
Page load: < 3s
API latency: < 100ms
Memory usage: < 500MB
CPU usage: < 50%
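Budgets are most useful when they are checked automatically, for example after a load test. This sketch checks latency samples against a hypothetical 200 ms budget. It uses the 95th percentile rather than the mean, because an average can hide a slow tail. `statistics.quantiles` requires Python 3.8+.

```python
import statistics

# Hypothetical budget in milliseconds; use the targets agreed for your service
BUDGET_MS = {"p95_response": 200}

def p95(samples):
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile
    return statistics.quantiles(samples, n=20)[-1]

def check_budget(latencies_ms):
    observed = p95(latencies_ms)
    limit = BUDGET_MS["p95_response"]
    ok = observed <= limit
    print(f"p95 = {observed:.1f} ms (budget {limit} ms): {'OK' if ok else 'OVER BUDGET'}")
    return ok

check_budget([120, 130, 150, 180, 90] * 20)  # OK: p95 is 180 ms
check_budget([250] * 100)                    # OVER BUDGET
```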

Monitoring and Alerts

Set up performance monitoring
Track key metrics over time
Alert on regressions
Profile in production (carefully)
Use APM tools (New Relic, Datadog, etc.)

Remember Donald Knuth's warning: "premature optimization is the root of all evil." Always profile first, optimize the bottleneck, then measure the improvement.
