SKILL.md

name: debugging-and-profiling
description: Debugging fundamentals, debugpy/VS Code, pdb, CPU profiling, memory profiling, profiling async code, performance optimization, systematic diagnosis, common bottlenecks

Debugging and Profiling

Overview

Core Principle: Profile before optimizing. Humans are terrible at guessing where code is slow. Always measure before making changes.

Python debugging and profiling enables systematic problem diagnosis and performance optimization. Use debugpy/pdb for step-through debugging, cProfile for CPU profiling, memory_profiler for memory analysis. The biggest mistake: optimizing code without profiling first—you'll likely optimize the wrong thing.

When to Use

Use this skill when:

  • "Code is slow"
  • "How to profile Python?"
  • "Memory leak"
  • "Debugging not working"
  • "Find bottleneck"
  • "Optimize performance"
  • "Step through code"
  • "Where is my code spending time?"

Don't use when:

  • Setting up project (use project-structure-and-tooling)
  • Already know what to optimize (but still profile to verify!)
  • Algorithm selection (different skill domain)

Symptoms triggering this skill:

  • Code runs slower than expected
  • Memory usage growing over time
  • Need to understand execution flow
  • Performance degraded after changes

Debugging Fundamentals

Using debugpy with VS Code

# ✅ CORRECT: debugpy for remote debugging
import debugpy

# Allow VS Code to attach
debugpy.listen(5678)
print("Waiting for debugger to attach...")
debugpy.wait_for_client()

# Your code here
def process_data(data):
    result = []
    for item in data:
        # Set breakpoint in VS Code on this line
        transformed = transform(item)
        result.append(transformed)
    return result

# VS Code launch.json configuration:
"""
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Attach",
            "type": "python",
            "request": "attach",
            "connect": {
                "host": "localhost",
                "port": 5678
            }
        }
    ]
}
"""

Using pdb (Python Debugger)

# ✅ CORRECT: pdb for interactive debugging
import pdb

def buggy_function(data):
    result = []
    for i, item in enumerate(data):
        # Drop into debugger
        pdb.set_trace()  # Or: breakpoint() in Python 3.7+

        processed = item * 2
        result.append(processed)
    return result

# pdb commands:
# n (next): Execute next line
# s (step): Step into function
# c (continue): Continue execution
# p variable: Print variable
# pp variable: Pretty print variable
# l (list): Show current location in code
# w (where): Show stack trace
# q (quit): Quit debugger

Conditional Breakpoints

# ❌ WRONG: Breaking on every iteration
def process_items(items):
    for item in items:
        pdb.set_trace()  # Breaks 10000 times!
        process(item)

# ✅ CORRECT: Conditional breakpoint
def process_items(items):
    for i, item in enumerate(items):
        if i == 5000:  # Only break on specific iteration
            breakpoint()
        process(item)

# ✅ BETTER: Break only when the data is actually problematic
def process_items(items):
    for item in items:
        if item.value < 0:  # Break only when problematic
            breakpoint()
        process(item)

Post-Mortem Debugging

# ✅ CORRECT: Debug after exception
import pdb

def main():
    try:
        # Code that might raise exception
        result = risky_operation()
    except Exception:
        # Drop into debugger at exception point
        pdb.post_mortem()

# ✅ CORRECT: Auto post-mortem for unhandled exceptions
import sys

def custom_excepthook(exc_type, exc_value, exc_traceback):
    pdb.post_mortem(exc_traceback)

sys.excepthook = custom_excepthook

# Now unhandled exceptions drop into pdb automatically
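Running an entire script under pdb from the command line achieves the same thing with no code changes: when the script raises an unhandled exception, pdb re-enters in post-mortem mode at the crash site.

# Run the script under the debugger, then continue until it crashes
python -m pdb script.py
# (Pdb) c
# ... on an uncaught exception, pdb drops back into post-mortem debugging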

Why this matters: Breakpoints let you inspect state at exact point of failure. Conditional breakpoints avoid noise. Post-mortem debugging examines crashes.


CPU Profiling

cProfile for Function-Level Profiling

import cProfile
import pstats

# ❌ WRONG: Guessing which function is slow
def slow_program():
    # "I think this loop is the problem..."
    for i in range(1000):
        process_data(i)

# ✅ CORRECT: Profile to find actual bottleneck
def slow_program():
    for i in range(1000):
        process_data(i)

# Profile the function
cProfile.run('slow_program()', 'profile_stats')

# Analyze results
stats = pstats.Stats('profile_stats')
stats.strip_dirs()
stats.sort_stats('cumulative')
stats.print_stats(20)  # Top 20 functions by cumulative time

# ✅ CORRECT: Profile with context manager
from contextlib import contextmanager
import cProfile

@contextmanager
def profiled():
    pr = cProfile.Profile()
    pr.enable()
    try:
        yield
    finally:
        # Ensure stats are printed even if the profiled block raises
        pr.disable()
        stats = pstats.Stats(pr)
        stats.strip_dirs()
        stats.sort_stats('cumulative')
        stats.print_stats(20)

# Usage
with profiled():
    slow_program()

Profiling Specific Code Blocks

# ✅ CORRECT: Profile specific section
import cProfile

pr = cProfile.Profile()

# Normal code
setup_data()

# Profile this section
pr.enable()
expensive_operation()
pr.disable()

# More normal code
cleanup()

# View results
pr.print_stats(sort='cumulative')

Line-Level Profiling with line_profiler

# Install: pip install line_profiler

# ✅ CORRECT: Line-by-line profiling
from line_profiler import LineProfiler

def slow_function():
    total = 0
    for i in range(10000):
        total += i ** 2
    return total

# Option 1: decorate slow_function with @profile and run under kernprof,
# which injects the decorator at runtime (plain `python script.py` would
# raise NameError on @profile):
#   kernprof -l -v script.py

# Option 2: profile programmatically:
lp = LineProfiler()
lp.add_function(slow_function)
lp.enable()
slow_function()
lp.disable()
lp.print_stats()

# Output shows time spent per line:
# Line #      Hits         Time  Per Hit   % Time  Line Contents
# ==============================================================
#     1                                           def slow_function():
#     2         1          2.0      2.0      0.0      total = 0
#     3     10001      15234.0      1.5     20.0      for i in range(10000):
#     4     10000      60123.0      6.0     80.0          total += i ** 2
#     5         1          1.0      1.0      0.0      return total

Why this matters: cProfile shows which functions are slow. line_profiler shows which lines within functions. Both essential for optimization.

Visualizing Profiles with SnakeViz

# Install: pip install snakeviz

# Profile code
python -m cProfile -o program.prof script.py

# Visualize
snakeviz program.prof

# Opens browser with interactive visualization:
# - Sunburst chart showing call hierarchy
# - Icicle chart showing time distribution
# - Click functions to zoom in

Memory Profiling

Memory Usage with memory_profiler

# Install: pip install memory_profiler

from memory_profiler import profile

# ✅ CORRECT: Track memory usage per line
@profile
def memory_hungry_function():
    # Line-by-line memory usage shown
    big_list = [i for i in range(1000000)]  # Roughly 40MB (1M int objects plus the list)
    big_dict = {i: i**2 for i in range(1000000)}  # Roughly double that for the dict and its new ints
    return len(big_list), len(big_dict)

# Run with:
# python -m memory_profiler script.py

# Output (illustrative numbers):
# Line #    Mem usage    Increment   Line Contents
# ================================================
#      3   38.3 MiB     38.3 MiB   @profile
#      4                             def memory_hungry_function():
#      5   76.5 MiB     38.2 MiB       big_list = [i for i in range(1000000)]
#      6  153.0 MiB     76.5 MiB       big_dict = {i: i**2 for i in range(1000000)}
#      7  153.0 MiB      0.0 MiB       return len(big_list), len(big_dict)

Finding Memory Leaks

# ✅ CORRECT: Detect memory leaks with tracemalloc
import tracemalloc

# Start tracing
tracemalloc.start()

# Take snapshot before
snapshot1 = tracemalloc.take_snapshot()

# Run code that might leak
problematic_function()

# Take snapshot after
snapshot2 = tracemalloc.take_snapshot()

# Compare snapshots
top_stats = snapshot2.compare_to(snapshot1, 'lineno')

print("Top 10 memory increases:")
for stat in top_stats[:10]:
    print(stat)

tracemalloc.stop()

# ✅ CORRECT: Track specific objects
import gc
import sys

def find_memory_leak():
    # Force garbage collection
    gc.collect()

    # Track objects before
    before = len(gc.get_objects())

    # Run potentially leaky code
    for _ in range(100):
        leaky_operation()

    # Force GC again
    gc.collect()

    # Track objects after
    after = len(gc.get_objects())

    if after > before:
        print(f"Potential leak: {after - before} objects not collected")

        # Find what's keeping objects alive
        for obj in gc.get_objects():
            if isinstance(obj, MyClass):  # Suspect class
                print(f"Found {type(obj)}: {sys.getrefcount(obj)} references")
                print(gc.get_referrers(obj))

Profiling Memory with objgraph

# Install: pip install objgraph

import objgraph

# ✅ CORRECT: Find most common objects
def analyze_memory():
    objgraph.show_most_common_types()
    # Output:
    # dict                   12453
    # function               8234
    # list                   6789
    # ...

# ✅ CORRECT: Track object growth
objgraph.show_growth()
potentially_leaky_function()
objgraph.show_growth()  # Shows objects that increased

# ✅ CORRECT: Visualize object references
import objgraph
objgraph.show_refs([my_object], filename='refs.png')
# Creates graph showing what references my_object

Why this matters: Memory leaks cause gradual performance degradation. tracemalloc and memory_profiler help find exactly where memory is allocated.


Profiling Async Code

Profiling Async Functions

import asyncio
import cProfile
import pstats

# ❌ WRONG: cProfile doesn't work well with async
async def slow_async():
    await asyncio.sleep(1)
    await process_data()

cProfile.run('asyncio.run(slow_async())')  # Misleading results

# ✅ CORRECT: Use yappi for async profiling
# Install: pip install yappi
import yappi

async def slow_async():
    await asyncio.sleep(1)
    await process_data()

yappi.set_clock_type("wall")  # Use wall time, not CPU time
yappi.start()

asyncio.run(slow_async())

yappi.stop()

# Print stats
stats = yappi.get_func_stats()
stats.sort("totaltime", "desc")
stats.print_all()

# ✅ CORRECT: Filter stats to specific functions
# (filter_callback receives each function stat; match on its name or module)
stats = yappi.get_func_stats(filter_callback=lambda x: x.name.startswith('slow_async'))
stats.print_all()

Detecting Blocking Code in Async

# ✅ CORRECT: Detect event loop blocking
import asyncio
import time

class LoopMonitor:
    def __init__(self, threshold: float = 0.1):
        self.threshold = threshold

    async def monitor(self):
        while True:
            start = time.monotonic()
            await asyncio.sleep(0.01)  # Very short sleep
            elapsed = time.monotonic() - start

            if elapsed > self.threshold:
                print(f"WARNING: Event loop blocked for {elapsed:.3f}s")

async def main():
    # Start monitor
    monitor = LoopMonitor(threshold=0.1)
    monitor_task = asyncio.create_task(monitor.monitor())

    # Run your async code
    await your_async_function()

    monitor_task.cancel()

# ✅ CORRECT: Use asyncio debug mode
asyncio.run(main(), debug=True)
# Warns about slow callbacks (>100ms)

Performance Optimization Strategies

Optimization Workflow

# ✅ CORRECT: Systematic optimization approach

# 1. Profile to find the bottleneck
import cProfile
import pstats

cProfile.run('main()', 'profile_stats')

# 2. Analyze results
stats = pstats.Stats('profile_stats')
stats.sort_stats('cumulative')
stats.print_stats(10)  # Focus on top 10

# 3. Identify specific slow function
def slow_function(data):
    # Original implementation
    result = []
    for item in data:
        if is_valid(item):
            result.append(transform(item))
    return result

# 4. Create benchmark
import timeit

data = create_test_data(10000)

def benchmark(func):
    time_taken = timeit.timeit(lambda: func(data), number=100)
    print(f"Average time: {time_taken / 100:.4f}s")

benchmark(slow_function)  # Baseline: 0.1234s

# 5. Optimize
def optimized_function(data):
    # Use a list comprehension (faster)
    return [transform(item) for item in data if is_valid(item)]

# 6. Benchmark again
benchmark(optimized_function)  # 0.0789s - about 36% faster

# 7. Verify correctness
assert slow_function(data) == optimized_function(data)

# 8. Re-profile entire program to verify improvement
cProfile.run('main()', 'profile_stats_optimized')

Why this matters: Without profiling, you might optimize code that takes 1% of runtime, ignoring the 90% bottleneck. Always measure.

Common Optimizations

import re

# ❌ WRONG: Repeated expensive operations
def process_items(items):
    for item in items:
        # Regex compiled every iteration!
        pattern = re.compile(r'\d+')
        match = pattern.search(item)

# ✅ CORRECT: Move expensive operations outside loop
def process_items(items):
    pattern = re.compile(r'\d+')  # Compile once
    for item in items:
        match = pattern.search(item)

# ❌ WRONG: Growing list with repeated concatenation
def build_large_list():
    result = []
    for i in range(100000):
        result = result + [i]  # Creates new list each time! O(n²)

# ✅ CORRECT: Use append
def build_large_list():
    result = []
    for i in range(100000):
        result.append(i)  # O(n)

# ❌ WRONG: Checking membership in list
def filter_items(items, blacklist):
    return [item for item in items if item not in blacklist]
    # O(n * m) if blacklist is list

# ✅ CORRECT: Use set for membership checks
def filter_items(items, blacklist):
    blacklist_set = set(blacklist)  # O(m)
    return [item for item in items if item not in blacklist_set]
    # O(n) for iteration + O(1) per lookup = O(n)

Caching Results

from functools import lru_cache

# ❌ WRONG: Recomputing expensive results
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
# O(2^n) - recalculates same values repeatedly

# ✅ CORRECT: Cache results
@lru_cache(maxsize=None)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
# O(n) - each value computed once

# ✅ CORRECT: Custom caching for unhashable arguments
import hashlib
from functools import wraps

def cache_dataframe_results(func):
    cache = {}

    @wraps(func)
    def wrapper(df):
        # Use hash of dataframe content as key
        key = hashlib.md5(df.to_csv(index=False).encode()).hexdigest()

        if key not in cache:
            cache[key] = func(df)

        return cache[key]

    return wrapper

@cache_dataframe_results
def expensive_dataframe_operation(df):
    # Complex computation
    return df.groupby('category').agg({'value': 'sum'})
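A quick way to confirm a cache is actually earning its keep is lru_cache's built-in hit statistics, shown here for the memoized fibonacci above:

# Inspect cache effectiveness
fibonacci(100)
print(fibonacci.cache_info())
# CacheInfo(hits=98, misses=101, maxsize=None, currsize=101)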

Systematic Diagnosis

Performance Degradation Diagnosis

# ✅ CORRECT: Diagnose performance regression
import cProfile
import pstats

def diagnose_slowdown():
    """Compare current vs baseline performance."""

    # Profile current code
    cProfile.run('main()', 'current_profile.prof')

    # Load baseline profile (from git history or previous run)
    # git show main:profile.prof > baseline_profile.prof

    current = pstats.Stats('current_profile.prof')
    baseline = pstats.Stats('baseline_profile.prof')

    print("=== CURRENT ===")
    current.sort_stats('cumulative')
    current.print_stats(10)

    print("\n=== BASELINE ===")
    baseline.sort_stats('cumulative')
    baseline.print_stats(10)

    # Look for functions that got slower
    # Compare cumulative times

Memory Leak Diagnosis

# ✅ CORRECT: Systematic memory leak detection
import tracemalloc
import gc

def diagnose_memory_leak():
    """Run function multiple times and check memory growth."""

    gc.collect()
    tracemalloc.start()

    # Baseline
    snapshot1 = tracemalloc.take_snapshot()

    # Run 100 times
    for _ in range(100):
        potentially_leaky_function()
        gc.collect()

    # Check memory
    snapshot2 = tracemalloc.take_snapshot()

    top_stats = snapshot2.compare_to(snapshot1, 'lineno')

    print("Top 10 memory allocations:")
    for stat in top_stats[:10]:
        print(f"{stat.traceback}: +{stat.size_diff / 1024:.1f} KB")

    tracemalloc.stop()

I/O vs CPU Bound Diagnosis

# ✅ CORRECT: Determine if I/O or CPU bound
import time

def diagnose_bottleneck():
    """Determine whether the program is I/O or CPU bound."""

    start_wall = time.time()
    start_cpu = time.process_time()

    main()

    wall_time = time.time() - start_wall
    cpu_time = time.process_time() - start_cpu

    print(f"Wall time: {wall_time:.2f}s")
    print(f"CPU time: {cpu_time:.2f}s")

    if cpu_time / wall_time > 0.9:
        print("CPU bound - optimize computation")
        # Consider: Cython, NumPy, multiprocessing
    else:
        print("I/O bound - optimize I/O")
        # Consider: async/await, caching, batching

Common Bottlenecks and Solutions

String Concatenation

# ❌ WRONG: String concatenation in loop
def build_string(items):
    result = ""
    for item in items:
        result += str(item) + "\n"  # Creates new string each time
    return result
# O(n²) time complexity

# ✅ CORRECT: Use join
def build_string(items):
    return "\n".join(str(item) for item in items)
# O(n) time complexity

# Benchmark:
# 1000 items: 0.0015s (join) vs 0.0234s (concatenation) - 15x faster
# 10000 items: 0.015s (join) vs 2.341s (concatenation) - 156x faster

List Comprehension vs Map/Filter

import timeit

# ✅ CORRECT: List comprehension (usually fastest)
def with_list_comp(data):
    return [x * 2 for x in data if x > 0]

# ✅ CORRECT: Generator (memory efficient for large data)
def with_generator(data):
    return (x * 2 for x in data if x > 0)

# Map/filter (sometimes faster for simple operations)
def with_map_filter(data):
    return map(lambda x: x * 2, filter(lambda x: x > 0, data))

# Benchmark
data = list(range(1000000))
print(timeit.timeit(lambda: list(with_list_comp(data)), number=10))
print(timeit.timeit(lambda: list(with_generator(data)), number=10))
print(timeit.timeit(lambda: list(with_map_filter(data)), number=10))

# Results: List comprehension usually fastest for complex logic
# Generator best when you don't need all results at once

Dictionary Lookups vs List Searches

# ❌ WRONG: Searching in list
def find_users_list(user_ids, all_users_list):
    results = []
    for user_id in user_ids:
        for user in all_users_list:  # O(n) per lookup
            if user['id'] == user_id:
                results.append(user)
                break
    return results
# O(n * m) time complexity

# ✅ CORRECT: Use dictionary
def find_users_dict(user_ids, all_users_dict):
    return [all_users_dict[uid] for uid in user_ids if uid in all_users_dict]
# O(n) time complexity

# Benchmark:
# 1000 lookups in 10000 items:
# List: 1.234s
# Dict: 0.001s - 1234x faster!

DataFrame Iteration Anti-Pattern

import pandas as pd
import numpy as np

# ❌ WRONG: Iterating over DataFrame rows
def process_rows_iterrows(df):
    results = []
    for idx, row in df.iterrows():  # VERY SLOW
        if row['value'] > 0:
            results.append(row['value'] * 2)
    return results

# ✅ CORRECT: Vectorized operations
def process_rows_vectorized(df):
    mask = df['value'] > 0
    return (df.loc[mask, 'value'] * 2).tolist()

# Benchmark with 100,000 rows:
# iterrows: 15.234s
# vectorized: 0.015s - 1000x faster!

Profiling Tools Comparison

When to Use Which Tool

Tool | Use Case | Output
cProfile | Function-level CPU profiling | Which functions take the most time
line_profiler | Line-level CPU profiling | Which lines within a function are slow
memory_profiler | Line-level memory profiling | Memory usage per line
tracemalloc | Memory allocation tracking | Where memory is allocated
yappi | Async/multithreaded profiling | Profile concurrent code
py-spy | Sampling profiler (no code changes) | Profile running processes
scalene | CPU+GPU+memory profiling | Comprehensive profiling
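
Most of these tools are demonstrated elsewhere in this skill; scalene is the exception, so here is a minimal command-line sketch, assuming it is installed from PyPI:

# Install: pip install scalene

# Profile CPU, memory, and (where supported) GPU usage in one pass
scalene script.py

# Or run it as a module
python -m scalene script.py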

py-spy for Production Profiling

# Install: pip install py-spy

# Profile running process (no code changes needed!)
py-spy record -o profile.svg --pid 12345

# Profile for 60 seconds
py-spy record -o profile.svg --duration 60 -- python script.py

# Top-like view of running process
py-spy top --pid 12345

# Why use py-spy:
# - No code changes needed
# - Minimal overhead
# - Can attach to running process
# - Great for production debugging

Anti-Patterns

Premature Optimization

# ❌ WRONG: Optimizing before measuring
def process_data(data):
    # "Let me make this fast with complex caching..."
    # Spend hours optimizing a function that takes 0.1% of runtime
    ...

# ✅ CORRECT: Profile first
cProfile.run('main()', 'profile.prof')
# Oh, process_data only takes 0.1% of time
# The real bottleneck is database queries (90% of time)
# Optimize database queries instead!

Micro-Optimizations

# ❌ WRONG: Micro-optimizing at expense of readability
def calculate(x, y):
    # "Using bit shift instead of multiply by 2 for speed!"
    return (x << 1) + (y << 1)
# Saved: ~0.0000001 seconds per call
# Cost: Unreadable code

# ✅ CORRECT: Clear code first
def calculate(x, y):
    return 2 * x + 2 * y
# The speed difference is negligible in CPython
# Only optimize if the profiler shows this is a bottleneck

Not Benchmarking Changes

# ❌ WRONG: Assuming optimization worked
def slow_function():
    # Original code
    pass

def optimized_function():
    # "Optimized" code
    pass

# Assume optimized_function is faster without measuring

# ✅ CORRECT: Benchmark before and after
import timeit

before = timeit.timeit(slow_function, number=1000)
after = timeit.timeit(optimized_function, number=1000)

print(f"Before: {before:.4f}s")
print(f"After: {after:.4f}s")
print(f"Speedup: {before/after:.2f}x")

# Verify correctness
assert slow_function() == optimized_function()

Decision Trees

What Tool to Use for Profiling?

What do I need to profile?
├─ CPU time
│   ├─ Function-level → cProfile
│   ├─ Line-level → line_profiler
│   └─ Async code → yappi
├─ Memory usage
│   ├─ Line-level → memory_profiler
│   ├─ Allocation tracking → tracemalloc
│   └─ Object types → objgraph
└─ Running process (no code changes) → py-spy

Optimization Strategy

Is code slow?
├─ Yes → Profile to find bottleneck
│   ├─ CPU bound → Profile with cProfile
│   │   └─ Optimize hot functions (vectorize, cache, algorithms)
│   └─ I/O bound → Profile with timing
│       └─ Use async/await, caching, batching
└─ No → Don't optimize (focus on features/correctness)

Memory Issue Diagnosis

Is memory usage high?
├─ Yes → Profile with memory_profiler
│   ├─ Growing over time → Memory leak
│   │   └─ Use tracemalloc to find leak
│   └─ High but stable → Large data structures
│       └─ Optimize data structures (generators, efficient types)
└─ No → Monitor but don't optimize yet
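
As a rough illustration of the "large data structures" branch: a generator avoids materializing the whole sequence. Sizes below are approximate and assume CPython:

import sys

eager = [i * 2 for i in range(1_000_000)]   # Materializes every element up front
lazy = (i * 2 for i in range(1_000_000))    # Produces elements on demand

print(sys.getsizeof(eager))  # ~8 MB for the list's pointer array (int objects are extra)
print(sys.getsizeof(lazy))   # ~200 bytes regardless of length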

Integration with Other Skills

After using this skill:

  • If I/O bound → See @async-patterns-and-concurrency for async optimization
  • If data processing slow → See @scientific-computing-foundations for vectorization
  • If need to track improvements → See @ml-engineering-workflows for metrics

Before using this skill:

  • If unsure code is slow → Use this skill to profile and confirm!
  • If setting up profiling → See @project-structure-and-tooling for dependencies

Quick Reference

Essential Profiling Commands

# CPU profiling
import cProfile
cProfile.run('main()', 'profile.prof')

# View results
import pstats
stats = pstats.Stats('profile.prof')
stats.sort_stats('cumulative')
stats.print_stats(20)

# Memory profiling
import tracemalloc
tracemalloc.start()
# ... code ...
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')
for stat in top_stats[:10]:
    print(stat)

Debugging Commands

# Set breakpoint
breakpoint()  # Python 3.7+
# or
import pdb; pdb.set_trace()

# pdb commands:
# n - next line
# s - step into
# c - continue
# p var - print variable
# l - list code
# w - where am I
# q - quit

Optimization Checklist

  • Profile before optimizing (use cProfile)
  • Identify bottleneck (top 20% of time)
  • Create benchmark for bottleneck
  • Optimize bottleneck
  • Benchmark again to verify improvement
  • Re-profile entire program
  • Verify correctness (tests still pass)

Common Optimizations

Problem | Solution | Speedup
String concatenation in loop | Use str.join() | 10-100x
List membership checks | Use set | 100-1000x
DataFrame iteration | Vectorize with NumPy/pandas | 100-1000x
Repeated expensive computation | Cache with @lru_cache | ∞ (depends on cache hits)
I/O bound | Use async/await | 10-100x
CPU bound with parallelizable work | Use multiprocessing | ~number of cores
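
The multiprocessing row has no example elsewhere in this skill; a minimal sketch, using a hypothetical CPU-bound function cpu_heavy:

from multiprocessing import Pool

def cpu_heavy(n):
    # Stand-in for real CPU-bound work
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # Pool() defaults to one worker process per CPU core
    with Pool() as pool:
        results = pool.map(cpu_heavy, [2_000_000] * 8)
    print(len(results))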

Red Flags

If you find yourself:

  • Optimizing before profiling → STOP, profile first
  • Spending hours on micro-optimizations → Check if it's bottleneck
  • Making code unreadable for speed → Benchmark the benefit
  • Assuming what's slow → Profile to verify

Always measure. Never assume.