Claude Code Plugins

Community-maintained marketplace

Feedback

python-performance-patterns

@smith6jt-cop/Skills_Registry
0
0

Low-risk Python performance optimization patterns with verified speedups

Install Skill

1Download skill
2Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify skill by going through its instructions before using it.

SKILL.md

name python-performance-patterns
description Low-risk Python performance optimization patterns with verified speedups
author KINTSUGI Team
date Fri Dec 12 2025 00:00:00 GMT+0000 (Coordinated Universal Time)

Python Performance Patterns

Experiment Overview

Item Details
Date 2025-12-12
Goal Implement safe, high-impact performance optimizations from audit
Environment Python 3.10+, NumPy, scikit-image
Status Success - 5 patterns verified

Context

After a performance audit identified 40+ issues, we needed to prioritize which fixes were safe to implement without extensive testing. These patterns are low-risk, high-reward optimizations.

Pattern 1: @lru_cache for File Metadata

Problem: Functions that check file types open files repeatedly.

Before (3-5 file opens per call):

def get_img_type(img_f):
    is_ome = check_is_ome(img_f)      # Opens file
    can_use_vips = check_to_use_vips(img_f)  # Opens file
    can_use_openslide = check_to_use_openslide(img_f)  # Opens file

After (cached after first call):

from functools import lru_cache

@lru_cache(maxsize=256)
def check_is_ome(src_f):
    # ... implementation

@lru_cache(maxsize=256)
def check_to_use_vips(src_f):
    # ... implementation

@lru_cache(maxsize=256)
def get_img_type(img_f):
    # ... implementation

Speedup: 3-5x for batch file operations

Safe because: File metadata doesn't change during a session


Pattern 2: Vectorized Overlap Matrix

Problem: O(n²) nested loop for pixel-by-pixel counting.

Before:

overlap = np.zeros((max_label1 + 1, max_label2 + 1), dtype=np.int32)
for i in range(labels1.shape[0]):
    for j in range(labels1.shape[1]):
        l1 = labels1[i, j]
        l2 = labels2[i, j]
        overlap[l1, l2] += 1

After:

overlap = np.zeros((max_label1 + 1, max_label2 + 1), dtype=np.int64)
np.add.at(overlap, (labels1.ravel(), labels2.ravel()), 1)

Speedup: 5-10x for large label arrays

Safe because: Mathematically equivalent, uses NumPy built-in


Pattern 3: argpartition for Top-K

Problem: Full sort when only need k smallest/largest values.

Before (O(n log n)):

neighbor_indices = np.argsort(distances[i])[1:n_neighbors+1]

After (O(n)):

neighbor_indices = np.argpartition(distances[i], n_neighbors+1)[1:n_neighbors+1]

Speedup: 3-5x for large arrays

Safe because: Returns same indices, just not sorted (usually fine for k-NN)

Caveat: If you need the k values in sorted order, add a second sort on just k elements.


Pattern 4: Parallel Image Loading

Problem: Sequential I/O when loading many images.

Before:

img_list = [io.imread(os.path.join(src_dir, f)) for f in img_f_list]

After:

from concurrent.futures import ThreadPoolExecutor

img_paths = [os.path.join(src_dir, f) for f in img_f_list]
with ThreadPoolExecutor(max_workers=min(8, len(img_paths))) as executor:
    img_list = list(executor.map(io.imread, img_paths))

Speedup: 10-20x for 50+ images

Safe because: Image reads are independent, thread-safe


Pattern 5: Single Directory Scan

Problem: Multiple glob calls scan directory repeatedly.

Before (5 filesystem scans):

patterns = ["*.tif", "*.tiff", "*.zarr", "*.ome.tif", "*.ome.tiff"]
for pattern in patterns:
    for f in cycle_dir.glob(pattern):
        process(f)

After (1 filesystem scan):

valid_extensions = {".tif", ".tiff", ".zarr"}
valid_suffixes = {".ome.tif", ".ome.tiff"}
for f in cycle_dir.iterdir():
    suffix_lower = f.suffix.lower()
    name_lower = f.name.lower()
    if suffix_lower in valid_extensions or any(name_lower.endswith(s) for s in valid_suffixes):
        process(f)

Speedup: 2-3x for directories with many files

Safe because: Same files matched, just more efficiently


Failed Attempts (Critical)

Attempt Why it Failed Lesson Learned
Caching image processing results Wrong results for different inputs Only cache metadata, not computed results
Caching BaSiC illumination profiles 15-20% intensity errors for sparse markers Each channel needs its own profile (see basic-caching-evaluation skill)
Parallelizing CPU-bound NumPy GIL contention, no speedup ThreadPoolExecutor only for I/O-bound tasks
Removing "unnecessary" array copies Broke downstream code expecting copies Test thoroughly before removing .copy()

Priority Matrix

Priority Pattern Risk Speedup
P0 Parallel image loading Low 10-20x
P1 @lru_cache metadata Low 3-5x
P1 Vectorized overlap Low 5-10x
P2 argpartition Low 3-5x
P2 Single directory scan Low 2-3x
P3 Remove dask copies Medium 2x memory

When NOT to Apply These Patterns

  1. @lru_cache: Don't cache if function has side effects or file might change
  2. argpartition: Don't use if you need results in sorted order
  3. ThreadPoolExecutor: Don't use for CPU-bound operations (use ProcessPoolExecutor)
  4. Removing copies: Don't remove if downstream code modifies the array

Key Insights

  • Start with I/O optimizations - biggest wins with lowest risk
  • Vectorization beats loops in NumPy, always
  • Profile before optimizing - intuition is often wrong
  • Test numerical accuracy after optimization, not just correctness

References

  • NumPy performance tips documentation
  • Python concurrent.futures documentation
  • KINTSUGI Performance Audit (2025-12-12)