name	arcanea-performance-tuning
description	Master the art of making systems fast. Profiling, optimization, caching, and the wisdom to know when performance matters and when it doesn't. Measure twice, optimize once.
version	2.0.0
author	Arcanea
tags	performance, optimization, profiling, speed, tuning, development
triggers	performance, optimization, slow, speed up, profiling, bottleneck

The Performance Tuning Codex

"Premature optimization is the root of all evil. But mature optimization is the root of all delight."

The Performance Philosophy

The Golden Rules

RULE 1: MEASURE FIRST
Don't guess where the bottleneck is.
Profile. Measure. Prove.

RULE 2: OPTIMIZE THE RIGHT THING
Roughly 80% of run time is typically spent in about 20% of the code.
Find that 20%.

RULE 3: SET TARGETS
"Faster" is not a goal.
"Under 200ms" is a goal.

RULE 4: REGRESSION PREVENTION
Performance is easy to lose.
Benchmark continuously.

The Optimization Hierarchy

╔═══════════════════════════════════════════════════════════════════╗
║                    OPTIMIZATION HIERARCHY                          ║
║              (Optimize in this order)                              ║
╠═══════════════════════════════════════════════════════════════════╣
║                                                                    ║
║   1. ALGORITHM         │ O(n²) → O(n log n) = massive wins       ║
║   2. DATA STRUCTURE    │ Right structure for access pattern       ║
║   3. I/O               │ Network, disk, database calls            ║
║   4. MEMORY            │ Allocation, garbage collection           ║
║   5. CPU               │ Hot loops, cache efficiency              ║
║                                                                    ║
║   (Don't optimize #5 if #1-4 are the problem)                     ║
║                                                                    ║
╚═══════════════════════════════════════════════════════════════════╝
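
A minimal sketch of levels 1 and 2, assuming a hypothetical task of finding duplicate values in a list (both function names are illustrative). Replacing the nested loop with a set lookup turns O(n²) comparisons into a single O(n) pass:

```python
from typing import Hashable, Iterable


def find_duplicates_quadratic(items: list) -> set:
    """O(n^2): compares every pair of items."""
    duplicates = set()
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                duplicates.add(items[i])
    return duplicates


def find_duplicates_linear(items: Iterable[Hashable]) -> set:
    """O(n): one pass, using a set for constant-time membership checks."""
    seen = set()
    duplicates = set()
    for item in items:
        if item in seen:
            duplicates.add(item)
        else:
            seen.add(item)
    return duplicates
```

Both return the same result; only the growth rate differs, which is why this level comes before any hot-loop tuning.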

Profiling

Types of Profiling

CPU PROFILING:
• What functions take the most time?
• Where are the hot paths?
• What's the call graph?
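
One way to answer these questions in Python is the standard library's cProfile. This is a sketch only: `slow_sum` and `handle_request` are hypothetical stand-ins for real application code.

```python
import cProfile
import io
import pstats


def slow_sum(n: int) -> int:
    """Hypothetical hot function used only to give the profiler work."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def handle_request() -> int:
    """Hypothetical entry point that calls the hot function."""
    return slow_sum(200_000)


def profile_call(func) -> str:
    """Run func under cProfile and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(10)  # show at most 10 entries
    return buffer.getvalue()


report = profile_call(handle_request)
print(report)
```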

MEMORY PROFILING:
• Where is memory allocated?
• What's causing garbage collection?
• Are there memory leaks?
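
For the allocation question, a sketch using the standard library's tracemalloc. `build_table` is a hypothetical allocation-heavy function; comparing two snapshots shows which source lines grew the most.

```python
import tracemalloc


def build_table(rows: int) -> list:
    """Hypothetical allocation-heavy function."""
    return [str(i) * 10 for i in range(rows)]


tracemalloc.start()
before = tracemalloc.take_snapshot()
table = build_table(50_000)
after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Differences grouped by source line, largest change first.
top_stats = after.compare_to(before, "lineno")
for stat in top_stats[:3]:
    print(stat)
```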

I/O PROFILING:
• What queries are slow?
• What network calls are made?
• What files are accessed?

TRACE PROFILING:
• What's the full request lifecycle?
• Where do requests spend time?
• What's the concurrency pattern?

The Profiling Process

┌─────────────────────────────────────────────────────────────────┐
│                    THE PROFILING CYCLE                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│   1. ESTABLISH BASELINE                                          │
│      Measure current performance                                 │
│      Record metrics: latency, throughput, resource usage         │
│                                                                  │
│   2. SET TARGET                                                   │
│      Define acceptable performance                               │
│      "P95 latency < 200ms"                                       │
│                                                                  │
│   3. PROFILE                                                      │
│      Identify bottlenecks                                        │
│      Focus on top 3 issues                                       │
│                                                                  │
│   4. HYPOTHESIZE                                                  │
│      Why is this slow?                                           │
│      What would make it faster?                                  │
│                                                                  │
│   5. OPTIMIZE                                                     │
│      Make ONE change                                             │
│      Keep it isolated                                            │
│                                                                  │
│   6. MEASURE                                                      │
│      Did it help?                                                │
│      Did it hurt anything else?                                  │
│                                                                  │
│   7. REPEAT                                                       │
│      Until target reached                                        │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
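
Steps 5 and 6 can be sketched as a small before/after harness. Everything here is illustrative: `compare`, `concat_loop`, and `concat_join` are hypothetical names, and the string-building change stands in for whatever single change is under test.

```python
import timeit
from typing import Callable


def compare(baseline: Callable[[], object], candidate: Callable[[], object],
            number: int = 200, repeat: int = 5) -> dict:
    """Time two implementations and check they still agree on the result."""
    if baseline() != candidate():
        raise ValueError("candidate changes behavior; fix that before timing")
    # The minimum over several repeats is the least noisy estimate.
    base_s = min(timeit.repeat(baseline, number=number, repeat=repeat))
    cand_s = min(timeit.repeat(candidate, number=number, repeat=repeat))
    return {"baseline_s": base_s, "candidate_s": cand_s, "speedup": base_s / cand_s}


def concat_loop() -> str:
    """Baseline: build a string with repeated +=."""
    out = ""
    for i in range(500):
        out += str(i)
    return out


def concat_join() -> str:
    """Candidate: the ONE change being evaluated."""
    return "".join(str(i) for i in range(500))


result = compare(concat_loop, concat_join)
print(result)
```

Checking that both versions return the same value first covers "Did it hurt anything else?" at the most basic level; the speedup figure will vary by machine.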

Common Performance Patterns

The N+1 Query Problem

BAD: N+1 queries
┌──────────────────────────────────────────────────────────────┐
│ # Get all users (1 query)                                    │
│ users = db.query("SELECT * FROM users")                      │
│                                                              │
│ # For each user, get their orders (N queries)                │
│ for user in users:                                           │
│     orders = db.query(                                       │
│         "SELECT * FROM orders WHERE user_id=?", user.id)     │
└──────────────────────────────────────────────────────────────┘

GOOD: Eager loading
┌──────────────────────────────────────────────────────────────┐
│ -- Single query with JOIN                                    │
│ SELECT users.*, orders.*                                     │
│ FROM users                                                   │
│ LEFT JOIN orders ON orders.user_id = users.id                │
│                                                              │
│ -- Or batch loading                                          │
│ SELECT * FROM orders WHERE user_id IN (1, 2, 3, 4, 5)        │
└──────────────────────────────────────────────────────────────┘
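
A runnable sketch of the difference, assuming an in-memory SQLite database with a made-up `users`/`orders` schema. The function names are hypothetical; the point is the query count, which is returned alongside the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users VALUES (1, 'ada'), (2, 'bob'), (3, 'cy');
    INSERT INTO orders VALUES (1, 1, 9.5), (2, 1, 20.0), (3, 2, 5.0);
""")


def orders_n_plus_one(conn: sqlite3.Connection) -> tuple:
    """1 query for users, then 1 more query per user."""
    queries = 1
    result = {}
    for (user_id,) in conn.execute("SELECT id FROM users").fetchall():
        rows = conn.execute(
            "SELECT id FROM orders WHERE user_id = ? ORDER BY id", (user_id,)
        ).fetchall()
        queries += 1
        result[user_id] = [row[0] for row in rows]
    return result, queries


def orders_batched(conn: sqlite3.Connection) -> tuple:
    """2 queries in total, however many users there are."""
    user_ids = [row[0] for row in conn.execute("SELECT id FROM users")]
    placeholders = ", ".join("?" for _ in user_ids)
    rows = conn.execute(
        f"SELECT user_id, id FROM orders WHERE user_id IN ({placeholders}) ORDER BY id",
        user_ids,
    ).fetchall()
    result = {user_id: [] for user_id in user_ids}
    for user_id, order_id in rows:
        result[user_id].append(order_id)
    return result, 2
```

With three users the first version issues 4 queries and the second 2; the gap grows linearly with the number of users.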

Caching Strategies

╔═══════════════════════════════════════════════════════════════════╗
║                    CACHING STRATEGIES                              ║
╠═══════════════════════════════════════════════════════════════════╣
║                                                                    ║
║   CACHE-ASIDE (Lazy Loading)                                       ║
║   ┌─────────┐                                                      ║
║   │ Request │──┬──▶ Cache Hit ──▶ Return                          ║
║   └─────────┘  │                                                   ║
║                └──▶ Cache Miss ──▶ DB ──▶ Cache ──▶ Return        ║
║                                                                    ║
║   WRITE-THROUGH                                                    ║
║   ┌─────────┐                                                      ║
║   │  Write  │──▶ Cache ──▶ DB ──▶ Confirm                         ║
║   └─────────┘                                                      ║
║                                                                    ║
║   WRITE-BEHIND (Async)                                             ║
║   ┌─────────┐                                                      ║
║   │  Write  │──▶ Cache ──▶ Confirm                                ║
║   └─────────┘      │                                               ║
║                    └──▶ [Later] ──▶ DB                             ║
║                                                                    ║
╚═══════════════════════════════════════════════════════════════════╝

CACHE INVALIDATION:
• TTL (Time To Live) - Simple but may serve stale data
• Event-based - Invalidate on writes
• Tag-based - Group related items
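
A minimal in-process sketch of cache-aside with TTL expiry plus event-based invalidation. The `TTLCache` class is hypothetical, not a library API; the injectable `clock` exists only so expiry can be tested without sleeping.

```python
import time
from typing import Any, Callable, Hashable


class TTLCache:
    """Cache-aside with TTL expiry and explicit invalidation."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                       # cache hit
        value = loader()                          # cache miss: ask the source
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Event-based invalidation: call this after a write to `key`."""
        self._entries.pop(key, None)
```

Usage would look like `cache.get_or_load(user_id, lambda: load_user(user_id))`, where `load_user` is whatever slow lookup is being protected.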

Connection Pooling

WITHOUT POOLING:
┌──────────┐     ┌──────────┐
│ Request  │──▶──│ Connect  │──▶ 50-100ms overhead
└──────────┘     └──────────┘

WITH POOLING:
┌──────────┐     ┌──────────────┐     ┌──────────┐
│ Request  │──▶──│ Pool Manager │──▶──│ Reuse    │──▶ ~0ms
└──────────┘     └──────────────┘     └──────────┘

POOL CONFIGURATION:
• Min connections: Keep warm for base load
• Max connections: Limit to prevent exhaustion
• Idle timeout: Release unused connections
• Connection lifetime: Prevent stale connections
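
A sketch of the core idea using a fixed-size pool of SQLite connections. The `ConnectionPool` class is hypothetical and models only the max-connections limit; idle timeout and connection lifetime are left out, and production drivers usually ship their own pool.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: connections are opened once and then reused."""

    def __init__(self, database: str, size: int = 4):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False lets pooled connections move between threads.
            self._idle.put(sqlite3.connect(database, check_same_thread=False))

    @contextmanager
    def connection(self, timeout: float = 5.0):
        """Borrow a connection; wait up to `timeout` seconds if none is idle."""
        conn = self._idle.get(timeout=timeout)  # raises queue.Empty on timeout
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always hand the connection back

    def close(self) -> None:
        """Close every idle connection."""
        while not self._idle.empty():
            self._idle.get_nowait().close()
```

Callers borrow with `with pool.connection() as conn:`; the bounded queue is what keeps a traffic spike from opening unlimited connections.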

Lazy Loading

EAGER (Load everything):
┌────────────────────────────────────────────────────────┐
│ class User:                                            │
│     def __init__(self, id):                            │
│         self.profile = load_profile(id)   # Always     │
│         self.orders = load_orders(id)     # Always     │
│         self.preferences = load_prefs(id) # Always     │
└────────────────────────────────────────────────────────┘

LAZY (Load on demand):
┌────────────────────────────────────────────────────────┐
│ class User:                                            │
│     def __init__(self, id):                            │
│         self._id = id                                  │
│         self._orders = None                            │
│                                                        │
│     @property                                          │
│     def orders(self):                                  │
│         if self._orders is None:                       │
│             self._orders = load_orders(self._id)       │
│         return self._orders                            │
└────────────────────────────────────────────────────────┘
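
In Python 3.8+ the same pattern can be written with `functools.cached_property`. In this sketch `load_orders` is a hypothetical stand-in for the real database call, and `LOAD_CALLS` only records how often it runs.

```python
from functools import cached_property

LOAD_CALLS = []


def load_orders(user_id: int) -> list:
    """Hypothetical stand-in for a database call."""
    LOAD_CALLS.append(user_id)
    return [f"order-{user_id}-1", f"order-{user_id}-2"]


class User:
    def __init__(self, id: int):
        self._id = id

    @cached_property
    def orders(self) -> list:
        # Runs on first access only; the result is then stored on the instance.
        return load_orders(self._id)
```

Creating a `User` costs nothing; the load happens once, on the first read of `user.orders`.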

Database Optimization

Index Optimization

WHEN TO INDEX:
✓ Columns in WHERE clauses
✓ Columns in JOIN conditions
✓ Columns in ORDER BY
✓ Columns with high selectivity

WHEN NOT TO INDEX:
✗ Small tables (full scan is faster)
✗ Columns with low selectivity (gender, boolean)
✗ Tables with heavy writes (index maintenance cost)
✗ Columns rarely queried

COMPOSITE INDEX ORDER:
• Equality conditions first
• Range conditions last
• Among the equality columns, most selective first

INDEX (status, created_at)  -- status = 'active' AND created_at > ?
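
To check that a composite index is actually used, ask the database for its plan. This sketch uses SQLite's `EXPLAIN QUERY PLAN` so it runs without a server; the table and index names are made up, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        status TEXT NOT NULL,
        created_at TEXT NOT NULL
    );
    -- Equality column first, range column last.
    CREATE INDEX idx_orders_status_created ON orders (status, created_at);
""")


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's plan for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "\n".join(row[-1] for row in rows)  # last column is the description


plan = query_plan(
    conn,
    "SELECT id FROM orders WHERE status = ? AND created_at > ?",
    ("active", "2024-01-01"),
)
print(plan)
```

The plan should name `idx_orders_status_created`; a plan that only scans the table means the index is not helping this query.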

Query Optimization

EXPLAIN ANALYZE:
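Read the query plan before optimizing.
EXPLAIN ANALYZE (PostgreSQL syntax) actually executes the query to report real timings, so run data-modifying statements inside a transaction you roll back.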

┌────────────────────────────────────────────────────────────────┐
│ EXPLAIN ANALYZE                                                 │
│ SELECT * FROM orders                                            │
│ WHERE user_id = 123 AND status = 'pending'                      │
│ ORDER BY created_at DESC                                        │
│ LIMIT 10;                                                       │
│                                                                 │
│ Look for:                                                       │
│ • Seq Scan (bad on large tables)                               │
│ • Index Scan (good)                                            │
│ • Sort (expensive if not indexed)                              │
│ • Rows vs estimated rows (accuracy of stats)                   │
└────────────────────────────────────────────────────────────────┘

COMMON FIXES:
• Add missing indexes
• Rewrite subqueries as JOINs
• Use LIMIT for pagination
• Avoid SELECT * in production
• Partition large tables

Frontend Performance

Critical Rendering Path

┌─────────┐     ┌─────────┐     ┌─────────┐     ┌─────────┐
│  HTML   │──▶──│  CSS    │──▶──│   JS    │──▶──│ Render  │
│  Parse  │     │  Parse  │     │ Execute │     │  Paint  │
└─────────┘     └─────────┘     └─────────┘     └─────────┘
     │               │               │
     ▼               ▼               ▼
    DOM            CSSOM          Execute
   Build           Build         & Modify

OPTIMIZATION:
1. Minimize critical resources
2. Minimize critical bytes
3. Minimize critical path length

Core Web Vitals

LCP (Largest Contentful Paint):
Target: < 2.5s
• Optimize images
• Preload critical resources
• Use CDN

INP (Interaction to Next Paint):
Target: < 200ms
Replaced FID (First Input Delay, target < 100ms) as a Core Web Vital in March 2024.
• Break up long tasks
• Defer non-critical JS
• Use web workers

CLS (Cumulative Layout Shift):
Target: < 0.1
• Set image dimensions
• Reserve space for ads
• Avoid inserting content above fold

Bundle Optimization

CODE SPLITTING:
// Instead of one large bundle
import { everything } from 'huge-library';

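// Load on demand (React's lazy shown; in bundlers such as webpack or Vite, any dynamic import() creates a split point)
import { lazy } from 'react';
const HeavyComponent = lazy(() => import('./HeavyComponent'));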

TREE SHAKING:
// Bad: imports everything
import _ from 'lodash';

// Good: imports only what's used
import { debounce } from 'lodash-es';

COMPRESSION:
• Gzip: typically 70-90% reduction on text assets (HTML, CSS, JS)
• Brotli: typically 15-20% smaller than Gzip on the same assets
• Enable on server and CDN

Concurrency & Parallelism

Async Patterns

SEQUENTIAL (Slow):
┌────────────────────────────────────────────────────────────────┐
│ result1 = await fetchUser()      // 100ms                      │
│ result2 = await fetchOrders()    // 150ms                      │
│ result3 = await fetchProducts()  // 120ms                      │
│ // Total: 370ms                                                │
└────────────────────────────────────────────────────────────────┘

PARALLEL (Fast):
┌────────────────────────────────────────────────────────────────┐
│ [user, orders, products] = await Promise.all([                 │
│     fetchUser(),                                               │
│     fetchOrders(),                                             │
│     fetchProducts()                                            │
│ ])                                                             │
│ // Total: 150ms (slowest call)                                │
└────────────────────────────────────────────────────────────────┘
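
The Python counterpart of `Promise.all` is `asyncio.gather`. A sketch with the same three delays, where `fetch` is a hypothetical I/O call simulated by `asyncio.sleep`; this only applies when the calls do not depend on each other's results.

```python
import asyncio
import time


async def fetch(name: str, delay_s: float) -> str:
    """Hypothetical I/O call; the sleep stands in for network latency."""
    await asyncio.sleep(delay_s)
    return name


async def sequential() -> list:
    # Each await finishes before the next call starts: delays add up.
    return [
        await fetch("user", 0.10),
        await fetch("orders", 0.15),
        await fetch("products", 0.12),
    ]


async def parallel() -> list:
    # gather starts all three at once and returns results in argument order.
    return list(await asyncio.gather(
        fetch("user", 0.10),
        fetch("orders", 0.15),
        fetch("products", 0.12),
    ))


def timed(coro_fn) -> tuple:
    """Run a coroutine function and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn())
    return result, time.perf_counter() - start
```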

Rate Limiting & Backpressure

RATE LIMITING:
┌────────────────────────────────────────────────────────────────┐
│ Token Bucket Algorithm:                                         │
│                                                                 │
│ • Bucket has capacity (e.g., 100 tokens)                       │
│ • Tokens added at fixed rate (e.g., 10/second)                 │
│ • Each request consumes a token                                │
│ • No tokens = request rejected                                 │
└────────────────────────────────────────────────────────────────┘
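
The four rules above can be sketched as a small class. `TokenBucket` and its parameter names are illustrative, not a library API; the injectable `clock` is an assumption added so the refill logic can be tested deterministically.

```python
import time
from typing import Callable


class TokenBucket:
    def __init__(self, capacity: float, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = capacity            # start with a full bucket
        self._last_refill = clock()

    def _refill(self) -> None:
        """Add tokens for the time elapsed, never exceeding capacity."""
        now = self._clock()
        elapsed = now - self._last_refill
        self._tokens = min(self.capacity,
                           self._tokens + elapsed * self.refill_per_second)
        self._last_refill = now

    def allow(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens and return True, or return False to reject."""
        self._refill()
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

The capacity sets how large a burst is tolerated; the refill rate sets the sustained request rate.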

BACKPRESSURE:
┌────────────────────────────────────────────────────────────────┐
│ When producer is faster than consumer:                          │
│                                                                 │
│ Options:                                                        │
│ • Drop: Discard excess (lossy)                                 │
│ • Buffer: Queue until processed (memory risk)                  │
│ • Sample: Process every Nth item                               │
│ • Slow down: Signal producer to wait                           │
└────────────────────────────────────────────────────────────────┘
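
Two of these options sketched with a bounded `asyncio.Queue`: "slow down" (the producer suspends while the queue is full) and "drop" (`offer` refuses the item). All function names are hypothetical, and the sleep simulates a slow consumer.

```python
import asyncio


async def producer(q: asyncio.Queue, items: int) -> None:
    for i in range(items):
        await q.put(i)        # suspends while the queue is full ("slow down")
    await q.put(None)         # sentinel: no more items


async def consumer(q: asyncio.Queue, seen: list, depths: list) -> None:
    while True:
        depths.append(q.qsize())         # record how much is buffered
        item = await q.get()
        if item is None:
            return
        await asyncio.sleep(0.001)       # consumer is slower than producer
        seen.append(item)


async def run(items: int = 20, max_buffered: int = 4) -> tuple:
    """Return the consumed items and the deepest queue size observed."""
    q = asyncio.Queue(maxsize=max_buffered)
    seen, depths = [], []
    await asyncio.gather(producer(q, items), consumer(q, seen, depths))
    return seen, max(depths)


def offer(q: asyncio.Queue, item) -> bool:
    """Drop strategy: refuse the item instead of waiting when the queue is full."""
    try:
        q.put_nowait(item)
        return True
    except asyncio.QueueFull:
        return False
```

Because the queue is bounded, memory use stays capped no matter how fast the producer is.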

Monitoring & Metrics

Key Metrics

THE FOUR GOLDEN SIGNALS:
┌─────────────────────────────────────────────────────────────┐
│ 1. LATENCY    │ Time to serve a request                    │
│ 2. TRAFFIC    │ Requests per second                        │
│ 3. ERRORS     │ Rate of failed requests                    │
│ 4. SATURATION │ How "full" the service is                  │
└─────────────────────────────────────────────────────────────┘

PERCENTILES:
• P50 (median): Typical experience
• P95: 95% of requests are at least this fast
• P99: Tail latency, the slowest 1% of requests (important!)
• Max: Absolute worst case

Note: Average is misleading.
      It blends fast and slow requests into one number, so a slow tail stays hidden.
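
A small worked example with made-up latencies: 97 requests at 20 ms and 3 at 2000 ms. The `percentile` helper is a hypothetical nearest-rank implementation; monitoring systems may interpolate differently.

```python
import statistics


def percentile(samples: list, pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # ceil(pct / 100 * n)
    return ordered[rank - 1]


# Hypothetical latencies in ms: 97 fast requests and 3 very slow ones.
latencies_ms = [20.0] * 97 + [2000.0] * 3

summary = {
    "mean": statistics.mean(latencies_ms),
    "p50": percentile(latencies_ms, 50),
    "p95": percentile(latencies_ms, 95),
    "p99": percentile(latencies_ms, 99),
    "max": max(latencies_ms),
}
print(summary)
```

The mean comes out at 79.4 ms, a latency no request actually had: P50 and P95 are 20 ms while P99 is 2000 ms.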

Benchmarking

MICRO-BENCHMARKS:
• Test specific functions
• Isolate from I/O
• Run many iterations
• Beware of JIT warmup

LOAD TESTING:
• Simulate realistic traffic
• Measure at various loads
• Find the breaking point
• Test failure scenarios

TOOLS:
• k6, Artillery, Locust (load testing)
• wrk, hey (HTTP benchmarking)
• hyperfine (CLI benchmarking)

Quick Reference

Performance Checklist

□ Profiled to find actual bottlenecks
□ Set measurable performance targets
□ Optimized hot paths first
□ Added appropriate caching
□ Minimized I/O operations
□ Used connection pooling
□ Indexed frequently queried columns
□ Implemented lazy loading where appropriate
□ Set up performance monitoring
□ Established performance regression tests

Common Performance Wins

| Problem              | Solution                    |
|----------------------|-----------------------------|
| N+1 queries          | Eager loading, batch        |
| Slow queries         | Add indexes, optimize SQL   |
| Large payloads       | Pagination, compression     |
| Repeated computation | Caching, memoization        |
| Synchronous waits    | Async, parallel execution   |
| Cold starts          | Warmup, connection pools    |
| Large bundles        | Code splitting, tree shake  |
| Slow images          | Lazy load, WebP, CDN        |

The Performance Mantras

"Measure first, optimize second"
"The fastest code is code that doesn't run"
"Cache invalidation is hard; TTL is your friend"
"Profile in production, not just development"
"Optimize for the common case"

"Performance is not about making things fast. It's about removing what makes things slow."
