| name | when-profiling-performance-use-performance-profiler |
| version | 1.0.0 |
| description | Comprehensive performance profiling, bottleneck detection, and optimization system |
| author | Claude Code |
| category | performance |
| complexity | HIGH |
| tags | performance, profiling, optimization, benchmarking, mece |
| agents | performance-analyzer, performance-benchmarker, coder, optimizer |
| components | subagent, slash-command, mcp-tool |
| dependencies | claude-flow@alpha, perf (Linux), instruments (macOS), clinic.js (Node.js) |
Performance Profiler Skill
Overview
When profiling performance, use performance-profiler to measure, analyze, and optimize application performance across CPU, memory, I/O, and network dimensions.
MECE Breakdown
Mutually Exclusive Components:
- Baseline Phase: Establish current performance metrics
- Detection Phase: Identify bottlenecks and hot paths
- Analysis Phase: Root cause analysis and impact assessment
- Optimization Phase: Generate and prioritize recommendations
- Implementation Phase: Apply optimizations with agent assistance
- Validation Phase: Benchmark improvements and verify gains
Collectively Exhaustive Coverage:
- CPU Profiling: Function execution time, hot paths, call graphs
- Memory Profiling: Heap usage, allocations, leaks, garbage collection
- I/O Profiling: File system, database, network latency
- Network Profiling: Request timing, bandwidth, connection pooling
- Concurrency: Thread utilization, lock contention, async operations
- Algorithm Analysis: Time complexity, space complexity
- Cache Analysis: Hit rates, cache misses, invalidation patterns
- Database: Query performance, N+1 problems, index usage
Features
Core Capabilities:
- Multi-dimensional performance profiling (CPU, memory, I/O, network)
- Automated bottleneck detection with prioritization
- Real-time profiling and historical analysis
- Flame graph generation for visual analysis
- Memory leak detection and heap snapshots
- Database query optimization
- Algorithmic complexity analysis
- A/B comparison of before/after optimizations
- Production-safe profiling with minimal overhead
- Integration with APM tools (New Relic, DataDog, etc.)
Profiling Modes:
- Quick Scan: 30-second lightweight profiling
- Standard: 5-minute comprehensive analysis
- Deep: 30-minute detailed investigation
- Continuous: Long-running production monitoring
- Stress Test: Load-based profiling under high traffic
Usage
Slash Command:
/profile [path] [--mode quick|standard|deep] [--target cpu|memory|io|network|all]
Subagent Invocation:
Task("Performance Profiler", "Profile ./app with deep CPU and memory analysis", "performance-analyzer")
MCP Tool:
mcp__performance-profiler__analyze({
project_path: "./app",
profiling_mode: "standard",
targets: ["cpu", "memory", "io"],
generate_optimizations: true
})
Architecture
Phase 1: Baseline Measurement
- Establish current performance metrics
- Define performance budgets
- Set up monitoring infrastructure
- Capture baseline snapshots
Phase 2: Bottleneck Detection
- CPU profiling (sampling or instrumentation)
- Memory profiling (heap analysis)
- I/O profiling (syscall tracing)
- Network profiling (packet analysis)
- Database profiling (query logs)
Phase 3: Root Cause Analysis
- Correlate metrics across dimensions
- Identify causal relationships
- Calculate performance impact
- Prioritize issues by severity
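The prioritization step can be sketched as a simple impact score. This is an illustrative formula, not the profiler's actual algorithm: the severity weights and the scoring rule are assumptions, and the bottleneck records are shaped like the entries in the performance report format.

```javascript
// Illustrative impact-based prioritization. Severity weights and the
// scoring formula (time share × severity weight) are assumptions.
const SEVERITY_WEIGHT = { low: 1, medium: 2, high: 4, critical: 8 };

function prioritizeBottlenecks(bottlenecks) {
  return bottlenecks
    .map((b) => ({
      ...b,
      // Impact = share of total time, weighted by severity.
      impact: b.time_percent * (SEVERITY_WEIGHT[b.severity] ?? 1),
    }))
    .sort((a, b) => b.impact - a.impact);
}

const ranked = prioritizeBottlenecks([
  { type: "io", severity: "medium", function: "readConfig", time_percent: 12 },
  { type: "cpu", severity: "high", function: "processData", time_percent: 34.5 },
]);
// processData (34.5% × 4 = 138) outranks readConfig (12% × 2 = 24)
```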
Phase 4: Optimization Generation
- Algorithmic improvements
- Caching strategies
- Parallelization opportunities
- Database query optimization
- Memory optimization
- Network optimization
Phase 5: Implementation
- Generate optimized code with coder agent
- Apply database optimizations
- Configure caching layers
- Implement parallelization
Phase 6: Validation
- Run benchmark suite
- Compare before/after metrics
- Verify no regressions
- Generate performance report
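The before/after comparison in this phase can be sketched as follows, assuming metric objects with the same keys as the `baseline` block in the report format. The regression rule (p99 growing by more than 5%) is an illustrative threshold, not a fixed part of the skill.

```javascript
// Minimal sketch of the Phase 6 before/after comparison.
// The 5% regression tolerance is an assumed threshold.
function compareRuns(before, after) {
  const throughputIncrease = after.requests_per_second / before.requests_per_second;
  const latencyReduction =
    1 - after.p95_response_time_ms / before.p95_response_time_ms;
  const regression =
    after.p99_response_time_ms > before.p99_response_time_ms * 1.05;
  return {
    throughput_increase: `${throughputIncrease.toFixed(1)}x`,
    latency_reduction: `${Math.round(latencyReduction * 100)}%`,
    regression,
  };
}

const result = compareRuns(
  { requests_per_second: 1000, p95_response_time_ms: 400, p99_response_time_ms: 800 },
  { requests_per_second: 3200, p95_response_time_ms: 128, p99_response_time_ms: 300 }
);
// → { throughput_increase: "3.2x", latency_reduction: "68%", regression: false }
```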
Output Formats
Performance Report:
{
"project": "my-app",
"profiling_mode": "standard",
"duration_seconds": 300,
"baseline": {
"requests_per_second": 1247,
"avg_response_time_ms": 123,
"p95_response_time_ms": 456,
"p99_response_time_ms": 789,
"cpu_usage_percent": 67,
"memory_usage_mb": 512,
"error_rate_percent": 0.1
},
"bottlenecks": [
{
"type": "cpu",
"severity": "high",
"function": "processData",
"time_percent": 34.5,
"calls": 123456,
"avg_time_ms": 2.3,
"recommendation": "Optimize algorithm complexity from O(n²) to O(n log n)"
}
],
"optimizations": [...],
"estimated_improvement": {
"throughput_increase": "3.2x",
"latency_reduction": "68%",
"memory_reduction": "45%"
}
}
Flame Graph:
Interactive SVG flame graph showing call stacks, with box widths proportional to time spent in each frame
Heap Snapshot:
Memory allocation breakdown with retention paths
Optimization Report:
Prioritized list of actionable improvements with code examples
Examples
Example 1: Quick CPU Profiling
/profile ./my-app --mode quick --target cpu
Example 2: Deep Memory Analysis
/profile ./my-app --mode deep --target memory --detect-leaks
Example 3: Full Stack Optimization
/profile ./my-app --mode standard --target all --optimize --benchmark
Example 4: Database Query Optimization
/profile ./my-app --mode standard --target io --database --explain-queries
Integration with Claude-Flow
Coordination Pattern:
// Step 1: Initialize profiling swarm
mcp__claude-flow__swarm_init({ topology: "star", maxAgents: 5 })
// Step 2: Spawn specialized agents
[Parallel Execution]:
Task("CPU Profiler", "Profile CPU usage and identify hot paths in ./app", "performance-analyzer")
Task("Memory Profiler", "Analyze heap usage and detect memory leaks", "performance-analyzer")
Task("I/O Profiler", "Profile file system and database operations", "performance-analyzer")
Task("Network Profiler", "Analyze network requests and identify slow endpoints", "performance-analyzer")
Task("Optimizer", "Generate optimization recommendations based on profiling data", "optimizer")
// Step 3: Implementation agent applies optimizations
[Sequential Execution]:
Task("Coder", "Implement recommended optimizations from profiling analysis", "coder")
Task("Benchmarker", "Run benchmark suite and validate improvements", "performance-benchmarker")
Configuration
Default Settings:
{
"profiling": {
"sampling_rate_hz": 99,
"stack_depth": 128,
"include_native_code": false,
"track_allocations": true
},
"thresholds": {
"cpu_hot_path_percent": 10,
"memory_leak_growth_mb": 10,
"slow_query_ms": 100,
"slow_request_ms": 1000
},
"optimization": {
"auto_apply": false,
"require_approval": true,
"run_tests_before": true,
"run_benchmarks_after": true
},
"output": {
"flame_graph": true,
"heap_snapshot": true,
"call_tree": true,
"recommendations": true
}
}
Profiling Techniques
CPU Profiling:
- Sampling: Periodic stack sampling (low overhead)
- Instrumentation: Function entry/exit hooks (accurate but higher overhead)
- Tracing: Event-based recording at predefined probe points (detailed timelines, moderate overhead)
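The instrumentation technique can be shown in miniature: wrap a function so every call records its duration via entry/exit hooks. Real profilers hook at the VM or bytecode level with far lower overhead; this sketch only illustrates the idea.

```javascript
// Illustrative entry/exit instrumentation: wrap a function and
// accumulate call count and total wall-clock time per call site.
function instrument(fn, stats = { calls: 0, totalMs: 0 }) {
  const wrapped = (...args) => {
    const start = process.hrtime.bigint(); // entry hook
    try {
      return fn(...args);
    } finally {
      stats.calls += 1; // exit hook
      stats.totalMs += Number(process.hrtime.bigint() - start) / 1e6;
    }
  };
  wrapped.stats = stats;
  return wrapped;
}

const slowSum = instrument((n) => {
  let s = 0;
  for (let i = 0; i < n; i++) s += i;
  return s;
});
slowSum(1_000_000);
slowSum(1_000_000);
// slowSum.stats now holds { calls: 2, totalMs: <measured duration> }
```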
Memory Profiling:
- Heap Snapshots: Point-in-time memory state
- Allocation Tracking: Record all allocations
- Leak Detection: Compare snapshots over time
- GC Analysis: Garbage collection patterns
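The snapshot-comparison leak detection above can be sketched as a diff of two snapshots. Real heap snapshots (e.g. from V8) also carry retention paths; the flat constructor-name → instance-count shape used here is a simplifying assumption.

```javascript
// Sketch of leak detection by snapshot comparison: report object types
// whose live instance count grows past a threshold between snapshots.
// The snapshot shape (type → count) is an assumption for illustration.
function diffSnapshots(earlier, later, minGrowth = 100) {
  const suspects = [];
  for (const [type, count] of Object.entries(later)) {
    const growth = count - (earlier[type] ?? 0);
    if (growth >= minGrowth) suspects.push({ type, growth });
  }
  return suspects.sort((a, b) => b.growth - a.growth);
}

const leaks = diffSnapshots(
  { Buffer: 50, EventListener: 1200, User: 10 },
  { Buffer: 60, EventListener: 4800, User: 12 }
);
// Only EventListener exceeds the threshold (grew by 3600) —
// a classic forgotten-listener leak signature.
```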
I/O Profiling:
- Syscall Tracing: Track system calls (strace, dtrace)
- File System: Monitor read/write operations
- Database: Query logging and EXPLAIN ANALYZE
- Network: Packet capture and request timing
Concurrency Profiling:
- Thread Analysis: CPU utilization per thread
- Lock Contention: Identify blocking operations
- Async Operations: Promise/callback timing
Performance Optimization Strategies
Algorithmic:
- Reduce time complexity (O(n²) → O(n log n))
- Use appropriate data structures
- Eliminate unnecessary work
- Memoization and dynamic programming
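The memoization strategy above, in a minimal sketch: cache the results of a pure function keyed by its arguments, trading memory for repeated CPU work.

```javascript
// Memoization sketch: JSON-stringified arguments as the cache key
// (fine for small, serializable inputs; a simplifying assumption).
function memoize(fn) {
  const cache = new Map();
  return (...args) => {
    const key = JSON.stringify(args);
    if (!cache.has(key)) cache.set(key, fn(...args));
    return cache.get(key);
  };
}

// Naive recursive Fibonacci is exponential; memoized it is O(n).
const fib = memoize((n) => (n < 2 ? n : fib(n - 1) + fib(n - 2)));
// fib(40) completes instantly instead of making hundreds of millions of calls
```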
Caching:
- In-memory caching (Redis, Memcached)
- CDN for static assets
- HTTP caching headers
- Query result caching
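Query result caching can be sketched as a single-process TTL cache, a stand-in for the Redis/Memcached pattern above. The `get`/`set` API mirrors common cache clients but is illustrative, not a specific library's interface.

```javascript
// Minimal in-memory TTL cache for query results (illustrative API).
class TTLCache {
  constructor() {
    this.store = new Map();
  }
  set(key, value, ttlMs) {
    this.store.set(key, { value, expires: Date.now() + ttlMs });
  }
  get(key) {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expires) {
      this.store.delete(key); // lazy expiry on read
      return undefined;
    }
    return entry.value;
  }
}

const cache = new TTLCache();
cache.set("user:42", { name: "Ada" }, 60_000);
// Reads within the next 60s skip the database entirely.
```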
Parallelization:
- Multi-threading
- Worker pools
- Async I/O
- Batching operations
Database:
- Add missing indexes
- Optimize queries
- Reduce N+1 queries
- Connection pooling
- Read replicas
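Reducing N+1 queries, sketched: instead of issuing one author lookup per post, collect the ids and run a single batched query. `db.query` here is a hypothetical async driver with Postgres-style parameterized `ANY` support, assumed for illustration.

```javascript
// N+1 fix sketch: one batched lookup replaces posts.length queries.
// `db.query(sql, params)` is a hypothetical driver interface.
async function loadAuthorsBatched(db, posts) {
  const ids = [...new Set(posts.map((p) => p.author_id))]; // dedupe ids
  const rows = await db.query(
    "SELECT id, name FROM users WHERE id = ANY($1)",
    [ids]
  );
  const byId = new Map(rows.map((u) => [u.id, u]));
  // Attach each author without any further round trips.
  return posts.map((p) => ({ ...p, author: byId.get(p.author_id) }));
}
```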
Memory:
- Object pooling
- Reduce allocations
- Stream processing
- Compression
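Object pooling, sketched: reuse expensive-to-allocate buffers instead of creating garbage on every request, reducing GC pressure. The buffer sizes and acquire/release API are illustrative.

```javascript
// Object pool sketch: preallocate fixed-size buffers and recycle them.
class BufferPool {
  constructor(size, count) {
    this.size = size;
    this.free = Array.from({ length: count }, () => new Uint8Array(size));
  }
  acquire() {
    // Reuse a pooled buffer, or allocate fresh if the pool is exhausted.
    return this.free.pop() ?? new Uint8Array(this.size);
  }
  release(buf) {
    buf.fill(0); // scrub before reuse
    this.free.push(buf);
  }
}

const pool = new BufferPool(64 * 1024, 8);
const buf = pool.acquire();
// ...use buf to serve a request, then recycle it...
pool.release(buf);
```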
Network:
- Connection keep-alive
- HTTP/2 or HTTP/3
- Compression
- Request batching
- Rate limiting
Performance Budgets
Frontend:
- Time to First Byte (TTFB): < 200ms
- First Contentful Paint (FCP): < 1.8s
- Largest Contentful Paint (LCP): < 2.5s
- Time to Interactive (TTI): < 3.8s
- Total Blocking Time (TBT): < 200ms
- Cumulative Layout Shift (CLS): < 0.1
Backend:
- API Response Time (p50): < 100ms
- API Response Time (p95): < 500ms
- API Response Time (p99): < 1000ms
- Throughput: > 1000 req/s
- Error Rate: < 0.1%
- CPU Usage: < 70%
- Memory Usage: < 80%
Database:
- Query Time (p50): < 10ms
- Query Time (p95): < 50ms
- Query Time (p99): < 100ms
- Connection Pool Utilization: < 80%
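Checking measured latencies against budgets like these requires computing percentiles from raw samples. The sketch below uses the nearest-rank method (one of several percentile conventions); the budget-object shape is an assumption.

```javascript
// Nearest-rank percentile: the value at rank ceil(p/100 × n), 1-indexed.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Budget is an assumed map of percentile → max allowed value.
function checkBudget(latenciesMs, budget) {
  return Object.entries(budget).every(
    ([p, maxMs]) => percentile(latenciesMs, Number(p)) <= maxMs
  );
}

const latencies = [8, 9, 10, 11, 12, 40, 45, 55, 60, 120];
// p50 here is 12ms, so a "p50 < 10ms" budget fails even though
// the average of the fast half looks healthy.
```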
Best Practices
- Profile production workloads when possible
- Use production-like data volumes
- Profile under realistic load
- Measure multiple times for consistency
- Focus on p95/p99, not just averages
- Optimize bottlenecks in order of impact
- Always benchmark before and after
- Monitor for regressions in CI/CD
- Set up continuous profiling
- Track performance over time
Troubleshooting
Issue: High CPU usage but no obvious hot path
Solution: Check for excessive small function calls, increase sampling rate, or use instrumentation
Issue: Memory grows continuously
Solution: Run heap snapshot comparison to identify leak sources
Issue: Slow database queries
Solution: Use EXPLAIN ANALYZE, check for missing indexes, analyze query plans
Issue: High latency but low CPU
Solution: Profile I/O operations, check for blocking synchronous calls
See Also
- PROCESS.md - Detailed step-by-step profiling workflow
- README.md - Quick start guide
- subagent-performance-profiler.md - Agent implementation details
- slash-command-profile.sh - Command-line interface
- mcp-performance-profiler.json - MCP tool schema