---
name: torch-compile
description: Optimize PyTorch with torch.compile (TorchDynamo/Inductor), focusing on compile overhead, graph breaks, and benchmark methodology. Use when speeding up PyTorch models or debugging compile behavior; triggers: torch.compile, torchdynamo, inductor, graph break, pytorch optimization.
---
Torch Compile
Overview
Use torch.compile to JIT-compile PyTorch code into optimized kernels, then validate speedups with warmups and graph-break audits.
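As a minimal sketch (the layer sizes and input shapes here are arbitrary, chosen only for illustration), compiling is a one-line change:

```python
import torch

model = torch.nn.Linear(128, 128)
compiled_model = torch.compile(model)  # returns an optimized callable

x = torch.randn(32, 128)
out = compiled_model(x)  # first call is slow: it triggers compilation
```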
When to Use
Use this skill only when the frontmatter triggers apply; otherwise keep eager mode.
Decision Tree
- Do you need to reduce Python overhead in hot paths?
- Yes: compile and benchmark.
- Are first runs much slower than eager?
- Yes: warm up and re-measure after caching.
- Are graph breaks frequent?
- Yes: audit with torch._dynamo.explain or logging and reduce non-tensor logic.
Workflows
1. Compile Benchmark With Warmup
- Run a short eager baseline.
- Compile the model and run warmup iterations.
- Measure steady-state latency after warmup.
- Compare the eager and compiled timings (see the sketch below).
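A sketch of this workflow; the toy model, iteration counts, and the bench helper are illustrative, and the CUDA synchronization only matters on GPU:

```python
import time
import torch

def bench(fn, x, warmup=10, iters=50):
    # Warmup absorbs compilation and cache-population overhead,
    # so the timed loop measures steady-state latency.
    for _ in range(warmup):
        fn(x)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for queued GPU work before timing
    start = time.perf_counter()
    for _ in range(iters):
        fn(x)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

model = torch.nn.Sequential(torch.nn.Linear(256, 256), torch.nn.ReLU())
x = torch.randn(64, 256)
eager_s = bench(model, x)                     # eager baseline
compiled_s = bench(torch.compile(model), x)   # compiled, measured after warmup
print(f"eager {eager_s * 1e3:.3f} ms/iter, compiled {compiled_s * 1e3:.3f} ms/iter")
```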
2. Graph Break Audit
- Run torch._dynamo.explain on the target function.
- Record graph break counts and reasons.
- Move non-tensor logic outside the compiled region.
- Re-run the explain pass to confirm fewer breaks (sketch below).
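A sketch of the audit, assuming the PyTorch 2.1+ calling convention for torch._dynamo.explain; the toy function exists only to force a break with a print call:

```python
import torch
import torch._dynamo

def fn(x):
    y = x.sin()
    print("side effect")  # non-tensor logic: Dynamo breaks the graph here
    return y.cos()

# explain(fn)(*inputs) traces fn and reports graphs, breaks, and reasons.
explanation = torch._dynamo.explain(fn)(torch.randn(8))
print(explanation)  # includes graph count, graph break count, break reasons
```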
3. Speedup Expectation Check
- Confirm the workload is Python-overhead bound.
- If the workload is GPU compute bound, expect lower gains.
- Adjust batch size or fuse operations to increase gains (sketch below).
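A hypothetical batch-size sweep to sanity-check expectations; the model, sizes, and timing helper are illustrative. Overhead-bound small batches usually show larger ratios than compute-bound large ones, and each new input shape may trigger recompilation, which the warmup loop absorbs:

```python
import time
import torch

def avg_ms(fn, x, warmup=10, iters=50):
    for _ in range(warmup):  # also absorbs per-shape recompilation
        fn(x)
    start = time.perf_counter()
    for _ in range(iters):
        fn(x)
    return (time.perf_counter() - start) / iters * 1e3

model = torch.nn.Sequential(torch.nn.Linear(256, 256), torch.nn.ReLU())
compiled = torch.compile(model)
for batch in (1, 64, 4096):
    x = torch.randn(batch, 256)
    print(f"batch={batch}: {avg_ms(model, x) / avg_ms(compiled, x):.2f}x speedup")
```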
Non-Obvious Insights
- Compilation overhead shows up on the first few executions, so warmup is required before benchmarking.
- Speedups come from reducing Python overhead and GPU reads/writes, so model architecture and batch size affect the outcome.
- Graph breaks trade optimization opportunities for correctness rather than crashing.
Evidence
- "torch.compile makes PyTorch code run faster by JIT-compiling PyTorch code into optimized kernels, while requiring minimal code changes." - PyTorch
- "torch.compile takes extra time to compile the model on the first few executions." - PyTorch
- "reducing Python overhead and GPU read/writes, and so the observed speedup may vary on factors such as model architecture and batch size." - PyTorch
- "Graph breaks result in lost optimization opportunities, which may still be undesirable, but this is better than silent incorrectness or a hard crash." - PyTorch
Scripts
- scripts/torch-compile_tool.py: CLI for probing torch.compile availability, benchmarking, and explain output.
- scripts/torch-compile_tool.js: Node.js wrapper for the same CLI.
Dependencies
- Python 3.11+ or Node 18+.
- PyTorch 2.0+ for torch.compile (probe sketch below).
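A minimal availability probe, independent of the bundled scripts:

```python
import torch

# torch.compile ships with PyTorch 2.0+; hasattr is a cheap capability check.
print(f"torch {torch.__version__}, torch.compile available: {hasattr(torch, 'compile')}")
```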