Claude Code Plugins

Community-maintained marketplace


Optimize PyTorch with torch.compile (TorchDynamo/Inductor), focusing on compile overhead, graph breaks, and benchmark methodology. Use when speeding up PyTorch models or debugging compile behavior; triggers: torch.compile, torchdynamo, inductor, graph break, pytorch optimization.

Install Skill

1. Download skill
2. Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3. Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify the skill by reviewing its instructions before using it.

SKILL.md


name: torch-compile
description: Optimize PyTorch with torch.compile (TorchDynamo/Inductor), focusing on compile overhead, graph breaks, and benchmark methodology. Use when speeding up PyTorch models or debugging compile behavior; triggers: torch.compile, torchdynamo, inductor, graph break, pytorch optimization.

Torch Compile

Overview

Use torch.compile to JIT-compile PyTorch code into optimized kernels, then validate speedups with warmups and graph-break audits.
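
A minimal usage sketch (the model, sizes, and input below are placeholders, not part of this skill):

```python
import torch
import torch.nn as nn

# Placeholder model and input; swap in the real workload.
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1024))
x = torch.randn(32, 1024)

# torch.compile wraps the module in an optimized callable; the first calls
# trigger TorchDynamo tracing and Inductor codegen, later calls reuse the result.
compiled_model = torch.compile(model)
out = compiled_model(x)
```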

When to Use

Use this skill only when the frontmatter triggers apply; otherwise leave the model in eager mode.

Decision Tree

  1. Do you need to reduce Python overhead in hot paths?
    • Yes: compile and benchmark.
  2. Are the first compiled runs much slower than eager?
    • Yes: warm up and re-measure after caching.
  3. Are graph breaks frequent?
    • Yes: audit with torch._dynamo.explain or graph-break logging (see the sketch after this list) and reduce non-tensor logic in the compiled region.
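
One low-friction way to run the audit in step 3 is Dynamo's built-in logging. This sketch assumes PyTorch 2.1+, where torch._logging.set_logs accepts graph_breaks and recompiles flags (TORCH_LOGS="graph_breaks,recompiles" is the environment-variable equivalent); the branching function is only an illustration:

```python
import torch

# Emit a log line with the reason every time Dynamo breaks the graph or recompiles.
torch._logging.set_logs(graph_breaks=True, recompiles=True)

@torch.compile
def scale(x):
    if x.sum() > 0:          # data-dependent Python branch: a common graph-break source
        return x * 2
    return x - 1

scale(torch.randn(8))
```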

Workflows

1. Compile Benchmark With Warmup

  1. Run a short eager baseline.
  2. Compile the model and run warmup iterations.
  3. Measure steady-state latency after warmup.
  4. Compare the eager and compiled timings (see the sketch after this list).
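
A sketch of that loop, assuming an nn.Module workload; the model, shapes, and iteration counts are placeholders, and torch.cuda.synchronize matters only when a GPU is in use:

```python
import time
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).to(device)
x = torch.randn(64, 1024, device=device)

def bench(fn, x, warmup=10, iters=50):
    # Warmup absorbs Dynamo/Inductor compile time on the first calls.
    for _ in range(warmup):
        fn(x)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

eager_ms = bench(model, x) * 1e3
compiled_ms = bench(torch.compile(model), x) * 1e3
print(f"eager {eager_ms:.2f} ms  compiled {compiled_ms:.2f} ms  "
      f"speedup {eager_ms / compiled_ms:.2f}x")
```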

2. Graph Break Audit

  1. Run torch._dynamo.explain on the target function.
  2. Record graph break counts and reasons.
  3. Move non-tensor logic outside the compiled region.
  4. Re-run the explain pass to confirm fewer breaks (see the sketch after this list).
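
A sketch of the explain pass, assuming PyTorch 2.1+ where torch._dynamo.explain(fn)(inputs) returns an object exposing graph_count, graph_break_count, and break_reasons (older 2.0 releases used a different call signature); the traced function is a toy example:

```python
import torch

def f(x):
    y = x * 2
    print("logging from inside the hot path")  # untraceable Python call: typical break source
    return y.relu()

explanation = torch._dynamo.explain(f)(torch.randn(8))
print("graphs:", explanation.graph_count, "breaks:", explanation.graph_break_count)
for reason in explanation.break_reasons:
    print(reason)

# After moving the print() out of f, re-running the explain pass should show fewer breaks.
```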

3. Speedup Expectation Check

  1. Confirm the workload is Python-overhead bound (see the profiling sketch after this list).
  2. If the workload is GPU compute bound, expect lower gains.
  3. Adjust batch size or fuse operations to increase gains.
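
One way to check step 1 is torch.profiler: look at where time actually goes before deciding how much to expect from compilation. The model, shapes, and iteration count below are placeholders:

```python
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512)).to(device)
x = torch.randn(8, 512, device=device)

activities = [ProfilerActivity.CPU]
if device == "cuda":
    activities.append(ProfilerActivity.CUDA)

with profile(activities=activities) as prof:
    for _ in range(20):
        model(x)

# If self CPU time dwarfs device time, the run is overhead bound and torch.compile
# has more room to help; if device kernels dominate, expect smaller gains.
print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))
```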

Non-Obvious Insights

  • Compilation overhead shows up on the first few executions, so warmup is required before benchmarking.
  • Speedup depends on reducing Python overhead and GPU read/writes; architecture and batch size affect the outcome.
  • Graph breaks trade optimization opportunities for correctness rather than crashing (illustrated below).
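
The last point can be made concrete with torch.compile's fullgraph flag: by default Dynamo splits the graph at a break and keeps results correct, while fullgraph=True turns the same break into an error, which is useful during an audit. The branching function is only an illustration:

```python
import torch

def f(x):
    if x.sum() > 0:      # data-dependent branch: typically forces a graph break
        return x * 2
    return x - 1

# Default: Dynamo falls back around the break and the result stays correct.
print(torch.compile(f)(torch.ones(4)))

# fullgraph=True: the same break now raises instead of silently splitting the graph.
try:
    torch.compile(f, fullgraph=True)(torch.ones(4))
except Exception as err:
    print("fullgraph error:", type(err).__name__)
```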

Evidence

  • "torch.compile makes PyTorch code run faster by JIT-compiling PyTorch code into optimized kernels, while requiring minimal code changes." - PyTorch
  • "torch.compile takes extra time to compile the model on the first few executions." - PyTorch
  • "reducing Python overhead and GPU read/writes, and so the observed speedup may vary on factors such as model architecture and batch size." - PyTorch
  • "Graph breaks result in lost optimization opportunities, which may still be undesirable, but this is better than silent incorrectness or a hard crash." - PyTorch

Scripts

  • scripts/torch-compile_tool.py: CLI for probing torch.compile availability, benchmarking, and explain output.
  • scripts/torch-compile_tool.js: Node.js wrapper for the same CLI.

Dependencies

  • Python 3.11+ or Node 18+.
  • PyTorch 2.0+ for torch.compile.
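
A quick dependency probe, assuming only that torch imports; it checks the installed version, the presence of torch.compile, and whether a GPU is visible:

```python
import torch

print("torch version:", torch.__version__)
print("torch.compile available:", hasattr(torch, "compile"))
if torch.cuda.is_available():
    print("CUDA device:", torch.cuda.get_device_name(0))
else:
    print("No CUDA device; torch.compile still runs on CPU, with smaller typical gains.")
```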

References