perf-torch-cuda-graphs
CUDA Graphs for PyTorch
CUDA Graphs capture a sequence of GPU operations once and replay it as a single launch, which removes most of the per-kernel CPU launch overhead. This skill covers applying CUDA Graphs to PyTorch training and inference workloads using native PyTorch APIs, Transformer Engine, and Megatron-LM.
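To make the capture-once, replay-many idea concrete, here is a minimal sketch using the native `torch.cuda.CUDAGraph` and `torch.cuda.graph` APIs. The workload `step` and the helper `run_batches` are hypothetical names chosen for illustration, and the CPU fallback exists only so the snippet still runs on a machine without a GPU. Treat it as a sketch under those assumptions, not a recommended production pattern.

```python
import torch


def step(x: torch.Tensor) -> torch.Tensor:
    # Hypothetical workload: a few small elementwise ops, the kind of
    # launch-bound work where replaying a graph tends to help.
    return torch.relu(x * 2.0 + 1.0)


def run_batches(batches):
    """Run `step` on each batch, replaying a CUDA graph when a GPU is present."""
    if not torch.cuda.is_available():
        # Assumption for illustration: fall back to eager execution on CPU.
        return [step(b) for b in batches]

    # Graphs replay against fixed memory addresses, so inputs live in a
    # static buffer that is overwritten in place before each replay.
    static_in = torch.zeros_like(batches[0], device="cuda")

    # Warm up on a side stream before capturing.
    side = torch.cuda.Stream()
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side):
        for _ in range(3):
            step(static_in)
    torch.cuda.current_stream().wait_stream(side)

    # Capture once: kernels are recorded into the graph, not executed.
    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph):
        static_out = step(static_in)

    results = []
    for batch in batches:
        static_in.copy_(batch)   # refresh the input in place
        graph.replay()           # relaunch the recorded kernels
        results.append(static_out.clone().cpu())  # copy out before the next replay
    return results


batches = [torch.full((4,), float(i)) for i in range(3)]
outputs = run_batches(batches)
```

The static input and output buffers are the key design constraint: a replay always reads and writes the same memory, so new data has to be copied in and results copied out around each `graph.replay()` call.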
When to Use
Reach for this skill when you encounter: