perf-torch-sync-free
Writing Sync-Free PyTorch Code
Sync-free code is code in which the CPU keeps queuing work for the GPU without waiting for earlier GPU operations to complete. With host-device synchronizations eliminated, the GPU's work queue stays full, so it avoids the idle stalls that occur while a blocked CPU is not launching new work.
Every host-device synchronization ultimately calls one of three CUDA driver APIs that block the CPU thread:
cuEventSynchronize -- CPU waits until a specific GPU event completes
cuStreamSynchronize -- CPU waits until all work on a stream finishes
cuCtxSynchronize -- CPU waits until all work across all streams finishes
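As a small illustration of the definition above, the sketch below contrasts a loop that reads every scalar back to the host with one that keeps the running total on the device. The helper names `mean_loss_with_syncs` and `mean_loss_sync_free` are hypothetical, chosen for this example only. It assumes the usual PyTorch behavior that `.item()` on a CUDA tensor blocks the CPU until the value is available. Which of the three driver calls that maps to is not shown here.

```python
import torch


def mean_loss_with_syncs(losses):
    """Average scalar losses by reading each one back to the host."""
    total = 0.0
    for loss in losses:
        # .item() returns a Python float, so on a CUDA tensor the CPU must
        # wait for the GPU to produce the value: one sync per iteration.
        total += loss.item()
    return total / len(losses)


def mean_loss_sync_free(losses):
    """Average scalar losses on the device and return a 0-dim tensor."""
    total = torch.zeros_like(losses[0])
    for loss in losses:
        # Tensor-only arithmetic is queued on the stream; the CPU moves on.
        total = total + loss.detach()
    return total / len(losses)


# Falls back to the CPU so the sketch also runs on machines without a GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
losses = list(torch.arange(4.0, device=device).unbind())
running_mean = mean_loss_sync_free(losses)  # still a tensor on `device`
```

The sync-free variant defers the single unavoidable host read to whenever the caller finally needs a Python number (for example, once per logging interval), rather than paying for one on every iteration.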
When to Use
Reach for this skill when you encounter: