rocm-kernels
ROCm Triton Kernels for Diffusers & Transformers
This skill provides patterns and guidance for developing optimized Triton kernels that target AMD GPUs (MI355X, R9700) on ROCm, for use with the Hugging Face diffusers (LTX-Video, SD3, FLUX) and transformers libraries.
Quick Start
Diffusers (LTX-Video)
Inject optimized kernels into the LTX-Video pipeline:
import os
# Opt in to Triton AMD (HIP) backend features: block ping-pong scheduling and async copy.
os.environ['TRITON_HIP_USE_BLOCK_PINGPONG'] = '1'
os.environ['TRITON_HIP_USE_ASYNC_COPY'] = '1'