cuda-kernels

CUDA Kernels for Diffusers & Transformers

This skill provides patterns and guidance for developing optimized CUDA kernels that target NVIDIA GPUs (H100, A100, T4) for use with the Hugging Face diffusers and transformers libraries.

Hard Constraints — Read Before Writing Any Code

Kernels MUST build with kernel-builder and meet the Kernel Hub requirements. kernel-builder compiles against the Python limited API (abi3), so a single binary works across Python versions from 3.9 onward. Several patterns that are standard in generic PyTorch-extension tutorials are therefore hard build failures here. Do not use them, even if the PyTorch documentation or your training data suggests them.
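As a rough illustration of what "a single binary across Python versions" looks like on disk, the sketch below checks whether an extension file name carries the `abi3` tag that CPython uses for limited-API builds on Linux and macOS. The helper name `is_abi3_extension` and the example file names are hypothetical; this is not part of kernel-builder, and it does not cover Windows, where limited-API modules use a plain `.pyd` suffix.

```python
from pathlib import PurePath


def is_abi3_extension(filename: str) -> bool:
    """Return True if the name looks like a limited-API build, e.g. `_ops.abi3.so`.

    Hypothetical helper for illustration only; it inspects the file name and
    does not verify the symbols the binary actually links against.
    """
    parts = PurePath(filename).name.split(".")
    # Expected shape: <module>.<tag>.so, where the tag is "abi3" for
    # limited-API builds and something like "cpython-311-x86_64-linux-gnu"
    # for version-specific builds.
    return len(parts) >= 3 and parts[-2] == "abi3" and parts[-1] == "so"


if __name__ == "__main__":
    for name in ("_ops.abi3.so", "_ops.cpython-311-x86_64-linux-gnu.so"):
        print(name, is_abi3_extension(name))
```

A version-specific name such as `_ops.cpython-311-x86_64-linux-gnu.so` is tied to one interpreter version, which is the situation the limited-API requirement is meant to avoid.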

Disallowed patterns — never generate these

First Seen
Feb 14, 2026
cuda-kernels — huggingface/kernels