kernel-triton-writing
Triton Kernel Writing
Principles
Correctness First
- Never benchmark before verification passes.
- Always mask loads and stores for non-divisible shapes.
- Include kernel_fn, reference_fn, and get_inputs() exports for the companion scripts.
- Always run scripts/verify_kernel.py to validate the kernel against the reference.
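The principles above can be illustrated with a CPU-only sketch that needs no GPU or Triton install. It simulates a blocked launch in plain Python: each "program" covers BLOCK_SIZE offsets, and a mask guards every load and store so the partial last block stays in bounds. The module exports kernel_fn, reference_fn, and get_inputs() as the principles require, but their argument conventions, BLOCK_SIZE, and the verify() helper are assumptions for illustration. The interface that scripts/verify_kernel.py actually expects is not shown here, and verify() is a hypothetical stand-in, not that script.

```python
import random

BLOCK_SIZE = 128  # assumed block size for the simulated launch


def reference_fn(x, y):
    """Plain-Python reference: element-wise addition."""
    return [a + b for a, b in zip(x, y)]


def kernel_fn(x, y):
    """CPU simulation of a blocked, masked kernel (stands in for a Triton launch)."""
    n = len(x)
    out = [0.0] * n
    num_programs = (n + BLOCK_SIZE - 1) // BLOCK_SIZE  # ceiling division, like triton.cdiv
    for pid in range(num_programs):
        offsets = [pid * BLOCK_SIZE + i for i in range(BLOCK_SIZE)]
        mask = [off < n for off in offsets]  # the last block is only partially in bounds
        for off, valid in zip(offsets, mask):
            if valid:  # masked load and store: out-of-bounds lanes are skipped
                out[off] = x[off] + y[off]
    return out


def get_inputs():
    """Inputs whose length is deliberately not a multiple of BLOCK_SIZE."""
    rng = random.Random(0)
    n = 1000  # 1000 = 7 * 128 + 104, so the eighth block is partial
    return [rng.random() for _ in range(n)], [rng.random() for _ in range(n)]


def verify(atol=1e-6):
    """Hypothetical correctness check: compare kernel_fn against reference_fn."""
    inputs = get_inputs()
    got, want = kernel_fn(*inputs), reference_fn(*inputs)
    return len(got) == len(want) and all(abs(g - w) <= atol for g, w in zip(got, want))


print(verify())  # True
```

Dropping the `if valid` guard makes the last block index past the end of the lists and raise IndexError. This is the CPU analogue of the out-of-bounds access that an unmasked tl.load or tl.store risks on the GPU. Following the first principle, any benchmarking would run only after verify() returns True.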
FP16/BF16 Precision Rules (LOW FREEDOM -- follow exactly)
Transcendental functions (tl.exp, tl.log, tl.math.erf, tl.math.tanh) require fp32 inputs.
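To see why low-precision intermediates give wrong results, the following CPU-only sketch emulates fp16 and fp32 rounding by round-tripping Python floats through the struct module's 'e' (half) and 'f' (single) formats. It does not call Triton. The choice of softplus, log(1 + exp(x)), is an illustrative assumption that is not part of the rules above. In a Triton kernel, the matching fix would be to upcast with .to(tl.float32) before the transcendental call and cast the result back once at the end.

```python
import math
import struct


def round_fp16(v: float) -> float:
    """Round a Python float to the nearest IEEE fp16 value (round-trip via struct 'e')."""
    return struct.unpack("<e", struct.pack("<e", v))[0]


def round_fp32(v: float) -> float:
    """Round a Python float to the nearest IEEE fp32 value (round-trip via struct 'f')."""
    return struct.unpack("<f", struct.pack("<f", v))[0]


def softplus(x: float, rnd) -> float:
    """log(1 + exp(x)) with every intermediate rounded by `rnd` to emulate one dtype."""
    e = rnd(math.exp(x))
    s = rnd(1.0 + e)
    return rnd(math.log(s))


def softplus_upcast(x_fp16: float) -> float:
    """Upcast an fp16 input, compute with fp32 intermediates, round once to fp16."""
    return round_fp16(softplus(round_fp32(x_fp16), round_fp32))


x = round_fp16(-8.0)
print(softplus(x, round_fp16))  # 0.0: in fp16, 1 + exp(-8) rounds to exactly 1.0
print(softplus_upcast(x))       # about 3.354e-4: fp32 keeps the small exp(-8) term
```

The fp16 path loses the whole answer because fp16 spacing near 1.0 is 2**-10, which is larger than twice exp(-8). The fp32 path rounds to fp16 only at the final store, so the result survives.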
# WRONG -- compilation error or wrong results with fp16/bf16:
result = tl.exp(x)