kernel-triton-writing

SKILL.md

Triton Kernel Writing

Principles

Correctness First

  1. Never benchmark before verification passes.
  2. Always mask loads and stores when a dimension is not a multiple of the block size.
  3. Export kernel_fn, reference_fn, and get_inputs() so the companion scripts can use them.
  4. Always run scripts/verify_kernel.py to validate the kernel against the reference.

FP16/BF16 Precision Rules (LOW FREEDOM -- follow exactly)

Transcendental functions (tl.exp, tl.log, tl.math.erf, tl.math.tanh) require fp32 inputs.

# WRONG -- compilation error or wrong results with fp16/bf16:
result = tl.exp(x)
First Seen: May 8, 2026
kernel-triton-writing — nvidia/tensorrt-llm