Tutorial: Adding a New Kernel to FlashInfer
This tutorial walks through adding a simple element-wise scale operation to FlashInfer. We'll implement `scale(x, factor) = x * factor` to demonstrate the complete workflow.
Goal
Add a new operation that scales each element of a tensor by a scalar factor:
- Input: tensor `x` and scalar `factor`
- Output: `x * factor` (element-wise)
- Support multiple dtypes (FP16, BF16, FP32)
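Before writing the kernel, it can help to pin the specification down as a host-side reference that the CUDA output can later be checked against. The following is a minimal C++ sketch under our own assumptions. `scale_reference` and `max_abs_diff` are hypothetical helpers, not FlashInfer APIs. Computing in FP32 and then casting back to the storage dtype is a common choice for FP16/BF16 kernels, not something this tutorial prescribes.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical reference (not a FlashInfer API): out[i] = x[i] * factor.
// Each element is converted to float, multiplied, then cast back to T.
// For FP16/BF16 storage types that conversion is an upcast to FP32.
template <typename T>
std::vector<T> scale_reference(const std::vector<T>& x, float factor) {
  std::vector<T> out(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    out[i] = static_cast<T>(static_cast<float>(x[i]) * factor);
  }
  return out;
}

// Hypothetical helper: largest absolute element-wise difference, in float.
// Returns +infinity when the sizes differ, so a length bug never passes.
template <typename T>
float max_abs_diff(const std::vector<T>& a, const std::vector<T>& b) {
  if (a.size() != b.size()) {
    return std::numeric_limits<float>::infinity();
  }
  float worst = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i) {
    float d = std::fabs(static_cast<float>(a[i]) - static_cast<float>(b[i]));
    if (d > worst) worst = d;
  }
  return worst;
}
```

The sketch is only exercised with `float` here. Whether it compiles unchanged with CUDA's `__half` or `__nv_bfloat16` depends on their host-side conversion operators, so treat that case as untested. When comparing against a half-precision kernel, a tolerance on `max_abs_diff` is more appropriate than exact equality.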
Step 1: Define the CUDA Kernel in `include/`
Create `include/flashinfer/scale.cuh`: