Tutorial: Adding a New Kernel to FlashInfer

This tutorial walks through adding a simple element-wise scale operation to FlashInfer. We'll implement scale(x, factor) = x * factor to demonstrate the complete workflow.

Goal

Add a new operation that scales each element of a tensor by a scalar factor:

  • Input: tensor x and scalar factor
  • Output: x * factor (element-wise)
  • Dtypes: FP16, BF16, and FP32

Step 1: Define CUDA Kernel in include/

Create include/flashinfer/scale.cuh:
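
A minimal sketch of what this header might contain is given here. The names `ScaleKernel` and `ScaleLauncher`, the `float` type of `factor`, and the fixed block size of 256 are assumptions made for illustration, not details confirmed by the tutorial. The sketch multiplies in FP32 and casts back to the element type so that one template can cover FP16, BF16, and FP32:

```cuda
// include/flashinfer/scale.cuh (illustrative sketch, not the upstream file)
#pragma once

#include <cuda_bf16.h>
#include <cuda_fp16.h>
#include <cuda_runtime.h>

#include <cstddef>

namespace flashinfer {

// Hypothetical kernel: one thread per element, output[i] = input[i] * factor.
// The product is computed in FP32 and cast back to T, so the same template
// can be instantiated for float, __half, and __nv_bfloat16.
template <typename T>
__global__ void ScaleKernel(const T* input, T* output, float factor, size_t n) {
  const size_t idx = static_cast<size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
  if (idx < n) {  // guard the partially filled last block
    output[idx] = static_cast<T>(static_cast<float>(input[idx]) * factor);
  }
}

// Hypothetical host-side launcher: picks a 1-D grid that covers n elements
// and reports any launch error to the caller.
template <typename T>
cudaError_t ScaleLauncher(const T* input, T* output, float factor, size_t n,
                          cudaStream_t stream = nullptr) {
  if (n == 0) return cudaSuccess;  // a zero-block launch is an invalid configuration
  constexpr unsigned int kThreads = 256;  // assumed block size
  const unsigned int blocks = static_cast<unsigned int>((n + kThreads - 1) / kThreads);
  ScaleKernel<T><<<blocks, kThreads, 0, stream>>>(input, output, factor, n);
  return cudaGetLastError();
}

}  // namespace flashinfer
```

Because the pointers are not marked `__restrict__`, passing the same buffer as `input` and `output` scales the tensor in place. If the build defines `__CUDA_NO_HALF_CONVERSIONS__`, the `static_cast` conversions for `__half` would likely need to be replaced with explicit intrinsics such as `__half2float` and `__float2half`.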
