cuda-kernels

SKILL.md

CUDA Kernel Development & Optimization

Skill for developing and optimizing custom CUDA kernels in the candle framework for qwen3-tts-rs.

Trigger Words

cuda kernel, custom kernel, fused op, write kernel, ptx, kernel launch, CustomOp, nsys, ncu, profiling, roofline, occupancy, register pressure

Candle Custom Op Patterns

CustomOp1 (single input tensor)

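use candle_core::{CustomOp1, Layout, Shape, DType, backend::BackendStorage, CpuStorage, CudaStorage, Result};

struct MyFusedOp { /* params */ }

impl CustomOp1 for MyFusedOp {
    fn name(&self) -> &'static str { "my_fused_op" }
    // Required: CPU reference path; must handle (or reject) non-contiguous input layouts.
    fn cpu_fwd(&self, _storage: &CpuStorage, _layout: &Layout) -> Result<(CpuStorage, Shape)> { todo!() }
    // Optional (the default returns an error): allocate the output and launch the CUDA kernel here.
    fn cuda_fwd(&self, _storage: &CudaStorage, _layout: &Layout) -> Result<(CudaStorage, Shape)> { todo!() }
}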
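A `cpu_fwd` body receives the input together with its `Layout`, which records the logical shape, per-dimension strides and a start offset, and the input may be non-contiguous (for example after a transpose). The sketch below is a dependency-free illustration of that gather step so it can be checked without candle: `StridedLayout` and `fused_scale_add_cpu` are hypothetical stand-ins, not candle APIs, and the fused op `y = x * scale + bias` is an assumed example rather than the op this skill targets.

```rust
/// Hypothetical stand-in for what a candle `Layout` describes:
/// logical dims, element strides per dim, and a starting element offset.
pub struct StridedLayout {
    pub dims: Vec<usize>,
    pub stride: Vec<usize>,
    pub start_offset: usize,
}

impl StridedLayout {
    /// Row-major contiguous layout for `dims` (last dimension has stride 1).
    pub fn contiguous(dims: &[usize]) -> Self {
        let mut stride = vec![1; dims.len()];
        for i in (0..dims.len().saturating_sub(1)).rev() {
            stride[i] = stride[i + 1] * dims[i + 1];
        }
        StridedLayout { dims: dims.to_vec(), stride, start_offset: 0 }
    }

    /// Storage offsets visited in logical row-major order.
    pub fn offsets(&self) -> Vec<usize> {
        let n: usize = self.dims.iter().product();
        let mut out = Vec::with_capacity(n);
        let mut idx = vec![0usize; self.dims.len()];
        for _ in 0..n {
            let off: usize = idx.iter().zip(&self.stride).map(|(i, s)| i * s).sum();
            out.push(self.start_offset + off);
            // Advance the multi-index like an odometer (last dim fastest).
            for d in (0..self.dims.len()).rev() {
                idx[d] += 1;
                if idx[d] < self.dims[d] {
                    break;
                }
                idx[d] = 0;
            }
        }
        out
    }
}

/// Hypothetical fused op `y = x * scale + bias`: reads through the strided
/// layout and writes a contiguous output, as a CPU reference path would.
pub fn fused_scale_add_cpu(data: &[f32], layout: &StridedLayout, scale: f32, bias: f32) -> Vec<f32> {
    layout
        .offsets()
        .into_iter()
        .map(|o| data[o] * scale + bias)
        .collect()
}
```

In an actual `cpu_fwd`, the offsets would come from the `Layout` argument and the contiguous result would be returned as, e.g., `CpuStorage::F32` together with the output `Shape`; the same output is a convenient reference for validating the CUDA kernel.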
cuda-kernels — trevors/dot-claude