cute-dsl-ref

CuTe Python DSL Reference

Execution Model

CuTe Python DSL is the Python surface for NVIDIA's CuTe layout algebra. Unlike cuTile's block-level abstraction, CuTe DSL exposes explicit thread/warp/warpgroup control, TMA pipelines, barrier choreography, and shared memory management.

Two-Level Host/Device Pattern

Every CuTe DSL kernel has two functions:

  1. @cute.jit host function — runs on CPU, sets up TMA descriptors, computes grid, allocates shared memory, launches the kernel
  2. @cute.kernel device function — runs on GPU, contains the actual computation

import cutlass
import cutlass.cute as cute
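
The host-side bookkeeping in step 1 (computing the grid and sizing shared memory) can be illustrated in plain Python, independent of the DSL. This is a minimal sketch under stated assumptions, not CuTe DSL API: `ceil_div`, `plan_launch`, the tile sizes, the stage count, and the two-operand tiled-GEMM buffering scheme are all hypothetical and chosen only for illustration. In a real kernel, arithmetic of this kind would sit inside the `@cute.jit` host function before the launch.

```python
def ceil_div(a: int, b: int) -> int:
    """Number of size-b tiles needed to cover a elements."""
    return (a + b - 1) // b


def plan_launch(m, n, tile_m=128, tile_n=128, tile_k=64, stages=3, elem_bytes=2):
    """Hypothetical host-side planning for a tiled GEMM (illustrative only).

    Returns the launch grid and the shared-memory bytes one block would need.
    """
    # One thread block per (tile_m x tile_n) output tile; partial tiles round up.
    grid = (ceil_div(m, tile_m), ceil_div(n, tile_n), 1)
    # Assumption: each pipeline stage buffers one A tile (tile_m x tile_k)
    # and one B tile (tile_n x tile_k) of elem_bytes-wide elements.
    smem_bytes = stages * (tile_m + tile_n) * tile_k * elem_bytes
    return grid, smem_bytes


grid, smem_bytes = plan_launch(1000, 512)
print(grid, smem_bytes)  # (8, 4, 1) 98304
```

Rounding up with `ceil_div` means the last row of blocks covers a partial tile (1000 is not a multiple of 128), so the device function has to guard its out-of-bounds accesses.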
First Seen
Apr 21, 2026