cutedsl-kernel-integration

CuTeDSL Kernel Integration

Use this skill to add or update a CuTeDSL frontend-only API in cuDNN Frontend. The goal is a complete integration: Python API, wrapper, exports, docs, and tests.

Before Editing

  1. Inspect the current repo state (for example, with git status) and avoid overwriting unrelated changes.
  2. Confirm every original source file needed for the integration is available. If a source file is missing, report that gap instead of inferring its contract from a related kernel.
  3. Record source provenance when it is available: upstream URL, local source path, commit, and which files map to public API modules versus private helpers.
  4. Classify the kernel before choosing a template:
    • Kernel family: dense GEMM, GEMM fusion, grouped GEMM, discrete grouped GEMM, MoE, attention, sparse attention, or another frontend-only API family.
    • Execution topology: single kernel, paired forward/backward APIs, multi-kernel orchestrator, helper-kernel setup, distributed/runtime-coordinated execution, or internal scheduler.
    • Public surface: class API, high-level wrapper, returned tensors, optional outputs, workspace ownership, and import/export namespace.
    • Internal support: source helper modules, schedulers, metadata utilities, and generated descriptors that must stay private to the package.
  5. Read references/integration-pattern.md for the detailed repo conventions before implementing.
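The checks in steps 2 to 4 can be captured as a small intake record before any template is chosen. The following is a minimal sketch, not part of cuDNN Frontend: the names `Topology`, `KernelIntake`, `missing_sources`, and `intake_report` are hypothetical, as are the file names in the usage note. It only shows one way to hold the classification and provenance together and to report a missing source file as a gap instead of guessing its contract.

```python
from dataclasses import dataclass, field
from enum import Enum
from pathlib import Path
from typing import List, Optional


class Topology(Enum):
    """Execution topologies listed in step 4 (hypothetical enum)."""
    SINGLE_KERNEL = "single kernel"
    PAIRED_FWD_BWD = "paired forward/backward APIs"
    MULTI_KERNEL_ORCHESTRATOR = "multi-kernel orchestrator"
    HELPER_KERNEL_SETUP = "helper-kernel setup"
    DISTRIBUTED = "distributed/runtime-coordinated execution"
    INTERNAL_SCHEDULER = "internal scheduler"


@dataclass
class KernelIntake:
    """Classification plus provenance for one kernel (hypothetical record)."""
    family: str                      # e.g. "dense GEMM", "MoE", "attention"
    topology: Topology
    public_modules: List[str] = field(default_factory=list)   # map to public API
    private_helpers: List[str] = field(default_factory=list)  # stay package-private
    upstream_url: Optional[str] = None       # provenance, when available
    local_source_path: Optional[str] = None
    commit: Optional[str] = None


def missing_sources(root: Path, required: List[str]) -> List[str]:
    """Return the required relative paths that are not files under root."""
    return [rel for rel in required if not (root / rel).is_file()]


def intake_report(root: Path, intake: KernelIntake) -> str:
    """Report gaps first; only declare the kernel ready when none remain."""
    required = intake.public_modules + intake.private_helpers
    gaps = missing_sources(root, required)
    if gaps:
        return "Missing source files: " + ", ".join(gaps)
    return f"Ready: {intake.family} ({intake.topology.value})"
```

For example, an intake with `public_modules=["kernel.py"]` and `private_helpers=["scheduler.py"]` (both placeholder names) reports `scheduler.py` as missing until that file exists under the source root, which is the point at which to stop and report the gap.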

Integration Workflow

First Seen
Jun 12, 2026
cutedsl-kernel-integration — nvidia/cudnn-frontend