CuTeDSL Kernel Integration

Use this skill to add or update a CuTeDSL frontend-only API in cuDNN Frontend. The goal is a complete integration: Python API, wrapper, exports, docs, and tests.

Before Editing

Inspect the current repo state and avoid overwriting unrelated changes.
Confirm every original source file needed for the integration is available. If a source file is missing, report that gap instead of inferring its contract from a related kernel.
Record source provenance when it is available: upstream URL, local source path, commit hash, and which files map to public API modules versus private helpers.
Classify the kernel before choosing a template:
- Kernel family: dense GEMM, GEMM fusion, grouped GEMM, discrete grouped GEMM, MoE, attention, sparse attention, or another frontend-only API family.
- Execution topology: single kernel, paired forward/backward APIs, multi-kernel orchestrator, helper-kernel setup, distributed/runtime-coordinated execution, or internal scheduler.
- Public surface: class API, high-level wrapper, returned tensors, optional outputs, workspace ownership, and import/export namespace.
- Internal support: source helper modules, schedulers, metadata utilities, and generated descriptors that must stay private to the package.
Read references/integration-pattern.md for the detailed repo conventions before implementing.
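The intake steps above (source check, provenance, classification) can be recorded in one place before any files are edited. The following is a minimal sketch that assumes Python 3.9+. `KernelIntake`, `Provenance`, `KernelFamily`, `Topology`, and their fields are hypothetical bookkeeping helpers, not part of cuDNN Frontend or CuTeDSL. The conventions in references/integration-pattern.md still take precedence.

```python
from dataclasses import dataclass, field
from enum import Enum
from pathlib import Path
from typing import List, Optional, Set


class KernelFamily(Enum):
    # Hypothetical labels mirroring the kernel families listed above.
    DENSE_GEMM = "dense GEMM"
    GEMM_FUSION = "GEMM fusion"
    GROUPED_GEMM = "grouped GEMM"
    DISCRETE_GROUPED_GEMM = "discrete grouped GEMM"
    MOE = "MoE"
    ATTENTION = "attention"
    SPARSE_ATTENTION = "sparse attention"
    OTHER = "other frontend-only API family"


class Topology(Enum):
    # Hypothetical labels mirroring the execution topologies listed above.
    SINGLE_KERNEL = "single kernel"
    FWD_BWD_PAIR = "paired forward/backward APIs"
    MULTI_KERNEL = "multi-kernel orchestrator"
    HELPER_SETUP = "helper-kernel setup"
    DISTRIBUTED = "distributed/runtime-coordinated execution"
    INTERNAL_SCHEDULER = "internal scheduler"


@dataclass
class Provenance:
    # Any field may stay None when the information is not available.
    upstream_url: Optional[str] = None
    local_path: Optional[str] = None
    commit: Optional[str] = None
    public_modules: List[str] = field(default_factory=list)   # become public API modules
    private_helpers: List[str] = field(default_factory=list)  # stay private to the package


@dataclass
class KernelIntake:
    family: KernelFamily
    topology: Topology
    provenance: Provenance
    required_sources: List[str]  # original source files, relative to the source root

    def missing_sources(self, root: Path) -> List[str]:
        """Required files absent under root; report these gaps instead of guessing."""
        return [p for p in self.required_sources if not (root / p).is_file()]

    def conflicting_visibility(self) -> Set[str]:
        """Files marked both public and private; a non-empty result needs a decision."""
        return set(self.provenance.public_modules) & set(self.provenance.private_helpers)
```

Keeping the record separate from the integration code makes the gaps visible early. A non-empty `missing_sources` result is reported as-is instead of being filled in from a related kernel, and a non-empty `conflicting_visibility` result flags a file whose public/private status is still undecided.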

Integration Workflow