CuTeDSL Kernel Integration
Use this skill to add or update a CuTeDSL frontend-only API in cuDNN Frontend. The goal is a complete integration: Python API, wrapper, exports, docs, and tests.
Before Editing
- Inspect the current repo state and avoid overwriting unrelated changes.
- Confirm every original source file needed for the integration is available. If a source file is missing, report that gap instead of inferring its contract from a related kernel.
- Record source provenance when it is available: upstream URL, local source path, commit, and which files map to public API modules versus private helpers.
- Classify the kernel before choosing a template:
  - Kernel family: dense GEMM, GEMM fusion, grouped GEMM, discrete grouped GEMM, MoE, attention, sparse attention, or another frontend-only API family.
  - Execution topology: single kernel, paired forward/backward APIs, multi-kernel orchestrator, helper-kernel setup, distributed/runtime-coordinated execution, or internal scheduler.
  - Public surface: class API, high-level wrapper, returned tensors, optional outputs, workspace ownership, and import/export namespace.
  - Internal support: source helper modules, schedulers, metadata utilities, and generated descriptors that must stay private to the package.
- Read `references/integration-pattern.md` for the detailed repo conventions before implementing.
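
The checklist above can be captured as a small record before any code is touched. The following is a minimal sketch, not part of cuDNN Frontend: every class, field, and function name here (`KernelClassification`, `SourceProvenance`, `missing_sources`) and the file names in the usage note are hypothetical and chosen only for illustration. It mirrors the four classification axes and the provenance fields, and reports missing source files instead of guessing at their contents.

```python
from dataclasses import dataclass, field
from enum import Enum
from pathlib import Path
from typing import List, Optional


class KernelFamily(Enum):
    # Values copied from the "Kernel family" bullet.
    DENSE_GEMM = "dense GEMM"
    GEMM_FUSION = "GEMM fusion"
    GROUPED_GEMM = "grouped GEMM"
    DISCRETE_GROUPED_GEMM = "discrete grouped GEMM"
    MOE = "MoE"
    ATTENTION = "attention"
    SPARSE_ATTENTION = "sparse attention"
    OTHER = "other frontend-only API family"


class ExecutionTopology(Enum):
    # Values copied from the "Execution topology" bullet.
    SINGLE_KERNEL = "single kernel"
    PAIRED_FWD_BWD = "paired forward/backward APIs"
    MULTI_KERNEL_ORCHESTRATOR = "multi-kernel orchestrator"
    HELPER_KERNEL_SETUP = "helper-kernel setup"
    DISTRIBUTED = "distributed/runtime-coordinated execution"
    INTERNAL_SCHEDULER = "internal scheduler"


@dataclass
class SourceProvenance:
    # Each field stays None or empty when the information is not available.
    upstream_url: Optional[str] = None
    local_path: Optional[str] = None
    commit: Optional[str] = None
    public_files: List[str] = field(default_factory=list)   # map to public API modules
    private_files: List[str] = field(default_factory=list)  # private helpers


@dataclass
class KernelClassification:
    family: KernelFamily
    topology: ExecutionTopology
    public_surface: List[str]    # e.g. "class API", "high-level wrapper"
    internal_support: List[str]  # e.g. "scheduler", "metadata utilities"
    provenance: SourceProvenance = field(default_factory=SourceProvenance)


def missing_sources(required: List[str], root: Path) -> List[str]:
    """Return the required source files that do not exist under root."""
    return [name for name in required if not (root / name).is_file()]
```

A possible use, with hypothetical file names: call `missing_sources(["kernel.py", "scheduler.py"], Path("path/to/source"))` first, and if the returned list is non-empty, report those files as a gap rather than filling in a `KernelClassification`.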