flashkda-delta-attention

FlashKDA Delta Attention Skill

Skill by ara.so — Daily 2026 Skills collection.

FlashKDA provides high-performance, CUTLASS-based CUDA kernels for Kimi Delta Attention (KDA). It targets SM90+ GPUs (H100/H20 class) and integrates as a drop-in backend for flash-linear-attention's chunk_kda operation.
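
For illustration, here is a minimal sketch of what "drop-in" means at the call site. The import path fla.ops.kda, the argument names, and the return convention are assumptions modeled on flash-linear-attention's gated delta-rule ops, not confirmed from the FlashKDA repo; only the name chunk_kda comes from the description above. With FlashKDA installed, the same call dispatches to the CUDA kernels instead of the Triton implementation.

import torch
from fla.ops.kda import chunk_kda  # assumed import path

B, T, H, D = 1, 1024, 8, 128
q = torch.randn(B, T, H, D, device="cuda", dtype=torch.bfloat16)
k = torch.randn(B, T, H, D, device="cuda", dtype=torch.bfloat16)
v = torch.randn(B, T, H, D, device="cuda", dtype=torch.bfloat16)
g = torch.randn(B, T, H, D, device="cuda", dtype=torch.bfloat16)  # per-channel decay gate (assumed log-space)
beta = torch.rand(B, T, H, device="cuda", dtype=torch.bfloat16)   # delta-rule write strength

# The call site is unchanged whichever backend runs underneath; the
# (output, final state) return convention is an assumption here.
o, final_state = chunk_kda(q, k, v, g, beta, output_final_state=True)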

Requirements

  • GPU: SM90+ (H100, H20, or newer)
  • CUDA 12.9+
  • PyTorch 2.4+
  • Python 3.8+
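
These constraints can be sanity-checked before attempting a build. A small sketch using only standard PyTorch introspection (note it checks the runtime environment; the CUDA toolkit used to compile the kernels must also satisfy the 12.9+ requirement):

import sys
import torch

# FlashKDA targets SM90+ (H100/H20 class); check the current GPU.
major, minor = torch.cuda.get_device_capability()
assert (major, minor) >= (9, 0), f"SM90+ GPU required, found SM{major}{minor}"

# CUDA version PyTorch was built against, and the Python floor.
print(f"PyTorch {torch.__version__}, CUDA {torch.version.cuda}")
assert sys.version_info >= (3, 8), "Python 3.8+ required"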

Installation

git clone https://github.com/MoonshotAI/FlashKDA.git flash-kda
cd flash-kda
git submodule update --init --recursive
pip install -e .  # build step assumed (standard pip build); follow the repo's README if it differs
Installs: 196
GitHub Stars: 4
First Seen: Apr 23, 2026