tilekernels-gpu-kernels

TileKernels GPU Kernel Library

Skill by ara.so — Daily 2026 Skills collection.

TileKernels is a high-performance GPU kernel library for LLM operations (MoE routing, FP8/FP4 quantization, transpose, engram gating, Manifold HyperConnection). It is written in TileLang, a Python DSL for expressing GPU kernels with automatic optimization. The kernels target NVIDIA SM90 and SM100 (Hopper and Blackwell) architectures and approach hardware performance limits.

Requirements

  • Python 3.10+
  • PyTorch 2.10+
  • TileLang 0.1.9+
  • NVIDIA SM90 or SM100 GPU (H100/H200/B100/B200)
  • CUDA Toolkit 13.1+
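
The requirements above can be checked before installing. The following is a minimal sketch, not part of TileKernels: the helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical, and it assumes the PyPI distribution names are `torch` and `tilelang`. It maps SM90 and SM100 to CUDA compute capabilities 9.0 and 10.0.

```python
import sys
from importlib import metadata

# Minimums taken from the Requirements list above.
MIN_PYTHON = (3, 10)
MIN_PACKAGES = {"torch": (2, 10), "tilelang": (0, 1, 9)}  # assumed distribution names
SUPPORTED_SM = {(9, 0), (10, 0)}  # SM90 (Hopper), SM100 (Blackwell)


def parse_version(text):
    """Turn '2.10.0+cu130' into (2, 10, 0), ignoring local and pre-release tags."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare an installed version string against a minimum version tuple."""
    return parse_version(installed) >= tuple(minimum)


def check_environment():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python 3.10+ required, found %d.%d" % sys.version_info[:2])
    for name, minimum in MIN_PACKAGES.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if not meets_minimum(installed, minimum):
            wanted = ".".join(map(str, minimum))
            problems.append(f"{name} {wanted}+ required, found {installed}")
    try:
        import torch
    except ImportError:
        return problems  # GPU check needs PyTorch
    if not torch.cuda.is_available():
        problems.append("no CUDA device visible to PyTorch")
    else:
        capability = torch.cuda.get_device_capability(0)
        if capability not in SUPPORTED_SM:
            problems.append("GPU compute capability %d.%d is not SM90 or SM100" % capability)
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Environment looks OK" if not issues else "\n".join(issues))
```

The sketch does not verify the CUDA Toolkit version; `torch.version.cuda` only reports the CUDA version PyTorch was built against, which may differ from the installed toolkit.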

Installation

# Development install (recommended for extending/modifying kernels)
pip install -e ".[dev]"

Installs: 191
GitHub Stars: 4
First Seen: Apr 23, 2026