tilekernels-gpu-kernels
TileKernels GPU Kernel Library
Skill by ara.so — Daily 2026 Skills collection.
TileKernels is a high-performance GPU kernel library for LLM operations (MoE routing, FP8/FP4 quantization, transpose, engram gating, Manifold HyperConnection) written in TileLang — a Python DSL for expressing GPU kernels with automatic optimization. Kernels target NVIDIA SM90/SM100 (Hopper/Blackwell) architectures and approach hardware performance limits.
Requirements
- Python 3.10+
- PyTorch 2.10+
- TileLang 0.1.9+
- NVIDIA SM90 or SM100 GPU (H100/H200/B100/B200)
- CUDA Toolkit 13.1+
Installation
# Development install (recommended for extending/modifying kernels)
pip install -e ".[dev]"
More from aradotso/trending-skills
openclaw-control-center
Local-first, security-first control center for OpenClaw agents — visibility dashboard with readonly defaults, token attribution, collaboration tracing, and safe write operations.
3.9Kinkos-multi-agent-novel-writing
Multi-agent CLI system for autonomous novel writing, auditing, and revision with human review gates
1.8Keverything-claude-code-harness
Agent harness performance system for Claude Code and other AI coding agents — skills, instincts, memory, hooks, commands, and security scanning
1.6Kagency-agents-ai-specialists
A collection of specialized AI agent personalities for Claude Code, Cursor, Aider, Windsurf, and other AI coding tools — covering engineering, design, marketing, sales, and more.
1.6Kui-ux-pro-max-skill
AI design intelligence skill for building professional UI/UX across multiple platforms with 161 reasoning rules, 67 styles, and automated design system generation
1.5Kunderstand-anything-knowledge-graph
Turn any codebase into an interactive knowledge graph using Claude Code skills — explore, search, and ask questions about any project visually.
1.5K