helion-jagged-and-autotuning
Helion: Jagged Tensors & Autotuning/Config Management
Overview
Two high-value Helion areas that are easy to get wrong or miss entirely:
- Ragged/jagged tensors: `hl.jagged_tile()` iterates variable-length inner dimensions with implicit masking, so you never hand-build masks.
- Autotuning & config management: autotuning is slow; Helion has a layered system (on-disk cache → saved configs → AOT heuristics) for tuning once and reusing results keyed by GPU architecture and input shape.
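To make "hand-built masks" concrete, here is a minimal plain-Python sketch of the bookkeeping that a jagged layout otherwise forces on you. It does not use Helion at all: `jagged_row_sums`, the `values`/`offsets` layout, and the `tile` parameter are hypothetical names chosen for illustration, and the loop only mimics how a tiled kernel would walk the inner dimension.

```python
# Plain-Python illustration of manual masking over a jagged tensor.
# Not Helion code; every name here is hypothetical.

def jagged_row_sums(values, offsets, tile=4):
    """Sum each variable-length row stored as flat values plus row offsets."""
    num_rows = len(offsets) - 1
    lengths = [offsets[r + 1] - offsets[r] for r in range(num_rows)]
    max_len = max(lengths) if lengths else 0
    sums = [0.0] * num_rows
    # Walk the inner dimension in fixed-size tiles, as a GPU kernel would.
    for start in range(0, max_len, tile):
        for r in range(num_rows):
            for k in range(start, start + tile):
                # Hand-built mask: ignore positions past the end of this row.
                if k < lengths[r]:
                    sums[r] += values[offsets[r] + k]
    return sums

values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
offsets = [0, 2, 2, 6]  # three rows with lengths 2, 0, and 4
row_sums = jagged_row_sums(values, offsets)
```

The `if k < lengths[r]` check is the mask; it is the piece that is easy to get wrong (off-by-one errors, empty rows, a final partial tile) and that implicit masking is meant to take off your hands.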
The single most-missed feature: AOT heuristics (`helion.experimental.aot_kernel` and `python -m helion.experimental.aot_runner`) give zero-cost per-shape config selection at runtime, with automatic compute-capability fallback. If a request mentions "many GPUs," "many shapes," or "don't want to re-tune at deploy," reach for AOT, not just the cache.
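As a rough mental model of per-shape selection with compute-capability fallback, the sketch below looks up a tuned config by (capability, shape) and falls back to the newest older capability that has an entry for that shape. This is an assumption about what such a fallback could look like, not Helion's implementation: the `TUNED` table, the `select_config` function, and the `tile` key are all made up for illustration.

```python
# Hypothetical sketch of per-shape config selection with capability fallback.
# Not Helion's implementation; the table contents and names are invented.

TUNED = {
    (8, 0): {1024: {"tile": 64}, 4096: {"tile": 128}},
    (9, 0): {1024: {"tile": 128}},
}

def select_config(capability, n):
    """Return (capability_used, config) for input size n, or None if untuned."""
    # Consider the device's own capability and older ones, newest first.
    candidates = sorted((c for c in TUNED if c <= capability), reverse=True)
    for cap in candidates:
        config = TUNED[cap].get(n)
        if config is not None:
            return cap, config
    return None

exact = select_config((9, 0), 1024)     # tuned for this capability and shape
fallback = select_config((9, 0), 4096)  # shape only tuned on an older capability
missing = select_config((7, 5), 1024)   # nothing tuned at or below this capability
```

The lookup is a couple of dictionary accesses, which is why selection of this kind costs essentially nothing at runtime compared with re-running the autotuner.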