cpu-kernels


CPU C++ Kernels for x86 Processors

This skill provides patterns and guidance for developing optimized C++ kernels targeting x86 CPUs (Intel Xeon and compatible processors) with AVX2 and AVX512 intrinsics. Kernels are compiled via kernel-builder and distributed through the Hugging Face kernels ecosystem.

Who runs these commands? You, the agent, not a human. This is an autonomous loop: you write or edit the C++ kernel, build it, then run the scripts below as tools (via Bash) to check correctness, benchmark, and profile. After each run you read the result, record it with trial_manager.py, choose the next change from the Phase 2 decision tree, and repeat until a trial reaches early_stop_speedup or you have used all max_trials.
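The control flow of that loop can be sketched in a few lines of Python. This is only an illustration of the stopping rule, not the skill's actual driver: `optimization_loop` and `run_trial` are hypothetical names, `run_trial` stands in for the whole "edit, build, check, benchmark, record" step, and treating early_stop_speedup as a `>=` threshold that only counts for correct kernels is an assumption.

```python
from typing import Callable, List, Tuple

# One trial result: (trial number, passed correctness check, speedup vs. baseline).
TrialResult = Tuple[int, bool, float]


def optimization_loop(
    run_trial: Callable[[int], Tuple[bool, float]],  # hypothetical: edit + build + check + benchmark
    max_trials: int,
    early_stop_speedup: float,
) -> List[TrialResult]:
    """Run trials until one is correct and fast enough, or the budget is spent."""
    history: List[TrialResult] = []
    for trial in range(1, max_trials + 1):
        correct, speedup = run_trial(trial)
        history.append((trial, correct, speedup))
        # Assumption: an incorrect kernel never triggers the early stop,
        # no matter how fast it is.
        if correct and speedup >= early_stop_speedup:
            break
    return history


# Toy stand-in for real builds and benchmarks (made-up numbers).
fake_results = {1: (True, 1.1), 2: (False, 3.0), 3: (True, 2.2), 4: (True, 2.5)}
history = optimization_loop(lambda t: fake_results[t], max_trials=4, early_stop_speedup=2.0)
print(history)
```

With these made-up results the loop stops at trial 3: trial 2 is faster but fails the correctness check, so it does not count.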

Key Concepts (read before the Quick Start)

The commands use a few names that mean different things. They are not interchangeable:

| Name (example) | What it is | Used by |
| --- | --- | --- |
| baseline.py | The PyTorch reference implementation you optimize against. It is the ground truth for correctness and the speed reference for speedup. It must define get_inputs() and either get_reference_output() or a Model class (plus optional get_init_inputs()). You write this file (or it is given) before starting. | every script |
| my_rmsnorm | A trial-tree label: an arbitrary name you pick for this optimization task. trial_manager.py stores all attempts under trials/my_rmsnorm/. It is only a tracking ID. | trial_manager.py only |
| my_kernel | The installed Python package name: the build artifact produced by kernel-builder build + pip install. This is the importable module that contains your compiled kernel. | --kernel-package |
| my_kernel.rms_norm | A <package>.<function> path: the actual callable inside the installed package. Passed to --op to tell the benchmark/profiler which function to run. | --op |
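As a rough illustration of the baseline.py contract described above, here is a minimal sketch for an RMSNorm task using the Model-class variant. The source only names the required entry points, so several details are assumptions: that get_inputs() returns a list of positional arguments for forward(), that get_init_inputs() returns the constructor arguments, and the tensor shape, dtype, and epsilon, which are arbitrary illustrative values.

```python
# baseline.py (sketch): PyTorch reference for an RMSNorm kernel.
import torch

HIDDEN_SIZE = 1024  # illustrative value, not prescribed by the skill
EPS = 1e-6          # illustrative value


class Model(torch.nn.Module):
    """Reference RMSNorm: x / sqrt(mean(x^2) + eps) * weight."""

    def __init__(self, hidden_size: int, eps: float):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Mean of squares over the last (hidden) dimension.
        variance = x.pow(2).mean(dim=-1, keepdim=True)
        return x * torch.rsqrt(variance + self.eps) * self.weight


def get_init_inputs():
    # Assumption: these are passed to Model(...) as positional arguments.
    return [HIDDEN_SIZE, EPS]


def get_inputs():
    # Assumption: these are passed to Model.forward(...) as positional arguments.
    torch.manual_seed(0)  # fixed seed so every script sees the same data
    return [torch.randn(8, HIDDEN_SIZE, dtype=torch.float32)]
```

Seeding inside get_inputs() is a design choice that keeps the reference output and the kernel output comparable when different scripts call it independently.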

⚠️ --op means two different things depending on the script. In analyze_op.py, --op is a plain operation name (e.g. "rms_norm") used to look up compute/memory characteristics. In benchmark_cpu.py and cpu_profiler.py, --op is a package.function path (e.g. my_kernel.rms_norm) used to import and call your kernel. Same flag, different meaning — read each command below carefully.
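To make the package.function form concrete, the sketch below shows one way such a path could be split and imported. It is an assumption about the general technique, not the actual code of benchmark_cpu.py or cpu_profiler.py, and `resolve_op` is a hypothetical helper name. The standard-library path `math.sqrt` stands in for `my_kernel.rms_norm` so the example runs without a built kernel.

```python
import importlib
from typing import Callable


def resolve_op(op_path: str) -> Callable:
    """Turn a 'package.function' string into the callable it names."""
    # Split on the last dot: everything before it is the module path.
    package_name, sep, function_name = op_path.rpartition(".")
    if not sep or not package_name or not function_name:
        # A plain name such as "rms_norm" has no package part, so it cannot be imported.
        raise ValueError(f"expected '<package>.<function>', got {op_path!r}")
    module = importlib.import_module(package_name)
    return getattr(module, function_name)


# Stand-in for resolve_op("my_kernel.rms_norm") once my_kernel is installed.
sqrt = resolve_op("math.sqrt")
print(sqrt(9.0))
```

A plain operation name of the kind analyze_op.py expects is rejected by this helper, which mirrors why the two meanings of --op cannot be swapped.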

cpu-kernels — huggingface/kernels