npu-torchair-infer
NPU TorchAir Inference & Migration (Main Workflow)
Overview
Bring an arbitrary HuggingFace model up on Ascend NPU torchair graph mode,
verify it numerically against CPU/NPU-eager, and measure its speedup. The bundled
scripts are model-agnostic (config-driven inputs, recursive output comparison),
so the same flow applies to the next model by changing only
--model_name_or_path.
Bundled assets:
- scripts/torch_air_infer.py: minimal single-backend inference + latency stats.
- scripts/benchmark.py: 3-backend (torchair / npu_eager / cpu) accuracy + perf → JSON + Markdown.
- scripts/run_benchmark.sh: generic driver: prepare model → sanity infer → benchmark.
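To make "recursive output comparison" concrete, here is a minimal sketch of the idea, not the code in scripts/benchmark.py. The helper name max_abs_diff is hypothetical, and leaves are assumed to be plain Python numbers so the example runs without torch; a real comparison would presumably diff tensors at the leaves instead.

```python
from collections.abc import Mapping, Sequence
from numbers import Number


def max_abs_diff(ref, out, path="output"):
    """Walk two nested outputs in parallel; return (largest diff, where it occurred).

    Hypothetical helper for illustration. Leaves are plain numbers or None;
    a tensor-aware version would replace the Number branch.
    """
    if isinstance(ref, Mapping):
        if not isinstance(out, Mapping) or ref.keys() != out.keys():
            raise ValueError(f"structure mismatch at {path}")
        pairs = [(ref[k], out[k], f"{path}.{k}") for k in ref]
    elif isinstance(ref, Sequence) and not isinstance(ref, (str, bytes)):
        if (not isinstance(out, Sequence) or isinstance(out, (str, bytes))
                or len(ref) != len(out)):
            raise ValueError(f"structure mismatch at {path}")
        pairs = [(r, o, f"{path}[{i}]") for i, (r, o) in enumerate(zip(ref, out))]
    elif isinstance(ref, Number) and isinstance(out, Number):
        return abs(ref - out), path
    elif ref is None and out is None:
        return 0.0, path  # optional fields that both backends left empty
    else:
        raise TypeError(f"unsupported or mismatched leaf at {path}")

    # Recurse into children and keep the worst offender.
    worst = (0.0, path)
    for r, o, p in pairs:
        candidate = max_abs_diff(r, o, p)
        if candidate[0] > worst[0]:
            worst = candidate
    return worst


if __name__ == "__main__":
    cpu = {"logits": [1.0, 2.0, 3.0], "hidden": ([0.5, 0.25], None)}
    npu = {"logits": [1.0, 2.5, 3.0], "hidden": ([0.5, 0.25], None)}
    print(max_abs_diff(cpu, npu))
```

Returning the path alongside the difference makes it easier to see which output field diverges between backends, which matters when a model returns a nested structure rather than a single tensor.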
The one rule you cannot break
torch_npu MUST be imported before torchair, or graph mode does not engage.
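Because an import-order mistake is easy to reintroduce, one option is a small static check over a script's source. The sketch below is an illustration under assumptions: the helper names are hypothetical, it inspects only top-level imports, and it assumes torchair is imported under the top-level module name torchair. The GOOD string shows the order the rule requires.

```python
import ast

# Import order required by the rule above (kept as text so this runs anywhere).
GOOD = "import torch\nimport torch_npu\nimport torchair\n"
BAD = "import torch\nimport torchair\nimport torch_npu\n"


def first_import_line(tree, module):
    """Return the line number of the first top-level import of `module`, or None."""
    for node in tree.body:
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        if any(n == module or n.startswith(module + ".") for n in names):
            return node.lineno
    return None


def torch_npu_before_torchair(source):
    """True if `source` imports torch_npu on an earlier line than torchair.

    Hypothetical lint helper. Both imports on the same line are rejected
    conservatively, since statement-level line numbers cannot order them.
    """
    tree = ast.parse(source)
    npu_line = first_import_line(tree, "torch_npu")
    air_line = first_import_line(tree, "torchair")
    if air_line is None:
        return True  # torchair is not imported, so the rule does not apply
    return npu_line is not None and npu_line < air_line


if __name__ == "__main__":
    print(torch_npu_before_torchair(GOOD))
    print(torch_npu_before_torchair(BAD))
```

A check like this could run over the entry-point script before launching a benchmark, so a wrong import order fails fast instead of silently falling back from graph mode.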