MindSpeed-MM Ascend NPU Base Environment Setup

This skill guides users through setting up the base environment for MindSpeed-MM multimodal training on Huawei Ascend NPU.

Important: This guide covers only the base environment. Different multimodal models (qwen3vl, wan2.2, hunyuanvideo, etc.) require different, and sometimes mutually conflicting, dependency versions, so model-specific dependencies must be installed on top of the base environment. After completing this guide, refer to the corresponding model's SKILL for additional configuration.
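
To make "conflicting dependency versions" concrete, here is a minimal sketch that compares version pins across models and reports the packages pinned differently. The function name `find_conflicts`, the model names, and the package names are all hypothetical placeholders; they are not the real requirements of any MindSpeed-MM model.

```python
def find_conflicts(pins_by_model):
    """Return {package: {model: version}} for packages pinned to differing versions."""
    seen = {}
    for model, pins in pins_by_model.items():
        for package, version in pins.items():
            # Record which version each model asks for, per package.
            seen.setdefault(package, {})[model] = version
    # Keep only packages where at least two models disagree.
    return {
        package: versions
        for package, versions in seen.items()
        if len(set(versions.values())) > 1
    }


# Hypothetical pins, for illustration only.
example_pins = {
    "model_a": {"libfoo": "1.0", "libbar": "2.3"},
    "model_b": {"libfoo": "1.4", "libbar": "2.3"},
}

if __name__ == "__main__":
    for package, versions in find_conflicts(example_pins).items():
        print(package, versions)
```

Any package reported by a check like this is one reason to keep the base environment minimal and to install each model's dependencies separately on top of it.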

Component Relationship

Megatron-LM (NVIDIA)     <- Distributed training core (TP/PP), uses core_v0.12.1 branch
    |
MindSpeed (Huawei)       <- Ascend adaptation layer, monkey-patches Megatron kernels
    |
MindSpeed-MM (Huawei)    <- Multimodal application layer: VLM/generation/omni-modal training

MindSpeed-MM shares the underlying dependency stack (CANN, torch_npu, MindSpeed, Megatron-LM) with MindSpeed-LLM, but targets multimodal scenarios at the application level (vision-language models, video generation, speech synthesis, etc.).
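
As a rough way to see which layers of this stack are present in the current Python environment, the sketch below looks up each layer's module without importing it. The import names in `STACK_MODULES` (in particular `megatron`, `mindspeed`, and `mindspeed_mm`) are assumptions about how the packages are exposed and may differ in your installation; CANN is not a Python package and is not covered by this check.

```python
import importlib.util

# Assumed import names for the layers above; adjust if your install differs.
STACK_MODULES = ["torch", "torch_npu", "megatron", "mindspeed", "mindspeed_mm"]


def check_stack(modules=STACK_MODULES):
    """Map each module name to True if Python can locate it, else False."""
    status = {}
    for name in modules:
        try:
            # find_spec locates a top-level module without executing it.
            status[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            status[name] = False
    return status


if __name__ == "__main__":
    for name, found in check_stack().items():
        print(f"{name:14s} {'found' if found else 'MISSING'}")
```

A module reported as found only means Python can locate it; it does not confirm that the versions are compatible with each other or that an NPU device is usable.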

Quick Start -- 6 Steps to Complete the Base Environment