mindspeed-mm-vlm
MindSpeed-MM VLM (Vision-Language Model) Training
This Skill guides users through training multimodal understanding models (VLMs) on Huawei Ascend NPUs with MindSpeed-MM. It uses Qwen2.5VL-3B as the flagship example and covers the end-to-end fine-tuning workflow.
Prerequisites
Critical: For most VLMs (Qwen2.5VL, Qwen2VL, InternVL, GLM4V, DeepSeekVL2), follow the manual installation flow below. Do NOT run bash scripts/install.sh: the official MindSpeed-MM docs state that it only fully supports Qwen3/Qwen3.5. (For Qwen3VL / Qwen3.5, use the one-click install, then run bash examples/qwen3_5/install_extensions.sh.)
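The install rule above amounts to a lookup from model family to install flow. The sketch below encodes it as a shell function, assuming model names are passed exactly as spelled in this Skill. install_route is a hypothetical helper, not a MindSpeed-MM script, and routing unlisted models to the manual flow is our own assumption.

```bash
# Hypothetical helper (not part of MindSpeed-MM): map a model family to the
# install flow described above. Prints "manual" or "one-click" on stdout.
install_route() {
  case "$1" in
    Qwen3VL|Qwen3.5)
      # One-click install, then: bash examples/qwen3_5/install_extensions.sh
      echo "one-click" ;;
    Qwen2.5VL|Qwen2VL|InternVL|GLM4V|DeepSeekVL2)
      # Manual flow: clone the repositories and set them up by hand (Step P1 onward)
      echo "manual" ;;
    *)
      # Assumption: unlisted models default to the manual flow; warn on stderr
      echo "warning: '$1' is not listed; defaulting to the manual install" >&2
      echo "manual" ;;
  esac
}
```

For example, install_route InternVL prints manual, while install_route Qwen3VL prints one-click.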
Step P1: Clone repositories
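# Clone MindSpeed-MM and Megatron-LM into the workspace
git clone https://gitcode.com/Ascend/MindSpeed-MM.git /root/workspace/MindSpeed-MM
git clone https://github.com/NVIDIA/Megatron-LM.git /root/workspace/Megatron-LM
# Pin Megatron-LM to core_v0.12.1, then copy its megatron package into MindSpeed-MM
cd /root/workspace/Megatron-LM && git checkout core_v0.12.1
cp -r megatron /root/workspace/MindSpeed-MM/
# Continue the remaining steps from the MindSpeed-MM root
cd /root/workspace/MindSpeed-MM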
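Before moving on, it can help to confirm that the clone steps above left both repositories in the expected state. This is a minimal sketch with two hypothetical helper functions (not part of MindSpeed-MM). The default paths mirror the commands above, and treating megatron/core as the marker of a successful copy is our assumption.

```bash
# Hypothetical check (not shipped with MindSpeed-MM): succeeds if the megatron
# package was copied into the MindSpeed-MM root.
check_megatron_copy() {
  local mm_root="${1:-/root/workspace/MindSpeed-MM}"
  if [ -d "$mm_root/megatron/core" ]; then
    echo "ok: $mm_root/megatron/core found"
  else
    echo "missing: $mm_root/megatron/core (re-run the cp step)" >&2
    return 1
  fi
}

# Hypothetical check: succeeds if the Megatron-LM checkout is at the wanted ref.
# rev-parse resolves the ref whether core_v0.12.1 is a tag or a local branch.
check_megatron_ref() {
  local repo="${1:-/root/workspace/Megatron-LM}"
  local want="${2:-core_v0.12.1}"
  local head target
  head="$(git -C "$repo" rev-parse HEAD)" || return 1
  target="$(git -C "$repo" rev-parse --verify -q "${want}^{commit}")" || return 1
  if [ "$head" = "$target" ]; then
    echo "ok: $repo is at $want"
  else
    echo "mismatch: $repo HEAD is not at $want" >&2
    return 1
  fi
}
```

Running check_megatron_ref && check_megatron_copy with no arguments checks the default /root/workspace paths used in Step P1.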