nemo-mbridge-perf-moe-vlm-training


MoE VLM Training

Stable docs: @docs/training/moe-optimization.md
Card: @skills/nemo-mbridge-perf-moe-vlm-training/card.yaml

FSDP vs 3D Parallel

| Approach | Strength | Best fit |
| --- | --- | --- |
| FSDP | Simplest path to a working multimodal run | first bring-up, memory-first tuning, awkward PP boundaries |
| 3D parallel | Higher ceiling after tuning | stable models with a clean PP layout and time for deeper sweeps |

For MoE VLMs, the practical workflow is usually:

  1. get the first reliable run with FSDP
  2. stabilize real-data input, recompute, and memory behavior
  3. move to 3D parallel only if the throughput headroom is worth the extra work
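
The three steps above can be read as a simple gate: each step must hold before the next is considered. The sketch below is illustrative only. `RunState`, `next_step`, and the `min_speedup_worth_it` threshold are hypothetical names and values, not part of any NeMo or Megatron-Bridge API, and the speedup estimate is assumed to come from your own measurements.

```python
from dataclasses import dataclass


@dataclass
class RunState:
    """Hypothetical summary of where a MoE VLM training effort stands."""
    has_reliable_run: bool          # step 1: a run completes reliably
    input_and_memory_stable: bool   # step 2: real-data input, recompute, memory settled
    clean_pp_layout: bool           # model splits cleanly into pipeline stages
    expected_speedup: float         # estimated 3D-parallel / FSDP throughput ratio
    min_speedup_worth_it: float = 1.15  # illustrative threshold, not a measured value


def next_step(state: RunState) -> str:
    """Return the next action suggested by the three-step workflow."""
    if not state.has_reliable_run:
        return "bring up with FSDP"
    if not state.input_and_memory_stable:
        return "stabilize on FSDP"
    # Only consider 3D parallel when the PP layout is clean and the
    # estimated headroom justifies the extra tuning work.
    if state.clean_pp_layout and state.expected_speedup >= state.min_speedup_worth_it:
        return "evaluate 3D parallel"
    return "stay on FSDP"


if __name__ == "__main__":
    print(next_step(RunState(True, True, True, expected_speedup=1.3)))
```

The threshold is deliberately a parameter: how much throughput headroom justifies a 3D-parallel sweep depends on the team's time budget and cluster cost, so treat the default as a placeholder.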

Rounded Findings From Recent VLM Runs

Repository: nvidia/skills