nemo-mbridge-perf-moe-optimization-workflow
MoE Training Optimization Workflow
Stable docs: @docs/training/moe-optimization.md
Card: @skills/nemo-mbridge-perf-moe-optimization-workflow/card.yaml
Source: Scalable Training of MoE Models with Megatron Core
Quick Reference
Think in terms of the paper's Three Walls:
- memory wall
- communication wall
- compute and host-overhead wall
MoE tuning is iterative. Fixing one wall usually exposes the next one, so the best workflow is: fit first, scale second, profile third, then retune.
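The fit, scale, profile, retune order can be sketched as a small triage helper. This is only an illustration: `StepProfile`, its fields, `next_wall`, and the threshold values are hypothetical names and numbers chosen for the example. They are not part of Megatron Core or Megatron Bridge and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StepProfile:
    """Hypothetical per-step measurements; field names are illustrative only."""
    fits_in_memory: bool          # False if the configuration runs out of memory
    exposed_comm_fraction: float  # share of step time in non-overlapped communication
    host_idle_fraction: float     # share of step time the GPU spends waiting on the host


def next_wall(
    profile: StepProfile,
    comm_threshold: float = 0.2,  # assumed cutoff, tune for your own runs
    host_threshold: float = 0.1,  # assumed cutoff, tune for your own runs
) -> Optional[str]:
    """Return the wall to address next, checking them in workflow order."""
    # Fit first: nothing else matters until the model fits.
    if not profile.fits_in_memory:
        return "memory"
    # Scale second: look for communication that is not hidden behind compute.
    if profile.exposed_comm_fraction > comm_threshold:
        return "communication"
    # Profile third: check for compute and host-side overhead.
    if profile.host_idle_fraction > host_threshold:
        return "compute and host-overhead"
    # No wall stands out under these thresholds.
    return None


if __name__ == "__main__":
    print(next_wall(StepProfile(True, 0.35, 0.05)))
```

Because fixing one wall usually exposes the next, a helper like this would be re-run on a fresh profile after every change rather than once up front.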