MoE Training Optimization Workflow

Stable docs: @docs/training/moe-optimization.md

Card: @skills/nemo-mbridge-perf-moe-optimization-workflow/card.yaml

Source: Scalable Training of MoE Models with Megatron Core

Quick Reference

Think in terms of the paper's Three Walls:

MoE tuning is iterative. Fixing one wall usually exposes the next one, so the best workflow is: fit first, scale second, profile third, then retune.
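
The loop described above can be sketched in a few lines of Python. This is only an illustration of the control flow (fit, scale, profile, then go around again), not code from Megatron Core or the paper. The `tune` function, its `find_bottleneck` and `apply_fix` callbacks, and the `max_rounds` cap are all hypothetical names introduced here.

```python
from typing import Callable, List, Optional

# Phase order taken from the text; the strings are illustrative labels only.
PHASES = ("fit", "scale", "profile")


def tune(
    find_bottleneck: Callable[[str], Optional[str]],
    apply_fix: Callable[[str], None],
    max_rounds: int = 5,
) -> List[str]:
    """Cycle through the phases until a full pass finds nothing to fix.

    find_bottleneck(phase) returns a label for the current limiting issue
    in that phase, or None if the phase looks healthy (hypothetical hook).
    apply_fix(issue) applies whatever change addresses it (hypothetical hook).
    """
    history: List[str] = []
    for _ in range(max_rounds):
        changed = False
        for phase in PHASES:
            issue = find_bottleneck(phase)
            if issue is not None:
                apply_fix(issue)
                history.append(f"{phase}:{issue}")
                changed = True
        if not changed:
            break  # a clean pass: nothing left to retune
    return history
```

Each outer round is one "retune" cycle: a fix made in an earlier phase can surface a new issue in a later one, which is why the loop repeats until a full pass comes back clean or the round cap is reached.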
