nemo-mbridge-perf-moe-optimization-workflow


MoE Training Optimization Workflow

Stable docs: @docs/training/moe-optimization.md
Card: @skills/nemo-mbridge-perf-moe-optimization-workflow/card.yaml
Source: Scalable Training of MoE Models with Megatron Core

Quick Reference

Think in terms of the paper's Three Walls:

  • memory wall
  • communication wall
  • compute and host-overhead wall

MoE tuning is iterative. Fixing one wall usually exposes the next one, so the best workflow is: fit first, scale second, profile third, then retune.
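The "fit first" ordering above can be sketched as a small triage helper that decides which wall to work on next. This is a minimal illustration, not part of Megatron Core or Megatron Bridge: the `StepProfile` fields, the `dominant_wall` function, and the 10% threshold are all hypothetical and would need to be filled in from whatever profiler output is actually available.

```python
from dataclasses import dataclass


@dataclass
class StepProfile:
    """Hypothetical per-step measurements; field names are illustrative only."""
    peak_memory_gb: float    # peak device memory observed for one step
    memory_budget_gb: float  # usable device memory
    exposed_comm_ms: float   # communication time not overlapped with compute
    host_idle_ms: float      # device idle time spent waiting on the host
    step_ms: float           # total step time


def dominant_wall(p: StepProfile, threshold: float = 0.10) -> str:
    """Return the wall to address next, checking memory before anything else.

    The threshold is an arbitrary assumption: exposed communication below
    this fraction of step time is treated as not worth tuning yet.
    """
    # Fit first: nothing else matters until the run stays within memory.
    if p.peak_memory_gb > p.memory_budget_gb:
        return "memory"

    comm_frac = p.exposed_comm_ms / p.step_ms
    host_frac = p.host_idle_ms / p.step_ms

    # Exposed communication dominates only if it is both significant
    # and at least as large as the host-overhead share.
    if comm_frac >= threshold and comm_frac >= host_frac:
        return "communication"
    return "compute and host-overhead"


if __name__ == "__main__":
    profile = StepProfile(
        peak_memory_gb=70, memory_budget_gb=80,
        exposed_comm_ms=30, host_idle_ms=5, step_ms=100,
    )
    print(dominant_wall(profile))
```

Because fixing one wall tends to expose the next, the helper is meant to be re-run on a fresh profile after every change rather than consulted once.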

First Answer Checklist

Repository
nvidia/skills