nemo-mbridge-perf-expert-parallel-overlap
MoE Expert-Parallel Overlap Skill
References
- Stable docs: @docs/training/communication-overlap.md
- Structured metadata: @skills/nemo-mbridge-perf-expert-parallel-overlap/card.yaml
What It Is
Expert-parallel (EP) overlap hides the cost of token dispatch/combine all-to-all
communication by running it concurrently with expert FFN compute. Optionally,
delayed expert weight-gradient computation (delay_wgrad_compute) adds further
overlap by deferring wgrad so that it runs during the next layer's forward.
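
To make the benefit concrete, the following is a minimal back-of-the-envelope sketch, not Bridge code. It assumes an idealized steady state in which the dispatch and combine all-to-all of one micro-batch can run fully in parallel with the expert FFN compute of another. The function names and the millisecond figures are hypothetical and chosen only for illustration. The extra window provided by delay_wgrad_compute is not modeled.

```python
# Idealized timing model for EP overlap (illustrative assumption, not a
# measurement and not part of any library API).

def serial_time(dispatch_ms: float, ffn_ms: float, combine_ms: float) -> float:
    """Per-layer time when communication and compute run back to back."""
    return dispatch_ms + ffn_ms + combine_ms


def overlapped_time(dispatch_ms: float, ffn_ms: float, combine_ms: float) -> float:
    """Per-layer time when all-to-all runs concurrently with expert FFN compute.

    Under perfect overlap the layer costs whichever is longer:
    total communication or expert compute.
    """
    return max(dispatch_ms + combine_ms, ffn_ms)


def exposed_comm(dispatch_ms: float, ffn_ms: float, combine_ms: float) -> float:
    """Communication time that compute cannot hide, even with perfect overlap."""
    return max(0.0, dispatch_ms + combine_ms - ffn_ms)


if __name__ == "__main__":
    # Compute-bound layer: communication is fully hidden.
    print(serial_time(2, 5, 2), overlapped_time(2, 5, 2), exposed_comm(2, 5, 2))
    # Communication-bound layer: part of the all-to-all stays exposed.
    print(serial_time(2, 3, 2), overlapped_time(2, 3, 2), exposed_comm(2, 3, 2))
```

In this model, overlap pays off most when expert compute is at least as long as the combined dispatch and combine time. When communication dominates, the remainder stays exposed, and that is the case where deferring wgrad to widen the compute window can help.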
Bridge supports two dispatcher paths: