nemo-automodel-distributed-training
Distributed Training in NeMo AutoModel
Purpose
NeMo AutoModel uses PyTorch-native distributed training.
All parallelism is orchestrated through a single MeshContext object that
holds device meshes, strategy configs, and axis names.
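To make the idea of one object carrying mesh shape, strategy settings, and axis names concrete, here is a minimal pure-Python sketch. It is not the NeMo AutoModel MeshContext API: the class name MeshSketch, its fields, and the axis names "dp" and "tp" are all hypothetical, chosen only for illustration. The one rule it enforces is the general device-mesh constraint that the product of the axis sizes must equal the number of ranks.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass(frozen=True)
class MeshSketch:
    """Hypothetical stand-in for a mesh context (NOT the NeMo AutoModel API)."""

    world_size: int
    # Axis name -> axis size, in mesh order. The names are illustrative assumptions.
    axis_sizes: dict[str, int]
    # Free-form strategy settings carried alongside the mesh description.
    strategy: dict[str, object] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # A device mesh must cover every rank exactly once.
        if prod(self.axis_sizes.values()) != self.world_size:
            raise ValueError(
                f"axis sizes {self.axis_sizes} do not multiply to "
                f"world_size={self.world_size}"
            )

    @property
    def axis_names(self) -> tuple[str, ...]:
        return tuple(self.axis_sizes)

    @property
    def shape(self) -> tuple[int, ...]:
        return tuple(self.axis_sizes.values())

    def coords(self, rank: int) -> dict[str, int]:
        """Return the row-major mesh coordinates of a global rank."""
        if not 0 <= rank < self.world_size:
            raise ValueError(f"rank {rank} is outside [0, {self.world_size})")
        found = {}
        # The last axis varies fastest, so peel sizes off from the right.
        for name, size in reversed(list(self.axis_sizes.items())):
            rank, found[name] = divmod(rank, size)
        return {name: found[name] for name in self.axis_sizes}


mesh = MeshSketch(world_size=8, axis_sizes={"dp": 4, "tp": 2})
print(mesh.axis_names, mesh.shape)  # ('dp', 'tp') (4, 2)
print(mesh.coords(5))               # {'dp': 2, 'tp': 1}
```

Keeping the axis names next to the sizes means a strategy can look up its axis by name rather than by position, which is the main reason for bundling them in one object.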
Instructions
For conceptual distributed-training questions, answer directly from the quick patterns in this skill without inspecting the repository. Start with the strategy choice, then list only the YAML fields and constraints relevant to the question.
Use direct action verbs in the final answer: recommend the strategy, show the minimal YAML, state the sizing constraint, and name the unsupported strategies. Do not discuss model onboarding, recipes, Slurm, SkyPilot, or checkpointing unless the user asks.