nemo-automodel-distributed-training


Distributed Training in NeMo AutoModel

Purpose

NeMo AutoModel uses PyTorch-native distributed training. All parallelism is orchestrated through a single MeshContext object that holds device meshes, strategy configs, and axis names.
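To make the idea of named mesh axes concrete, here is a minimal pure-Python sketch. It is not NeMo AutoModel's MeshContext: the class name MeshSketch, its fields, and the axis names "dp" and "tp" are hypothetical and chosen only for illustration. It assumes the row-major rank layout that PyTorch device meshes conventionally use, and shows the general constraint that the product of the axis sizes equals the number of ranks.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class MeshSketch:
    """Hypothetical stand-in for a mesh description (NOT the real MeshContext)."""

    axis_names: tuple[str, ...]  # e.g. ("dp", "tp"); names are assumptions
    axis_sizes: tuple[int, ...]  # number of ranks along each axis

    def __post_init__(self) -> None:
        if len(self.axis_names) != len(self.axis_sizes):
            raise ValueError("each axis needs exactly one size")
        if any(size < 1 for size in self.axis_sizes):
            raise ValueError("axis sizes must be positive")

    @property
    def world_size(self) -> int:
        # The mesh covers every rank exactly once, so the sizes multiply.
        return prod(self.axis_sizes)

    def coords(self, rank: int) -> dict[str, int]:
        """Return the row-major coordinates of a global rank, keyed by axis name."""
        if not 0 <= rank < self.world_size:
            raise ValueError(f"rank {rank} is outside the mesh")
        remainders = {}
        # Peel off the fastest-varying (last) axis first.
        for name, size in zip(reversed(self.axis_names), reversed(self.axis_sizes)):
            rank, remainders[name] = divmod(rank, size)
        # Report coordinates in the declared axis order.
        return {name: remainders[name] for name in self.axis_names}


mesh = MeshSketch(axis_names=("dp", "tp"), axis_sizes=(2, 4))
print(mesh.world_size)  # 8
print(mesh.coords(5))   # {'dp': 1, 'tp': 1}
```

With 8 ranks split as 2 x 4, rank 5 sits in the second "dp" group at position 1 along "tp"; changing either axis size changes the required world size accordingly.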

Instructions

For conceptual distributed-training questions, answer directly from the quick patterns in this skill without inspecting the repository. Start with the strategy choice, then list only the YAML fields and constraints relevant to the question.

Use direct action verbs in the final answer: recommend the strategy, show the minimal YAML, state the sizing constraint, and name the unsupported strategies. Do not discuss model onboarding, recipes, Slurm, SkyPilot, or checkpointing unless the user asks.

Repository
nvidia/skills