run-on-slurm
Run Megatron-LM on SLURM
Prerequisites
- A SLURM cluster login with submission rights to a GPU partition.
- Megatron-LM checked out on a filesystem visible to all nodes in the allocation (NFS, Lustre, or similar). All nodes must reach the same paths for code, data, checkpoints, and output.
- `uv` installed. Run `uv sync --extra training --extra dev` (or `--extra lts` in place of `--extra dev`) on the worktree once before submission, so that `.venv` is materialized on the shared filesystem and visible to every node.
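You can check the prerequisites above from the login node before running `sbatch`. The following is a minimal sketch and is not part of Megatron-LM. The file name `preflight.sh` and the functions `preflight`, `venv_ready`, `fs_type`, and `is_shared_fs` are hypothetical. The sketch assumes bash, GNU coreutils `stat`, and uv's default project environment location (`.venv` in the worktree). The list of "shared" filesystem type names is a heuristic that you may need to adapt for your site.

```bash
#!/usr/bin/env bash
# preflight.sh (hypothetical helper): checks the prerequisites for running
# Megatron-LM under SLURM. Source it on the login node, then call `preflight`.
# Assumes bash, GNU coreutils `stat`, and uv's default `.venv` location.

# Succeeds if the worktree contains an executable virtualenv interpreter.
venv_ready() {
  local worktree="$1"
  [[ -x "$worktree/.venv/bin/python" ]]
}

# Prints the filesystem type name for a path (e.g. "nfs", "lustre", "xfs").
fs_type() {
  stat -f -c %T "$1"
}

# Succeeds if a filesystem type name looks like a shared/network filesystem.
# This list is a heuristic assumption; adjust it for your site.
is_shared_fs() {
  case "$1" in
    nfs | lustre | gpfs | fhgfs | cifs | smb2) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: preflight [WORKTREE] [EXTRA]
#   WORKTREE defaults to the current directory; EXTRA defaults to "dev"
#   (pass "lts" to sync the LTS extra instead).
preflight() {
  local worktree="${1:-$PWD}"
  local extra="${2:-dev}"

  if ! command -v uv >/dev/null 2>&1; then
    echo "error: uv is not on PATH" >&2
    return 1
  fi

  # Materialize .venv once, before sbatch, so every node reuses the same environment.
  if ! (cd "$worktree" && uv sync --extra training --extra "$extra"); then
    echo "error: uv sync failed in $worktree" >&2
    return 1
  fi

  if ! venv_ready "$worktree"; then
    echo "error: no executable .venv/bin/python under $worktree" >&2
    return 1
  fi

  # Only a warning: site-specific shared filesystems may report other names.
  local fstype
  fstype="$(fs_type "$worktree")"
  if ! is_shared_fs "$fstype"; then
    echo "warning: $worktree is on '$fstype'; compute nodes may not see it" >&2
  fi

  echo "preflight OK: $worktree"
}
```

To use it, source the file and call `preflight /path/to/Megatron-LM`, or `preflight /path/to/Megatron-LM lts` for the LTS extra. The filesystem check only inspects the login node's view of the worktree. To confirm that every compute node can see the environment, run `srun --ntasks-per-node=1 test -x "$PWD/.venv/bin/python"` inside an allocation. `srun` exits nonzero if the test fails on any node.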
Minimal sbatch script
Save as run_megatron.slurm in the worktree: