build-and-dependency


Build & Dependency Guide

The core principle: build and develop inside containers. The CI container ships the correct CUDA toolkit, PyTorch build, and pre-compiled native extensions (TransformerEngine, DeepEP, …), a combination that is hard to reproduce on a bare host.


Why Containers

Megatron-LM depends on CUDA, NCCL, PyTorch with GPU support, TransformerEngine, and optional components like ModelOpt and DeepEP. Installing these on a bare host is fragile and hard to reproduce. The project ships Dockerfiles that pin every dependency.

Use the container as your development environment. This guarantees:

  • Identical CUDA / NCCL / cuDNN versions across all developers and CI.
  • uv.lock resolves the same way locally and in CI.
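One way to check the "identical versions" guarantee is to compare installed package versions between two environments, for example a local shell and the CI container. The following is a minimal sketch, not part of the project: the helper names (`in_container`, `collect_versions`, `diff_reports`) and the package list are hypothetical, and the `/.dockerenv` check is only a common Docker heuristic that other container runtimes may not satisfy.

```python
import os
from importlib import metadata

# Hypothetical list of distributions to report on; the exact distribution
# names are assumptions and may differ in your environment.
PACKAGES = ("torch", "transformer-engine")


def in_container():
    """Heuristic only: Docker usually creates /.dockerenv inside containers."""
    return os.path.exists("/.dockerenv")


def collect_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


def diff_reports(local, reference):
    """Return {name: (local_version, reference_version)} for every mismatch."""
    names = sorted(set(local) | set(reference))
    return {
        name: (local.get(name), reference.get(name))
        for name in names
        if local.get(name) != reference.get(name)
    }


if __name__ == "__main__":
    print("running in a container (heuristic):", in_container())
    for name, version in collect_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running the script in both environments and passing the two reports to `diff_reports` gives an empty dictionary when the versions agree. A non-empty result on a bare host is the kind of drift the container is meant to prevent.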
First Seen
Apr 19, 2026