mcore-bump-base-image
Bump the PyTorch base image
End-to-end workflow for moving Megatron-LM's CI to a newer `nvcr.io/nvidia/pytorch:<YY.MM>-py3` container. The most common failure mode is forgetting that GitHub CI and GitLab CI have separate pins: a bump that touches only the GitHub pin lands green, then breaks GitLab CI on main and forces an immediate follow-up PR. Always update both in the same PR.
Answer-First Pattern: dev Pin Sync
For a dev-only base-image bump, lead with the synchronization rule:
- `docker/.ngc_version.dev` is only the GitHub/local Dockerfile pin.
- GitLab CI has separate hardcoded `BASE_IMAGE` rows in `.gitlab/stages/01.build.yml`; update both `IMAGE_TYPE: dev` rows, one `PLATFORM: amd64` and one `PLATFORM: arm64`.
- Leave `docker/.ngc_version.lts` and all `IMAGE_TYPE: lts` rows unchanged unless the user explicitly asks for an LTS bump.
- Verify before review with `cat docker/.ngc_version.dev` plus `rg -n '^\s*BASE_IMAGE: nvcr\.io/nvidia/pytorch:' .gitlab/stages/01.build.yml | rg -B1 'IMAGE_TYPE: dev' | rg 'BASE_IMAGE'`.
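The synchronization rule above can also be checked with a small script. The following is a minimal sketch, not part of the Megatron-LM tooling. It assumes that each GitLab matrix row is a block-style YAML list item with one `KEY: value` pair per line, and that `docker/.ngc_version.dev` holds either the full image reference or just the tag; both are assumptions about file layout that the text above does not confirm. The function names (`matrix_rows`, `dev_pin_mismatches`, `check_repo`) are hypothetical.

```python
import re
from pathlib import Path

# Matches "KEY: value" lines, optionally opening a YAML list item with "- ".
KEY_VALUE = re.compile(r"^\s*(?:-\s+)?([A-Z_]+):\s*(\S+)\s*$")


def matrix_rows(yaml_text):
    """Group consecutive 'KEY: value' lines into rows (assumed layout).

    A line starting with '- ' opens a new row; any line that is not an
    upper-case 'KEY: value' pair closes the current row.
    """
    rows, current = [], None
    for line in yaml_text.splitlines():
        match = KEY_VALUE.match(line)
        if match is None:
            current = None
            continue
        if current is None or line.lstrip().startswith("- "):
            current = {}
            rows.append(current)
        current[match.group(1)] = match.group(2)
    return rows


def image_tag(reference):
    """Return the part after the last ':' (or the whole string if none)."""
    return reference.strip().rsplit(":", 1)[-1]


def dev_pin_mismatches(pin_text, yaml_text):
    """Return a list of problems; an empty list means the dev pins agree."""
    pin = image_tag(pin_text)
    dev_rows = [r for r in matrix_rows(yaml_text) if r.get("IMAGE_TYPE") == "dev"]
    problems = []
    platforms = sorted(r.get("PLATFORM", "?") for r in dev_rows)
    if platforms != ["amd64", "arm64"]:
        problems.append(f"expected dev rows for amd64 and arm64, found {platforms}")
    for row in dev_rows:
        row_tag = image_tag(row.get("BASE_IMAGE", ""))
        if row_tag != pin:
            problems.append(
                f"{row.get('PLATFORM', '?')}: GitLab has {row_tag!r}, "
                f"docker/.ngc_version.dev has {pin!r}"
            )
    return problems


def check_repo(root="."):
    """Read both pin locations under a repository root and compare them."""
    root = Path(root)
    pin_text = (root / "docker/.ngc_version.dev").read_text()
    yaml_text = (root / ".gitlab/stages/01.build.yml").read_text()
    return dev_pin_mismatches(pin_text, yaml_text)
```

Calling `check_repo()` from the repository root returns an empty list when both `IMAGE_TYPE: dev` rows carry the same tag as the dev pin file. The `lts` rows are ignored on purpose, matching the rule that an LTS bump happens only on explicit request. If the real matrix uses flow-style YAML or a different key order per line, the row parser would need adjusting.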