mlm-bridge-training

MLM vs Bridge Training

For details on how the two frameworks differ, the argument-mapping tables, gotchas, and the translation script, see:

  • docs/megatron-lm-to-megatron-bridge.md

Correlation Testing

Use vanilla_gpt_pretrain_config for loss-correlation testing. This recipe uses bare GPTModelProvider defaults (LayerNorm, GeLU, learned_absolute position embeddings, vocab_size inherited from the tokenizer), matching the MLM pretrain_gpt.py defaults when run with no args.
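A loss-correlation check can be as simple as computing the Pearson correlation between the per-step loss curves of the two runs. The sketch below is generic Python, not Megatron-Bridge API; the loss values are illustrative placeholders, and the 0.999 threshold is an assumed tolerance, not a documented one.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-step losses from the MLM and Bridge runs (placeholder values).
mlm_losses    = [10.91, 10.52, 10.18, 9.87, 9.60]
bridge_losses = [10.90, 10.53, 10.17, 9.88, 9.59]

r = pearson(mlm_losses, bridge_losses)
print(f"loss correlation r = {r:.4f}")
# Assumed tolerance: near-identical curves when the two configs truly match.
assert r > 0.999
```

In practice you would parse the loss columns out of each run's log instead of hard-coding them; the comparison itself stays the same.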

MLM Correlation Run (2L/256H, 1 GPU)

PYTHONPATH=3rdparty/Megatron-LM:$PYTHONPATH \
uv run python -m torch.distributed.run --nproc_per_node=1 \
  3rdparty/Megatron-LM/pretrain_gpt.py \