create-issue
Triage CI Failure into a GitHub Issue
Investigate a failing GitHub Actions job, extract the root cause, and file a
well-structured bug issue against NVIDIA/Megatron-LM.
Workflow
1. Parse the URL
The argument is a GitHub Actions URL. It will be one of:
- Job URL: https://github.com/<owner>/<repo>/actions/runs/<run_id>/job/<job_id>
- Run URL: https://github.com/<owner>/<repo>/actions/runs/<run_id>
Extract run_id and, if present, job_id.
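The extraction step can be sketched in Python as below. This is a minimal illustration, not part of the skill itself: the names `ActionsRef` and `parse_actions_url` are hypothetical, and the sketch assumes that a trailing slash, query string, or fragment should be tolerated while any other URL shape is rejected.

```python
import re
from typing import NamedTuple, Optional

# Covers both URL shapes: the /job/<job_id> segment is optional.
# Assumption: a trailing slash, query string, or fragment is tolerated.
_ACTIONS_URL = re.compile(
    r"https://github\.com/(?P<owner>[^/]+)/(?P<repo>[^/]+)"
    r"/actions/runs/(?P<run_id>\d+)"
    r"(?:/job/(?P<job_id>\d+))?"
    r"/?(?:[?#].*)?"
)


class ActionsRef(NamedTuple):
    """Hypothetical container for the pieces extracted from the URL."""
    owner: str
    repo: str
    run_id: int
    job_id: Optional[int]  # None when a run URL was given


def parse_actions_url(url: str) -> ActionsRef:
    """Extract owner, repo, run_id and (if present) job_id from the URL."""
    match = _ACTIONS_URL.fullmatch(url.strip())
    if match is None:
        raise ValueError(f"not a GitHub Actions run or job URL: {url!r}")
    job_id = match.group("job_id")
    return ActionsRef(
        owner=match.group("owner"),
        repo=match.group("repo"),
        run_id=int(match.group("run_id")),
        job_id=int(job_id) if job_id is not None else None,
    )
```

For example, with placeholder IDs, `parse_actions_url("https://github.com/NVIDIA/Megatron-LM/actions/runs/123/job/456")` yields `run_id=123` and `job_id=456`, while the same URL without the `/job/456` suffix yields `job_id=None`.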
2. Identify failed jobs
- If a job_id was provided, use that job directly.