hyperpod-issue-report


HyperPod Issue Report

Collect diagnostic logs from HyperPod cluster nodes via SSM and store the results in S3. Both EKS and Slurm clusters are supported, and the cluster type is detected automatically. Collection runs in parallel through the bundled scripts/hyperpod_issue_report.py.
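To illustrate what auto-detection could look like, here is a minimal sketch. It assumes the cluster type can be inferred from the `Orchestrator` field of the SageMaker `DescribeCluster` response, which carries an `Eks` entry for EKS-orchestrated clusters. The function names `detect_orchestrator` and `describe_cluster` are hypothetical, and the bundled script may well detect the cluster type differently.

```python
def detect_orchestrator(describe_cluster_response):
    """Guess the orchestrator from a DescribeCluster response dict.

    Assumption: EKS-orchestrated clusters report an "Eks" entry under
    "Orchestrator"; anything else is treated as Slurm in this sketch.
    """
    orchestrator = describe_cluster_response.get("Orchestrator") or {}
    return "eks" if "Eks" in orchestrator else "slurm"


def describe_cluster(cluster_name, region=None):
    """Fetch the DescribeCluster response (needs sagemaker:DescribeCluster)."""
    import boto3  # imported lazily so detect_orchestrator works without boto3

    client = boto3.client("sagemaker", region_name=region)
    return client.describe_cluster(ClusterName=cluster_name)
```

Keeping the detection logic separate from the API call means it can be checked against a canned response without AWS credentials.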

Prerequisites

  • AWS CLI configured with permissions: sagemaker:DescribeCluster, sagemaker:ListClusterNodes, ssm:StartSession, s3:PutObject, s3:GetObject, eks:DescribeCluster
  • Python 3.8+ and uv (see uv installation docs for install options)
  • SSM Agent running on target nodes; node IAM roles need s3:GetObject/s3:PutObject on the report bucket
  • For EKS clusters: kubectl installed and configured (see Workflow step 2)
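The local tooling in the list above can be checked up front with a short script. This is a sketch, not part of the skill: `check_prerequisites` is a hypothetical helper that only verifies the Python version and that the `aws`, `uv`, and (optionally) `kubectl` executables are on `PATH`. It does not validate IAM permissions, the SSM Agent, or kubectl configuration.

```python
import shutil
import sys


def check_prerequisites(require_kubectl=False):
    """Return a mapping of prerequisite name -> whether it looks satisfied."""
    results = {
        # The skill asks for Python 3.8 or newer.
        "python>=3.8": sys.version_info >= (3, 8),
        # shutil.which returns None when the executable is not on PATH.
        "aws": shutil.which("aws") is not None,
        "uv": shutil.which("uv") is not None,
    }
    if require_kubectl:
        # kubectl is only needed for EKS clusters.
        results["kubectl"] = shutil.which("kubectl") is not None
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites(require_kubectl=True).items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Pass `require_kubectl=True` only when the target is an EKS cluster; Slurm clusters do not need kubectl.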

Workflow

1. Gather Information

Collect from the user:


First Seen
Apr 1, 2026