Slurm Cluster Management
Help developers submit, manage, and troubleshoot GPU-accelerated workloads on SRP's Slurm clusters. Supports training, inference, and data processing jobs using Apptainer containers.
When to Use This Skill
Use this skill when:
- Submitting GPU training or inference jobs to Slurm clusters
- Managing running or queued jobs
- Monitoring cluster resources and job status
- Debugging job failures or performance issues
- Writing Slurm job scripts with Apptainer containers
- Checking GPU availability and utilization
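As an illustration of the job-script use case above, here is a minimal sketch that renders a Slurm batch script running a command inside an Apptainer container. The helper `render_sbatch_script`, the image name `pytorch.sif`, and the command `python train.py` are hypothetical and not taken from this skill. The GPU request uses `--gres=gpu:N`, which is an assumption about how the cluster exposes GPUs; partition, account, and bind-mount settings are site-specific and omitted.

```python
def render_sbatch_script(job_name, image, command, gpus=1, time_limit="01:00:00"):
    """Return the text of a Slurm batch script that runs `command` in an Apptainer image.

    All argument values are illustrative; adjust them to the target cluster.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --gres=gpu:{gpus}",      # assumed GPU request style for this sketch
        f"#SBATCH --time={time_limit}",    # wall-clock limit, HH:MM:SS
        "#SBATCH --output=%x-%j.out",      # %x = job name, %j = job ID
        "",
        # --nv makes the host's NVIDIA driver and GPUs visible inside the container
        f"apptainer exec --nv {image} {command}",
    ]
    return "\n".join(lines) + "\n"
```

The returned text could be written to a file such as `train.sbatch` and submitted with `sbatch train.sbatch`; `squeue -u $USER` then shows whether the job is queued or running.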
SRP Slurm Clusters
Oracle OKE Cluster (H100 GPUs)
SSH Access: