SkyPilot Skill
SkyPilot is a unified framework for running AI workloads on any cloud, Kubernetes, or Slurm. It provides a single interface to launch clusters, run jobs, and serve models across 25+ clouds (AWS, GCP, Azure, CoreWeave, Nebius, Lambda, Together AI, RunPod, and more) as well as Kubernetes and Slurm clusters.
When to Use SkyPilot
Use SkyPilot when you need to:
- Manage compute resources on any cloud, Slurm, or Kubernetes cluster
- Launch CPU, GPU, or TPU instances (GPUs such as GB300, GB200, B200, H200, and H100) on any cloud, Kubernetes, or Slurm
- Run training, fine-tuning, or batch inference jobs
- Serve models with autoscaling and multi-cloud replicas (SkyServe)
- Run long jobs with automatic lifecycle management and recovery (managed jobs)
- Find the cheapest or most available GPU across clouds
Don't use SkyPilot for:
- Local-only workloads (use Docker/conda directly)