deploy-cluster
Deploy SkyPilot TPU Cluster on GKE
This skill deploys a SkyPilot-managed TPU cluster on an existing GKE cluster. It builds on the apply-resource skill, which handles GKE cluster creation via xpk.
Key Feature: Each TPU type gets its own SkyPilot cluster (named <cluster>-<username>-<tpu_type>), allowing multiple topologies to run in parallel on the same GKE cluster. Node pools are automatically managed per TPU type.
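The naming convention above can be illustrated with a small shell helper. This is a minimal sketch, not the skill's own code: `sky_cluster_name` is a hypothetical function. The username is passed in explicitly because the source does not say how the skill obtains it. The TPU type strings in the loop (`v5e-8`, `v6e-8`) are assumed example values.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds a per-TPU-type SkyPilot cluster name
# following the <cluster>-<username>-<tpu_type> pattern.
sky_cluster_name() {
  local gke_cluster="$1" username="$2" tpu_type="$3"
  printf '%s-%s-%s\n' "$gke_cluster" "$username" "$tpu_type"
}

# Each TPU type maps to its own SkyPilot cluster on the same GKE cluster.
# The TPU type strings below are assumed examples.
for tpu in v5e-8 v6e-8; do
  sky_cluster_name my-gke alice "$tpu"   # prints my-gke-alice-v5e-8, then my-gke-alice-v6e-8
done
```

Because the TPU type is part of the name, two topologies never collide on one cluster name. This is what lets them run side by side.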
Prerequisites
- SkyPilot: `pip install skypilot`
  - Check: `sky --help`
- Google Cloud SDK (gcloud): Install guide
  - Run `gcloud auth login` to authenticate
- Kubectl: Install guide
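Before deploying, you can check that the prerequisite CLIs are installed with a quick shell check. This is a minimal sketch under the assumption that the tools are invoked as `sky`, `gcloud`, and `kubectl` on `PATH`. `check_tools` is a hypothetical helper, not part of the skill. It only tests whether each command exists. It does not verify authentication, so `gcloud auth login` still has to be run as listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reports which of the named CLIs are missing from PATH.
# Returns 0 if every tool is found, 1 if any is missing.
check_tools() {
  local status=0
  local tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage for this skill's prerequisites (uncomment to run):
# check_tools sky gcloud kubectl && gcloud auth login
```

All tools are checked before the function returns, so a single run reports every missing prerequisite rather than stopping at the first one.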
Defaults
The following defaults apply unless the user explicitly overrides them: