tpu-perf-model
TPU Performance Model
Theoretical performance modeling tool for TPU v7x centered on the Register ↔ VMEM ↔ HBM data flow hierarchy.
When to Use
- Before writing a Pallas kernel: predict theoretical performance, identify the bottleneck, and guide tiling choices
- After profiling a kernel: compare theoretical and measured performance to find optimization opportunities
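As an illustration of both use cases, here is a minimal roofline-style sketch. It is not the tool's own implementation: the `Chip` class, the `roofline`, `matmul_cost`, and `efficiency` helpers, and the hardware numbers are all hypothetical placeholders (they are not TPU v7x specifications). It also models only the HBM level and ignores the VMEM and register levels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    """Hypothetical hardware description; fill in real values yourself."""
    peak_flops: float     # peak compute throughput, FLOP/s
    hbm_bandwidth: float  # HBM bandwidth, bytes/s


def matmul_cost(m, k, n, bytes_per_elem=2):
    """FLOPs and minimum HBM traffic for C[m,n] = A[m,k] @ B[k,n].

    Assumes each operand is read once and the output written once.
    """
    flops = 2 * m * k * n
    hbm_bytes = bytes_per_elem * (m * k + k * n + m * n)
    return flops, hbm_bytes


def roofline(flops, hbm_bytes, chip):
    """Return (predicted seconds, limiting resource) under a simple roofline."""
    compute_s = flops / chip.peak_flops
    memory_s = hbm_bytes / chip.hbm_bandwidth
    bound = "compute" if compute_s >= memory_s else "memory"
    return max(compute_s, memory_s), bound


def efficiency(predicted_s, measured_s):
    """Fraction of the theoretical bound achieved by a measured run."""
    return predicted_s / measured_s


# Placeholder numbers for illustration only, NOT TPU v7x specifications.
chip = Chip(peak_flops=1e15, hbm_bandwidth=1e12)

big_time, big_bound = roofline(*matmul_cost(4096, 4096, 4096), chip)
small_time, small_bound = roofline(*matmul_cost(8, 4096, 4096), chip)
print(big_bound, small_bound)
```

With these placeholder values, the large square matmul comes out compute-bound and the skinny one memory-bound. Before profiling, that tells you which resource tiling should target. After profiling, `efficiency(predicted_s, measured_s)` gives the fraction of the theoretical bound actually reached.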
TPU v7x Hardware Quick Reference