Experiment Queue
Orchestrate large batches of ML experiments on remote GPU servers over SSH, with state tracking, OOM retry, stale cleanup, and wave transitions.
When to Use This Skill
Use when /run-experiment is insufficient:
- ≥10 jobs that need batching across GPUs
- Multi-seed sweeps (e.g., 21 seeds × 12 cells)
- Wave transitions (run wave 1, wait, run wave 2, wait, run wave 3...)
- Teacher+student chains (train the teacher, then distill; auto-trigger the student once the teacher is done)
- OOM-prone configs where you need to retry on a different GPU or wait
- Mixed seed grids where failed cells need re-running
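
To make the scale of these cases concrete, here is a minimal sketch of how a 21 seeds × 12 cells sweep could be expanded into jobs and split into waves. It is not this skill's actual implementation: the function names (`build_jobs`, `plan_waves`, `jobs_to_rerun`), the job dictionary layout, the status strings, and the 8-GPU server are all assumptions chosen for illustration.

```python
from itertools import product


def build_jobs(seeds, cells):
    """Expand a seed x cell grid into one pending job per (cell, seed) pair."""
    return [
        {"cell": cell, "seed": seed, "status": "pending"}
        for cell, seed in product(cells, seeds)
    ]


def plan_waves(jobs, num_gpus):
    """Split jobs into waves of at most num_gpus jobs, one GPU index per job."""
    waves = []
    for start in range(0, len(jobs), num_gpus):
        chunk = jobs[start:start + num_gpus]
        waves.append([(gpu, job) for gpu, job in enumerate(chunk)])
    return waves


def jobs_to_rerun(jobs):
    """Select jobs whose (hypothetical) status marks them as failed or OOM."""
    return [job for job in jobs if job["status"] in ("failed", "oom")]


# Hypothetical sweep: 21 seeds x 12 cells on an assumed 8-GPU server.
jobs = build_jobs(seeds=range(21), cells=[f"cell{i:02d}" for i in range(12)])
waves = plan_waves(jobs, num_gpus=8)
print(len(jobs), len(waves), len(waves[-1]))  # 252 32 4
```

Under these assumptions the sweep needs 32 waves, and any cell that fails can be picked up again by passing the output of `jobs_to_rerun` back through `plan_waves`. Tracking that by hand is the kind of bookkeeping the cases above call for.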
Do NOT use for:
- Single ad-hoc experiment (use /run-experiment)
- Modal/Vast.ai deployments (those have their own orchestration)
- Experiments that need manual inspection between runs