experimental-design
Experimental Design Best Practice
- ALWAYS include meaningful baselines (not just random):
  - At least one classical method baseline
  - At least one recent SOTA method baseline
  - A simple-but-strong baseline (e.g., linear probe, k-NN)
- Use MULTIPLE random seeds (minimum 3, ideally 5)
- Report mean +/- std across seeds
- Design ablations that isolate EACH key component:
  - Remove one component at a time
  - Each ablation must be meaningfully different from baseline
- Control variables: change only ONE thing per comparison
- Use standard splits (train/val/test) — never test on training data
- Report wall-clock time and memory usage alongside accuracy
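
Several of the rules above (multiple seeds, mean +/- std, one-component-at-a-time ablations, and reporting time and memory) can be combined in one small harness. The following is a minimal sketch, not a reference implementation. The component names in `FULL_CONFIG`, the helper names, and the body of `run_experiment` are all hypothetical placeholders: `run_experiment` returns a synthetic score so the script runs on its own, and it should be replaced with a real train/evaluate call that scores on the held-out test split.

```python
import random
import statistics
import time
import tracemalloc

# Hypothetical component flags, for illustration only (not from the source).
FULL_CONFIG = {"augmentation": True, "pretraining": True, "attention": True}


def leave_one_out_ablations(full_config):
    """Return named configs that each disable exactly one component."""
    variants = {"full": dict(full_config)}
    for name in full_config:
        variant = dict(full_config)
        variant[name] = False  # change only ONE thing relative to "full"
        variants[f"no_{name}"] = variant
    return variants


def run_experiment(config, seed):
    """Placeholder for a real train/eval run.

    Returns a synthetic score so the sketch is runnable; swap in real
    training plus evaluation on the held-out test split.
    """
    rng = random.Random(seed)  # seed everything that is stochastic
    enabled = sum(config.values())
    return 0.70 + 0.05 * enabled + rng.uniform(-0.01, 0.01)


def evaluate(config, seeds=(0, 1, 2, 3, 4)):
    """Run one config across seeds; report mean, std, time, and peak memory."""
    tracemalloc.start()  # tracks Python heap allocations only, not GPU memory
    start = time.perf_counter()
    scores = [run_experiment(config, seed) for seed in seeds]
    elapsed = time.perf_counter() - start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "mean": statistics.mean(scores),
        "std": statistics.stdev(scores),  # sample std; needs >= 2 seeds
        "seconds": elapsed,
        "peak_mib": peak_bytes / 2**20,
        "n_seeds": len(seeds),
    }


if __name__ == "__main__":
    for name, config in leave_one_out_ablations(FULL_CONFIG).items():
        r = evaluate(config)
        print(
            f"{name:>16}: {r['mean']:.3f} +/- {r['std']:.3f} "
            f"({r['n_seeds']} seeds, {r['seconds']:.4f}s, "
            f"{r['peak_mib']:.3f} MiB peak)"
        )
```

Baselines can be run through the same `evaluate` function with the same seeds, so every row in the results table is measured the same way. `tracemalloc` only sees Python-level allocations, so a GPU workload would need the framework's own memory counters instead.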