debug
Installation
SKILL.md
Debug: evidence-before-action investigation
The most expensive class of mistake in ML debugging is asserting a cause based on plausibility, then attempting a "fix" that masks the real problem. This skill enforces the discipline of probe → hypothesis → smoke → controls → claim, in that order.
The agentic Stop hook routes here from reason when an assistant claims a cause without backing tool output.
When to run
The user just said any of:
- "why is X failing / diverging / NaN / OOM / hung / slow / crashed"
- "the loss is going up", "metrics look weird", "GPU util is 0"
- "debug this", "diagnose", "troubleshoot", "investigate this run"
- pasted a log excerpt asking what's wrong