fixing-flaky-tests
Fixing flaky tests
Three non-negotiables, in order:
- Reproduce before you fix. A fix for a failure you never observed is a guess. Only fall back to analytical fixes when the escalation ladder below is exhausted.
- Fix the root cause. Sleeps, raised timeouts, retries, and weakened assertions hide flakes; they do not fix them.
- Validate with an N-run loop. One green run proves nothing about an intermittent failure. Size N to the observed failure rate.
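One way to size N, sketched under the assumption that runs are independent and that an unfixed test would keep failing at the observed rate: pick the smallest N for which an all-green loop would be unlikely if the flake were still present. The function name `runs_needed` and the 95% default are illustrative choices, not part of this skill.

```python
import math


def runs_needed(failure_rate: float, confidence: float = 0.95) -> int:
    """Smallest N such that a test still failing at `failure_rate` would fail
    at least once in N independent runs with probability >= `confidence`."""
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # P(all N runs pass | flake still present) = (1 - p)^N.
    # Require (1 - p)^N <= 1 - confidence and solve for N.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    for rate in (0.5, 0.1, 0.01):
        print(f"observed rate {rate:.0%}: loop at least {runs_needed(rate)} times")
```

At an observed 10% failure rate this gives 29 runs, and at 1% it gives 299, which is why a single green run says so little about a rare flake. If failures cluster (shared state, time of day, runner load), the independence assumption breaks and N should be treated as a lower bound.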
Before any of these: measure, don't assume. Whether the failure is flaky or deterministic, and how often it occurs, are facts to establish from verifiable GitHub run data (step 1). Never inherit them from a Slack alert, a teammate's guess, or a ci:insights label.
To triage a red CI run (finding and classifying the failure), use the debugging-ci-failures skill first. This skill takes over once the failure is classified as a flaky test.
For writing new Playwright tests that aren't flaky, use the playwright-test skill.
1. Measure the failure rate — from GitHub, not from a digest
The GitHub Actions API (or GitHub MCP) is the source of truth. hogli ci:insights is a digest, not an oracle: it can misclassify a failure as flaky or deterministic, misstate the rate, and lag behind the API. Use it to validate a hypothesis or pull historical context, never as the first move or the classification authority.
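The measurement described above can be sketched as a small pure function over run records already pulled from GitHub. It assumes each record carries the `head_sha` and `conclusion` fields found on GitHub workflow-run objects; the function name `summarize_runs`, the output keys, and the "same commit, mixed outcomes" heuristic are illustrative assumptions, and fetching the records is left out. Run-level conclusions are only a proxy: a run can fail for reasons unrelated to the test under investigation, so confirm against the job logs.

```python
from collections import defaultdict
from typing import Iterable, Mapping, Optional


def summarize_runs(runs: Iterable[Mapping[str, Optional[str]]]) -> dict:
    """Compute a failure rate and a flaky-vs-deterministic hint from run records."""
    outcomes_by_sha = defaultdict(set)
    total = 0
    failed = 0
    for run in runs:
        conclusion = run["conclusion"]
        if conclusion not in ("success", "failure"):
            continue  # ignore cancelled, skipped, and still-running records
        total += 1
        if conclusion == "failure":
            failed += 1
        outcomes_by_sha[run["head_sha"]].add(conclusion)
    # Heuristic (an assumption of this sketch): a commit that both passed and
    # failed points to flakiness; a commit that only ever failed does not.
    mixed = sum(1 for outcomes in outcomes_by_sha.values() if len(outcomes) == 2)
    return {
        "total": total,
        "failed": failed,
        "failure_rate": failed / total if total else 0.0,
        "commits_with_mixed_outcomes": mixed,
    }


if __name__ == "__main__":
    sample = [
        {"head_sha": "a", "conclusion": "failure"},
        {"head_sha": "a", "conclusion": "success"},
        {"head_sha": "b", "conclusion": "success"},
    ]
    print(summarize_runs(sample))
```

A nonzero `commits_with_mixed_outcomes` supports a flaky classification, while a failure rate near 100% with no mixed commits suggests a deterministic break that belongs to a different workflow.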