Fixing flaky tests

Three non-negotiables, in order:

Reproduce before you fix. A fix for a failure you never observed is a guess. Only fall back to analytical fixes when the escalation ladder below is exhausted.
Fix the root cause. Sleeps, raised timeouts, retries, and weakened assertions hide flakes; they do not fix them.
Validate with an N-run loop. One green run proves nothing about an intermittent failure. Size N to the observed failure rate.

Before any of these: measure, don't assume. Flaky-vs-deterministic, and the rate, are facts to establish from verifiable GitHub run data (step 1) — never inherited from a Slack alert, a teammate's guess, or a ci:insights label.

For triaging a red CI run (finding and classifying the failure), use the debugging-ci-failures skill first — this skill takes over once the failure is classified as a flaky test. For writing new Playwright tests that aren't flaky, use the playwright-test skill.

1. Measure the failure rate — from GitHub, not from a digest

The GitHub Actions API (or GitHub MCP) is the source of truth. hogli ci:insights is a digest, not an oracle — it can mislabel flaky-vs-deterministic, misstate the rate, and lag the API. Use it to validate a hypothesis or pull historical context, never as the first move or the classification authority.

fixing-flaky-tests

Fixing flaky tests

1. Measure the failure rate — from GitHub, not from a digest