staff-engineering-skills-thundering-herd
Thundering Herd Trap
The cache expired. Ten thousand requests hit the database at once. Before writing any cache-miss handler, cron schedule, or reconnection logic, ask: what happens when every client does this at the same time?
The Four Triggers
| Trigger | What happens | Example |
|---|---|---|
| Cache expiry | Popular key expires, all concurrent requests miss and fetch simultaneously | Homepage feed cached for 5 min; on expiry, 10,000 requests hit the DB at once |
| Service recovery | Service comes back up, all queued retries hit it simultaneously | Downstream API recovers from outage, backlogged clients all retry at once |
| Synchronized scheduling | Every instance fires at the same second | Cron at `0 * * * *` on 50 instances = 50 identical queries at :00 |
| Connection storm | All clients reconnect simultaneously after a failure | Database failover: 500 app servers all connect to the new primary at once |
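To make the synchronized-scheduling row concrete, here is a minimal sketch of one common way to de-synchronize instances: each instance derives a stable start offset from its own identifier and waits that long before running the job. The function name `start_offset_seconds`, the `app-N` instance IDs, and the 300-second window are all assumptions for illustration, not part of any particular scheduler.

```python
import hashlib


def start_offset_seconds(instance_id: str, window_seconds: int = 300) -> int:
    """Map an instance ID to a stable delay in [0, window_seconds).

    Hashing (rather than random.random()) keeps the offset the same
    across restarts, so each instance always fires at its own slot.
    """
    digest = hashlib.sha256(instance_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % window_seconds


# Hypothetical fleet of 50 instances that would otherwise all fire at :00.
offsets = {f"app-{i}": start_offset_seconds(f"app-{i}") for i in range(50)}

for name in ("app-0", "app-1", "app-2"):
    print(f"{name} starts {offsets[name]}s after the top of the hour")
```

In use, the scheduled job would sleep for its offset before issuing its query, so the 50 queries spread across the window instead of landing in the same second. A random delay per run is a reasonable alternative when a stable slot is not needed.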
Detection: When You're Creating a Thundering Herd
Stop and fix if you see:
- "Check cache, miss, fetch, store" with no coalescing -- the textbook pattern. When 1,000 requests hit the miss simultaneously, 1,000 identical queries hit the database. Only one fetch is needed; the other 999 should wait for its result.
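The coalescing described above can be sketched as follows. This is a minimal in-process illustration, assuming a thread-based server and a hypothetical loader function passed in as `fetch`; the class name `SingleFlightCache` and the `homepage_feed` key are invented for the example, and TTL handling is left out to keep the focus on the miss path.

```python
import threading
import time
from concurrent.futures import Future


class SingleFlightCache:
    """Cache where concurrent misses on one key share a single fetch."""

    def __init__(self, fetch):
        self._fetch = fetch        # hypothetical loader: key -> value
        self._lock = threading.Lock()
        self._values = {}          # key -> cached value (no TTL in this sketch)
        self._inflight = {}        # key -> Future for a fetch in progress

    def get(self, key):
        with self._lock:
            if key in self._values:
                return self._values[key]
            future = self._inflight.get(key)
            leader = future is None
            if leader:
                # First caller to miss becomes the leader and owns the fetch.
                future = Future()
                self._inflight[key] = future
        if not leader:
            # Followers block here until the leader publishes a result.
            return future.result()
        try:
            value = self._fetch(key)
        except BaseException as exc:
            with self._lock:
                del self._inflight[key]
            future.set_exception(exc)  # followers see the same failure
            raise
        with self._lock:
            self._values[key] = value
            del self._inflight[key]
        future.set_result(value)
        return value


def demo(n_requests=50):
    """Release n_requests threads at once against a cold cache."""
    calls = []

    def slow_fetch(key):
        calls.append(key)          # count how often the "database" is hit
        time.sleep(0.1)
        return f"value-for-{key}"

    cache = SingleFlightCache(slow_fetch)
    barrier = threading.Barrier(n_requests)
    results = []

    def worker():
        barrier.wait()             # line everyone up, then stampede
        results.append(cache.get("homepage_feed"))

    threads = [threading.Thread(target=worker) for _ in range(n_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(calls), results
```

Calling `demo(50)` sends 50 simultaneous requests at a cold key; the loader runs once and every caller receives the same value. This only coalesces within one process. Coalescing across many app servers would need a shared mechanism such as a distributed lock, which this sketch does not attempt.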