cache-money
Cache Money
Keep the Anthropic prompt cache warm during Claude Code sessions — especially during peak hours when usage limits are tighter — by scheduling lightweight pings tuned to your cache TTL.
Why This Matters
Claude Code sends the full conversation context with every API call. Anthropic caches this prefix server-side and serves subsequent calls from cache at ~10% of the base input price. But the cache expires after a TTL period of inactivity:
| TTL Tier | Duration | Cache Write Cost | Who Gets It |
|---|---|---|---|
| Default | 5 minutes | 1.25x base input | All plans (no ttl field specified) |
| Extended | 1 hour | 2x base input | Max-tier plans (server-side in Claude Code), or explicit ttl: "1h" via API |
Cache reads cost 0.1x base input regardless of TTL tier — that's the 90% saving.
If a session sits idle past its TTL, the next call pays full cache-write price for the entire context — up to 1M tokens. During peak hours (weekdays 5:00 AM – 11:00 AM PT), Anthropic's rolling session limits are consumed faster, making every cache miss doubly expensive: higher rebuild cost plus faster quota burn.
For detailed technical background, consult references/cache-mechanics.md.