Prompt Caching for Agentic AI Applications
Prompt caching is the single most impactful optimization for agentic AI systems. In Claude Code, prompt caching reduced costs by ~90% by ensuring the vast majority of tokens are cache reads rather than cache writes or uncached input.
Core Principle: Prefix Matching
Prompt caching works on exact prefix matching. The API caches the longest prefix of your request that matches a previous request. Any change at position N invalidates the cache for everything after position N.
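To make the invalidation rule concrete, here is a toy sketch. It is illustrative only: the whitespace "tokenizer" and the `cached_prefix_length` helper are made up for this example and are not part of any real API.

```python
# Illustrative sketch: models how exact prefix matching determines what
# can be served from cache between two consecutive requests. Whitespace
# splitting stands in for real tokenization.

def cached_prefix_length(previous: list[str], current: list[str]) -> int:
    """Length of the longest shared prefix, i.e. the cacheable portion."""
    n = 0
    for prev_tok, cur_tok in zip(previous, current):
        if prev_tok != cur_tok:
            break
        n += 1
    return n

turn_1 = "SYSTEM: you are a helpful agent. TOOLS: bash, edit. USER: fix the bug".split()
turn_2 = ("SYSTEM: you are a helpful agent. TOOLS: bash, edit. USER: fix the bug "
          "ASSISTANT: done USER: now add tests").split()

hit = cached_prefix_length(turn_1, turn_2)
print(f"{hit} of {len(turn_2)} tokens are cache reads; {len(turn_2) - hit} are new")

# Editing anything early (e.g. the system prompt) collapses the shared
# prefix, so every token after the edit must be re-written to the cache.
edited = "SYSTEM: you are a VERY helpful agent. TOOLS: bash, edit. USER: fix the bug".split()
print(cached_prefix_length(turn_1, edited))  # small: cache broken near the start
```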
Request layout (order matters):
[System Prompt] → [Tool Definitions] → [System Reminders] → [Conversation Messages]
The system prompt and tool definitions form the stable prefix. Conversation messages are the dynamic suffix that grows with each turn. This layout maximizes cache hits because the stable prefix is identical across every request in a session.
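A minimal sketch of this layout using the Anthropic Python SDK. The model ID, system text, and `run_shell` tool are placeholders; the `cache_control` breakpoint on the final system block marks the end of the stable prefix, so everything up to and including it is cached and reused.

```python
# Sketch of the request layout above with the Anthropic Python SDK
# (assumes `pip install anthropic`). Tool-use handling is omitted.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM = [
    {
        "type": "text",
        "text": "You are a coding agent. ...",  # long, stable instructions
        # Cache breakpoint: the prefix up to and including this block
        # becomes the reusable, cached portion of every request.
        "cache_control": {"type": "ephemeral"},
    }
]

TOOLS = [
    {
        "name": "run_shell",  # placeholder tool definition
        "description": "Run a shell command and return its output.",
        "input_schema": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    }
]

messages = []  # dynamic suffix: grows each turn, never reordered

def send(user_text: str):
    messages.append({"role": "user", "content": user_text})
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model ID
        max_tokens=1024,
        system=SYSTEM,      # stable prefix: byte-identical every turn
        tools=TOOLS,        # stable prefix: byte-identical every turn
        messages=messages,  # only this suffix changes between turns
    )
    messages.append({"role": "assistant", "content": response.content})
    return response
```

The crucial discipline is that SYSTEM and TOOLS are byte-identical on every call; even a timestamp interpolated into the system prompt would break the prefix match on every single turn.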
Key insight: You are NOT paying full price for the full context on every turn. With caching, you pay for a cache write once (at a small premium, roughly 25% over normal input tokens on Anthropic's API), then ~10% of the base input price for every subsequent read. The longer your stable prefix, the more you save.
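A back-of-the-envelope model makes the savings concrete. The prices below are assumptions (roughly Anthropic's published ratios: cache writes ~1.25x base input, reads ~0.1x); substitute your model's actual rates.

```python
# Back-of-the-envelope input-cost model for an agent session.
# Prices are assumptions in $/Mtok; output tokens are ignored.
BASE_INPUT = 3.00                 # assumed base input price
CACHE_WRITE = BASE_INPUT * 1.25   # writes carry a ~25% premium
CACHE_READ = BASE_INPUT * 0.10    # reads cost ~10% of base input

def session_cost(prefix_tokens: int, turns: int, tokens_per_turn: int,
                 cached: bool) -> float:
    """Total input-token cost of a session in dollars."""
    cost = 0.0
    for turn in range(1, turns + 1):
        context = prefix_tokens + turn * tokens_per_turn
        if not cached:
            cost += context * BASE_INPUT / 1e6
        elif turn == 1:
            cost += context * CACHE_WRITE / 1e6  # first turn writes the cache
        else:
            # Prior context is a cache read; only the new turn is written.
            cost += ((context - tokens_per_turn) * CACHE_READ
                     + tokens_per_turn * CACHE_WRITE) / 1e6
    return cost

uncached = session_cost(20_000, turns=50, tokens_per_turn=2_000, cached=False)
cached = session_cost(20_000, turns=50, tokens_per_turn=2_000, cached=True)
print(f"uncached ${uncached:.2f} vs cached ${cached:.2f} "
      f"({1 - cached / uncached:.0%} saved)")
```

With a 20k-token stable prefix over 50 turns, this sketch lands at roughly an 86% reduction in input cost, in the same ballpark as the ~90% figure cited above.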