Writing LLM System Prompts

A well-built system prompt steers behavior across the whole conversation and, when structured for KV-cache reuse, costs a fraction of what the same uncached prefix would on providers that offer prompt caching (Anthropic, OpenAI, Bedrock, Vertex, etc.).

Rules

  1. Put the role first, in one sentence. A single descriptive sentence in the system field (not the user turn) anchors tone and vocabulary, e.g. "You are a helpful coding assistant specializing in Python." Keep it plain prose at the top; do not wrap it in markup.
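
     A minimal sketch, assuming an OpenAI-style chat payload; the model name and user message are placeholders:

    # The role sentence goes in the system message, not the first user turn
    request = {
        "model": "gpt-4o",
        "messages": [
            {"role": "system",
             "content": "You are a helpful coding assistant specializing in Python."},
            {"role": "user", "content": "Review this function for bugs."},
        ],
    }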

  2. Order static-first, dynamic-last. Cache hits require a byte-identical prefix. Sequence: role → tools/policies → long reference docs / few-shot examples → cache breakpoint → dynamic context → user turn. Why: one changed token before the breakpoint drops you from cached-read pricing (~10% of input) to full input cost.
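
     Back-of-envelope arithmetic (hypothetical rates; real per-token prices vary by provider and model):

    # Why one changed byte matters: cached-read vs full-price input
    PREFIX_TOKENS = 10_000
    BASE_RATE = 3.00 / 1_000_000                  # $/token, hypothetical base input rate
    cache_hit = PREFIX_TOKENS * BASE_RATE * 0.10  # prefix reused: ~10% of input price
    cache_miss = PREFIX_TOKENS * BASE_RATE        # any prefix change: full input price
    print(f"hit ${cache_hit:.4f} vs miss ${cache_miss:.4f} per request")
    # -> hit $0.0030 vs miss $0.0300 per request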

  3. Mark cache breakpoints explicitly when the provider supports it. Most caching is opt-in (e.g. Anthropic cache_control, Bedrock cachePoint). Place a breakpoint at the end of each stable block you want reused. Cache writes typically cost more than base input; reads cost a fraction of it. TTLs vary (commonly ~5 min, sometimes longer-lived tiers). Only cache prefixes you'll reuse within the TTL; sparse traffic pays the write penalty repeatedly.

    # Anthropic example (adapt to your provider's syntax)
    system=[
        {"type": "text", "text": ROLE_AND_POLICIES},   # stable: role + policies
        {"type": "text", "text": LONG_REFERENCE_DOC,   # stable: long reference docs
         "cache_control": {"type": "ephemeral"}},      # breakpoint: caches everything above
    ]
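
     On Bedrock's Converse API the same breakpoint is expressed as a cachePoint block rather than a field on the text block (a sketch, reusing the placeholder constants above):

    # Bedrock Converse sketch: adapt field names to your SDK version
    system=[
        {"text": ROLE_AND_POLICIES},
        {"text": LONG_REFERENCE_DOC},
        {"cachePoint": {"type": "default"}},  # caches everything above this point
    ]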
    