context-optimization
Context Optimization Techniques
Context optimization extends the effective capacity of a limited context window through strategic compression, masking, caching, and partitioning. Applied with discipline, these techniques can double or triple usable capacity without requiring larger models or longer windows. They are ordered below by impact and risk.
When to Activate
Activate this skill when:
- Facing context limits that constrain task complexity
- Optimizing for cost (fewer tokens = lower costs)
- Reducing latency in long conversations
- Implementing long-running agent systems
- Handling larger documents or conversations
- Building production systems at scale
Core Concepts
Apply four primary strategies in this priority order:
- KV-cache optimization: Reorder and stabilize prompt structure so the inference engine reuses cached Key/Value tensors. This is the cheapest optimization: zero quality risk, immediate cost and latency savings. Apply it first and unconditionally (see the sketch after this list).
- Context compression: Summarize or compact older conversation history into structured summaries once the window fills, reclaiming tokens while preserving decisions and open threads. This carries some quality risk, so compress only what the task no longer needs verbatim.
- Context masking: Hide or truncate stale tool outputs and inactive tools in place rather than deleting them, keeping the prompt structure intact for downstream turns.
- Context partitioning: Split work across sub-agents or offload large artifacts to external storage so each context carries only what its task requires. This has the highest setup cost; reserve it for workloads that outgrow a single window.
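The skill text ships no reference code, so here is a minimal sketch of the stable-prefix principle behind KV-cache reuse. The names (`STATIC_SYSTEM_PROMPT`, `STATIC_TOOL_DEFINITIONS`, `build_messages`) and the chat-message shape are illustrative assumptions, not APIs defined by this skill.

```python
# Minimal sketch: arrange the prompt so the longest possible prefix is
# byte-identical across requests, letting the serving engine reuse cached
# Key/Value tensors for every token before the first change.

STATIC_SYSTEM_PROMPT = (
    "You are a coding agent. Follow the project conventions.\n"
    "Tools are listed below in a fixed order and never change mid-session.\n"
)
STATIC_TOOL_DEFINITIONS = '{"tools": [...]}'  # serialized once, deterministically

def build_messages(history: list[dict], user_turn: str) -> list[dict]:
    """Build a KV-cache-friendly message list.

    Anti-pattern: putting a timestamp or request ID at the top of the
    system prompt invalidates the cache on every call. Dynamic values
    belong at the end, where a cache miss costs the fewest tokens.
    """
    return [
        # 1. Stable prefix: identical bytes on every request -> cache hit.
        {"role": "system", "content": STATIC_SYSTEM_PROMPT + STATIC_TOOL_DEFINITIONS},
        # 2. Append-only history: never edit or reorder earlier turns,
        #    or their cached KV entries are discarded.
        *history,
        # 3. Dynamic content goes last, where invalidation is cheapest.
        {"role": "user", "content": user_turn},
    ]
```

The design point: prefix caches key on exact token prefixes, so one changed byte early in the prompt invalidates everything after it. Stability matters most at the top of the prompt and least at the tail.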