Context Optimization

Context optimization is the process of refining the raw context assembled for an AI model so that every token contributes meaningfully to the task. In a typical RAG or agent pipeline, the retrieved context often contains redundant passages, marginally relevant chunks, and poorly ordered information. Optimization transforms this raw material into a lean, high-signal context block that improves answer quality, reduces inference cost, and makes the most of the model's attention budget.

Workflow

  1. Audit the Raw Context: Inventory every piece of context that has been gathered -- retrieved documents, conversation history, tool outputs, and metadata. Measure the total token count and compare it against the available context budget. Identify the compression ratio needed if the raw context exceeds the budget.
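A minimal audit pass might look like the following sketch. The four-characters-per-token estimate is a rough heuristic for English text, not a real tokenizer; a production pipeline would use the model's actual tokenizer.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    return max(1, len(text) // 4)

def audit_context(chunks: list[str], budget: int) -> dict:
    """Inventory total token usage against the available budget."""
    total = sum(estimate_tokens(c) for c in chunks)
    return {
        "total_tokens": total,
        "budget": budget,
        "over_budget": total > budget,
        # Compression ratio needed to fit; 1.0 means already within budget.
        "compression_ratio": max(1.0, total / budget),
    }
```

A `compression_ratio` of 2.0 means the later filtering and deduplication steps need to cut the context roughly in half.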

  2. Deduplicate Overlapping Content: Scan the context for near-duplicate passages that convey the same information. This is common in RAG pipelines where chunking with overlap produces multiple chunks covering the same paragraph, or when multiple source documents repeat the same facts. Use semantic similarity (e.g., cosine similarity above 0.92) or exact n-gram overlap detection to identify duplicates, then keep only the most complete version of each piece of information.
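The n-gram variant of this step can be sketched as below. The 5-gram size and the 0.8 overlap threshold are illustrative defaults, not tuned values; processing longer chunks first implements the "keep the most complete version" rule.

```python
def ngrams(text: str, n: int = 5) -> set:
    # Word-level n-grams for overlap detection.
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def deduplicate(chunks: list[str], overlap_threshold: float = 0.8) -> list[str]:
    # Process longer chunks first so the most complete version survives.
    kept: list[str] = []
    for chunk in sorted(chunks, key=len, reverse=True):
        grams = ngrams(chunk)
        is_duplicate = any(
            grams and len(grams & ngrams(k)) / len(grams) >= overlap_threshold
            for k in kept
        )
        if not is_duplicate:
            kept.append(chunk)
    return kept
```

The overlap fraction is computed relative to the candidate chunk, so a short chunk fully contained in an already-kept longer one is correctly discarded.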

  3. Score Relevance and Information Density: Assign each context chunk two scores: a relevance score (how closely it relates to the current query) and an information density score (how many useful facts it conveys per token). Relevance can be measured via the retrieval score or a lightweight cross-encoder pass. Density can be estimated by counting named entities, code identifiers, numerical data, and key terms relative to chunk length. Multiply the two scores to produce a composite utility score.
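A lightweight density proxy, assuming no NER model is available, is to count tokens that look like entities, identifiers, or numbers. This is a crude stand-in for the entity counting described above; the composite score is the product of relevance and density as stated.

```python
import re

def density_score(chunk: str) -> float:
    """Heuristic information density: fraction of tokens that carry
    a capital letter, digit, or identifier-like character."""
    tokens = chunk.split()
    if not tokens:
        return 0.0
    signals = sum(
        1 for t in tokens
        if re.search(r"[A-Z0-9]", t) or "_" in t
    )
    return signals / len(tokens)

def utility(relevance: float, chunk: str) -> float:
    # Composite utility = relevance x information density.
    return relevance * density_score(chunk)
```

A fact-dense chunk full of model names and benchmark numbers scores well above a purely conversational chunk at the same relevance level.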

  4. Filter Low-Value Content: Remove chunks whose composite utility score falls below a threshold. A good starting point is to keep the top 60-70% of chunks by utility score. Also remove boilerplate text (copyright notices, navigation menus, repeated headers) that contributes zero information. Be conservative -- it is better to include a marginally relevant chunk than to lose a critical fact.
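The percentage-based cut described above can be sketched as follows; the 65% default sits in the middle of the suggested 60-70% range, and keeping at least one chunk guards against an empty context.

```python
def filter_chunks(
    scored: list[tuple[str, float]],
    keep_fraction: float = 0.65,
) -> list[tuple[str, float]]:
    """Keep the top fraction of (chunk, utility) pairs by utility score."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    n_keep = max(1, round(len(ranked) * keep_fraction))
    return ranked[:n_keep]
```

A fraction-based cut adapts to the distribution of scores better than a fixed absolute threshold, which is why it makes a safer starting point.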

  5. Reorder by Priority: Arrange the remaining chunks to maximize the model's attention. Place the highest-utility chunks first (models attend most to the beginning of the context) and the second-highest near the end (models also attend to recency). Avoid burying critical information in the middle of a long context block -- this is the "lost in the middle" zone where model attention is weakest.
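One way to implement this placement rule is to alternate ranked chunks between the front and the back of the context, so the weakest material lands in the middle:

```python
def reorder(chunks: list[tuple[str, float]]) -> list[str]:
    """Place the best chunk first and the second-best last; push the
    lowest-utility chunks into the 'lost in the middle' zone."""
    ranked = sorted(chunks, key=lambda pair: pair[1], reverse=True)
    front, back = [], []
    for i, (text, _) in enumerate(ranked):
        (front if i % 2 == 0 else back).append(text)
    # Reverse the back half so the highest-ranked of it sits at the end.
    return front + back[::-1]
```

For four chunks ranked a > b > c > d, this yields the order a, c, d, b: the best first, the second-best last, and the weakest two in the middle.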

  6. Validate Coverage: After filtering and reordering, verify that the optimized context still covers all aspects of the query. If the query has multiple sub-questions, ensure at least one chunk addresses each. If coverage gaps appear, selectively re-add previously filtered chunks that fill the gap, even if their utility score was below the threshold.
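A simple lexical coverage check is sketched below. Matching on shared content words is a weak proxy (a real pipeline would compare embeddings), and the four-letter cutoff for "content word" is an illustrative assumption.

```python
import re

def content_words(text: str) -> set:
    # Words of four or more letters, as a crude stopword filter.
    return {w.lower() for w in re.findall(r"[a-zA-Z]{4,}", text)}

def coverage_gaps(sub_questions: list[str], chunks: list[str]) -> list[str]:
    """Return sub-questions that no chunk appears to address."""
    chunk_words = [content_words(c) for c in chunks]
    return [
        q for q in sub_questions
        if not any(content_words(q) & cw for cw in chunk_words)
    ]
```

Any sub-question returned here is a signal to re-add filtered chunks that mention its terms, as described above.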

Techniques
