Latent Briefing and KV Cache Memory Sharing

Hierarchical multi-agent systems often pay for the same context twice. The orchestrator accumulates a long reasoning trajectory, but each worker usually receives only a narrow text handoff such as a subtask prompt plus raw document slices. Passing the full trajectory fixes coverage but drives token cost up on every worker call. Summarization introduces latency and information loss. Retrieval helps with document access but does not preserve the orchestrator's evolving reasoning state.

Latent Briefing addresses this by sharing memory at the representation level rather than the text level. The core idea is to compact the orchestrator trajectory inside the worker model's KV cache, keeping only the positions most relevant to the current worker task. The method builds on Attention Matching (AM) KV cache compaction and adapts it to inference-time multi-agent handoff with three changes: task-guided queries, a single token mask shared across heads, and robust thresholding.
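The selection step described above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the Latent Briefing or Attention Matching implementation: it covers only the choice of which positions to keep, the function names `select_positions` and `compact_cache` are hypothetical, and the median-plus-MAD rule is one plausible reading of "robust thresholding" that the source does not specify.

```python
import numpy as np

def select_positions(keys, queries, mad_k=1.0):
    """Pick trajectory positions to keep, using one mask shared by all heads.

    keys:    (H, T, d) cached keys for the orchestrator trajectory
    queries: (H, Q, d) query vectors derived from the worker task (assumed given)
    mad_k:   hypothetical knob; how many MADs above the median a score must be
    """
    H, T, d = keys.shape
    # Scaled dot-product attention of task queries over trajectory positions.
    logits = np.einsum("hqd,htd->hqt", queries, keys) / np.sqrt(d)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Average attention mass over heads and queries: one score per position,
    # so every head ends up with the same token mask.
    score = weights.mean(axis=(0, 1))
    # Assumed robust threshold: median plus a multiple of the median
    # absolute deviation, which is less sensitive to outliers than the mean.
    med = np.median(score)
    mad = np.median(np.abs(score - med))
    return np.flatnonzero(score > med + mad_k * mad)

def compact_cache(keys, values, keep_idx):
    """Drop every cached position that is not in keep_idx."""
    return keys[:, keep_idx, :], values[:, keep_idx, :]
```

Averaging before thresholding is what makes the mask shared: each head retains the same positions, so the compacted cache stays a regular (H, T', d) tensor rather than a ragged per-head structure.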

When to Activate

Activate this skill when:

- Designing orchestrator-worker or supervisor-specialist systems where workers need access to prior orchestrator state without replaying the full trajectory as text
- Evaluating alternatives to LLM summarization or RAG for cross-agent state transfer
- Implementing or studying KV cache compaction as a first-class inference primitive, not only prefix caching of identical prompts
- Debugging token explosion in recursive, hierarchical, or tool-heavy agent graphs
- Interpreting benchmarks that report worker-token savings, total-token savings, compaction overhead, and accuracy together

Core Concepts

The token explosion pattern. In recursive or REPL-style systems, the orchestrator repeatedly calls a worker to inspect evidence, verify hypotheses, or answer subquestions. The orchestrator's trajectory grows with partial conclusions, dead ends, tool output, and prior worker responses. If that trajectory is passed in full on every worker call, cost compounds quickly.
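A small arithmetic sketch makes the compounding concrete. All numbers and names here are hypothetical: the trajectory is assumed to grow by a fixed amount between worker calls, and `briefing_cap` stands in for the number of positions a compacted handoff would retain (compaction operates on KV positions, so treating the cap as a token count is a simplification).

```python
def worker_input_tokens(calls, base_tokens, growth_per_call, task_tokens,
                        briefing_cap=None):
    """Total worker-side input across a run of sequential worker calls.

    With briefing_cap=None the full trajectory is replayed on every call.
    Otherwise the context handed to the worker is capped at briefing_cap.
    """
    total = 0
    for i in range(calls):
        trajectory = base_tokens + i * growth_per_call  # grows between calls
        if briefing_cap is None:
            context = trajectory
        else:
            context = min(trajectory, briefing_cap)
        total += context + task_tokens
    return total

full = worker_input_tokens(10, 2000, 1500, 200)
capped = worker_input_tokens(10, 2000, 1500, 200, briefing_cap=1000)
print(full, capped)  # prints "89500 12000"
```

Under these assumptions the full-replay total grows quadratically with the number of calls, because each call re-pays for everything accumulated so far, while the capped total grows linearly.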
