Context Management
Coverage
The working discipline that controls what enters, stays in, and exits an active agent session. This skill covers:
Intake triage that sorts every candidate context source into a four-bucket classification (must-have / useful soon / durable background / noise) before any large file is read.
The six-step context-management loop: state the active question in one sentence, name the minimum evidence needed to answer it, load the cheapest sources first (index → search → narrow file slice), collapse confirmed facts into a checkpoint, drop disproven assumptions from the active thread, and re-check whether the question changed before reading more.
Working-set shaping rules (what to keep active vs what to push out) and the distillation pattern that converts a 300-line log into a 2-line summary, a whole file into a function name plus slice plus invariant, and a long conversation into current state, blocker, and next step.
Drift detection signals (re-reading the same file, ideas changing every turn, an unbounded search space, the agent forgetting what was proven) and the anti-drift rules (one active hypothesis at a time, one primary question, one verification target).
The compaction-ready handoff format with five required fields (task / question / proven facts / rejected paths / next step) and the under-thirty-seconds resume test.
The selective-rebuild recipe for recovering after the thread is lost.
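The five-field handoff named above could be represented roughly as follows. This is a minimal sketch, not the skill's own format: the `Handoff` class, `missing_fields`, and `render` are hypothetical names introduced for illustration, and treating an empty field as "missing" is an assumption.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class Handoff:
    """Hypothetical container for the five required handoff fields."""
    task: str
    question: str
    proven_facts: List[str]
    rejected_paths: List[str]
    next_step: str


def missing_fields(handoff: Handoff) -> List[str]:
    """Return the names of fields that are empty (assumed to mean 'missing')."""
    return [f.name for f in fields(handoff) if not getattr(handoff, f.name)]


def render(handoff: Handoff) -> str:
    """Render the handoff as a short plain-text block for the next session."""
    lines = [
        f"Task: {handoff.task}",
        f"Question: {handoff.question}",
        "Proven facts:",
        *[f"  - {fact}" for fact in handoff.proven_facts],
        "Rejected paths:",
        *[f"  - {path}" for path in handoff.rejected_paths],
        f"Next step: {handoff.next_step}",
    ]
    return "\n".join(lines)
```

A caller might check `missing_fields(handoff)` before compaction and refuse to hand off until it returns an empty list. Whether the rendered block can be read in under thirty seconds is still a judgment the author has to make.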
Philosophy
Context management is the practical layer between having the right information available somewhere in the workspace and having it active in the agent at the right moment. The goal is not to load more context; it is to keep the smallest working set that still lets the agent act correctly. Without this discipline, agents speculate from stale assumptions, re-read files they already processed, and lose the decision trail at the moment of compaction. Every context slot occupied by noise is a slot unavailable for the evidence that would actually resolve the current question.
The hardest part is not what to load. It is what to drop. Disproven hypotheses, raw logs after the key pattern is extracted, and full files after the needed lines are identified all continue to occupy context until they are deliberately removed. The working set is what the agent is actively reasoning over, not everything it has ever seen.
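The idea of deliberately removing material from the working set can be sketched as below. This is an illustrative model only: `WorkingSet` and its methods are hypothetical names, and a real agent session manages context through its own runtime rather than through a dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WorkingSet:
    """Hypothetical model of what the agent is actively reasoning over."""
    active: Dict[str, str] = field(default_factory=dict)   # source name -> raw content
    checkpoint: List[str] = field(default_factory=list)    # short confirmed facts

    def load(self, name: str, content: str) -> None:
        """Bring a source into the active set."""
        self.active[name] = content

    def distill(self, name: str, summary: str) -> None:
        """Keep a short summary in the checkpoint and remove the raw source."""
        self.active.pop(name, None)
        self.checkpoint.append(summary)

    def drop(self, name: str) -> None:
        """Remove a source outright, for example a disproven hypothesis."""
        self.active.pop(name, None)
```

In this sketch a raw log leaves `active` as soon as its key pattern is recorded through `distill`, and a falsified hypothesis is removed with `drop` without leaving any summary behind.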
1. Outcomes