context-window
Context Window
Coverage
The quantitative discipline behind an agent's working memory. Allocates the context-window budget across three zones: System (system prompt, rules, tool schemas), Skill Injection (the SKILL.md files auto-loaded for the current task), and Working (conversation, tool results, file contents, agent output).
Names the three context health states, ok (< 60% used), compact (60–80%), and exhausted (> 80%), and the 80% compaction rule: compaction must always trigger before the budget is fully consumed, leaving 20% as the safety margin for finishing the current operation, writing the checkpoint, running the closeout protocol, and emitting the continuation signal.
Specifies the pre-compact protocol (commit uncommitted changes, write the continuation signal, update the checkpoint, save state that cannot be re-derived from git or disk) and the post-compact recovery flow (re-injection of git status, active-task reference, recent commits, critical findings).
Catalogs typical token consumption per operation type (full file read 20–40K tokens, large tool-result JSON 10–30K, single SKILL injection 3–8K, fixed system overhead) and the five token-reduction techniques: preferring a deterministic CLI over heavy MCP / tool-result paths, targeted file reads with offset + limit instead of full-file reads, search-before-read (grep first, read the match), progressive skill disclosure (small SKILL.md kept always loaded; large reference files loaded on demand), and count-mode for exploration (count matches, then read the few that matter).
Specifies the cross-session persistence hierarchy (git history > files on disk > durable memory > live context) and uses it to decide what to checkpoint before compaction. Lists per-model-class context strategies for 1M, 200K, and 128K windows.
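Three of the token-reduction techniques named above (count-mode, search-before-read, and targeted reads with offset + limit) can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function names `count_matches`, `find_matches`, `read_window`, and `read_around` are hypothetical helpers invented for this sketch, not part of any agent tooling described here.

```python
import re
from itertools import islice
from pathlib import Path


def count_matches(path: Path, pattern: str) -> int:
    """Count-mode: report how many lines match without returning their text."""
    regex = re.compile(pattern)
    with path.open(encoding="utf-8") as handle:
        return sum(1 for line in handle if regex.search(line))


def find_matches(path: Path, pattern: str) -> list[int]:
    """Search-before-read: return the 1-based line numbers of matching lines."""
    regex = re.compile(pattern)
    with path.open(encoding="utf-8") as handle:
        return [
            number
            for number, line in enumerate(handle, start=1)
            if regex.search(line)
        ]


def read_window(path: Path, offset: int, limit: int) -> list[str]:
    """Targeted read: at most `limit` lines starting at 1-based line `offset`."""
    if offset < 1 or limit < 0:
        raise ValueError("offset must be >= 1 and limit must be >= 0")
    with path.open(encoding="utf-8") as handle:
        window = islice(handle, offset - 1, offset - 1 + limit)
        return [line.rstrip("\n") for line in window]


def read_around(path: Path, line_number: int, context: int = 2) -> list[str]:
    """Read one match plus `context` lines on each side (hypothetical helper)."""
    start = max(1, line_number - context)
    return read_window(path, start, (line_number - start) + 1 + context)
```

A typical flow under these assumptions would call `count_matches` first, then `find_matches` if the count is small, then `read_around` on the few line numbers that matter, so the full file never enters the working zone.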
Philosophy
The context window is the agent's working memory. Unlike human memory, it has a hard ceiling — when it fills, information is permanently lost from the live session unless it has been checkpointed somewhere durable. Managing the window is not optional. It is the difference between completing a long task and crashing mid-work with the most recent reasoning gone.
The trap of large windows is the assumption that they are effectively unlimited. A 1M-token window feels infinite until a single 2000-line file read consumes 30K, three of those plus a long tool-result chain pushes past 200K, and the agent is at 60% before any real implementation has happened. The ceiling is real, and it is closer than the headline number suggests. Discipline at 200K is identical to discipline at 1M; only the absolute numbers move.
The 80% rule exists because compaction is itself an operation that needs budget. Hitting 100% mid-operation loses the operation. Compacting at 80% preserves it — the remaining 20% pays for the act of preserving.
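The health states and the 20% margin can be expressed as a small classifier. The sketch below is illustrative only: the names `ContextHealth`, `context_health`, and `safety_margin` are hypothetical, and treating exactly 60% and exactly 80% as compact is an assumption about the boundaries, which the thresholds in Coverage leave open.

```python
from enum import Enum


class ContextHealth(Enum):
    OK = "ok"
    COMPACT = "compact"
    EXHAUSTED = "exhausted"


# Thresholds from the Coverage summary, kept as integer percentages
# so comparisons avoid floating-point rounding.
COMPACT_PERCENT = 60
EXHAUSTED_PERCENT = 80


def context_health(used_tokens: int, window_tokens: int) -> ContextHealth:
    """Classify usage as ok (< 60%), compact (60-80%), or exhausted (> 80%)."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    if used_tokens < 0:
        raise ValueError("used_tokens must not be negative")
    scaled = used_tokens * 100
    if scaled < COMPACT_PERCENT * window_tokens:
        return ContextHealth.OK
    # Assumption: the 60% and 80% boundaries themselves count as compact.
    if scaled <= EXHAUSTED_PERCENT * window_tokens:
        return ContextHealth.COMPACT
    return ContextHealth.EXHAUSTED


def safety_margin(window_tokens: int) -> int:
    """Tokens left above the 80% line to pay for checkpointing and closeout."""
    return window_tokens - window_tokens * EXHAUSTED_PERCENT // 100
```

On a 200K window this leaves 40K tokens above the compaction line; on a 1M window, 200K. The percentages are the same in both cases and only the absolute numbers move.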
Zone Model
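A useful per-session mental partition of the available budget splits it into three zones: System (system prompt, rules, tool schemas), Skill Injection (the SKILL.md files auto-loaded for the current task), and Working (conversation, tool results, file contents, agent output).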
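One way to make the partition concrete is to turn zone shares into token counts. In the sketch below, the 20% reserve follows from the 80% rule, but the 10% System and 10% Skill Injection shares are placeholder assumptions chosen only for illustration. The names `ZoneBudget` and `allocate_zones` are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZoneBudget:
    system: int           # system prompt, rules, tool schemas
    skill_injection: int  # SKILL.md files auto-loaded for the task
    working: int          # conversation, tool results, file contents, output
    reserve: int          # safety margin kept free by the 80% rule


def allocate_zones(
    window_tokens: int,
    system_pct: int = 10,   # assumed share, not a documented value
    skill_pct: int = 10,    # assumed share, not a documented value
    reserve_pct: int = 20,  # the 20% margin implied by the 80% rule
) -> ZoneBudget:
    """Split a context window into zone token budgets from integer percentages."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    if min(system_pct, skill_pct, reserve_pct) < 0:
        raise ValueError("percentages must not be negative")
    if system_pct + skill_pct + reserve_pct >= 100:
        raise ValueError("no budget left for the working zone")
    system = window_tokens * system_pct // 100
    skill = window_tokens * skill_pct // 100
    reserve = window_tokens * reserve_pct // 100
    # The working zone takes whatever remains, so the parts always sum
    # to the full window despite integer rounding.
    working = window_tokens - system - skill - reserve
    return ZoneBudget(system, skill, working, reserve)
```

With the placeholder shares, `allocate_zones(200_000)` gives the working zone 120K tokens, which is the figure to compare against the per-operation costs when planning file reads and tool calls.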