context-engineering
Context Engineering
Coverage
- Core principle: the model is a reasoning engine that reasons over whatever is in its context window — wrong context produces correct reasoning over false premises
- The five-layer context stack: system prompt, persistent memory, always-loaded rules, injected skills, agent prompt — what each layer does and how each can fail
- The four context failure modes: missing, stale, wrong, overwhelming — diagnostic questions for each, table of symptoms, and prevention strategies
- Four context quality metrics: injection precision, injection recall, context utilization, freshness score — definitions, healthy ranges, and how to measure each
- Frequent Intentional Compaction (FIC): proactive compaction at task boundaries, target utilization range, and the difference between planned and forced compaction
- Subagent delegation pattern: when to delegate context-heavy investigation to a subagent so the main agent receives a summary instead of raw evidence
- Debugging decision tree: how to diagnose any agent failure by walking from missing-context through overwhelming-context before blaming the model
- The verification checklist: gates a context-engineering review must pass before declaring the pipeline healthy
Philosophy
The model is a reasoning engine that reasons over whatever is in its context window. If the context is wrong, the reasoning is correct but the conclusion is wrong. This means most agent failures are context failures, not model failures.
Without this discipline, teams blame the model for mistakes caused by missing keywords, stale skill content, or an overwhelmed window. Context engineering provides the diagnostic framework to identify why an agent produced a wrong answer and the design principles to prevent recurrence. It treats the context window as a deliberate design surface — not a dumping ground — so that the model's native reasoning produces the correct output without heroic prompting.
More from jacob-balslev/skill-graph
a11y
Use when building or reviewing interactive UI, forms, navigation, or dynamic content. Covers semantic HTML, keyboard access, focus management, labeling, state-change announcement, and reduced-motion / high-contrast preferences. Do NOT use for color-palette creation, visual branding, feedback-state staging, or prose reading-level accessibility - those belong to `visual-design-foundations`, `interaction-feedback`, and documentation respectively.
7intent-recognition
Use BEFORE any tool call that could modify state, touch sensitive targets, rewrite history, install dependencies, publish packages, or expose credentials/environment data. Classifies intent into Passive/Read, Reconnaissance, Modification, or Destructive/Irreversible using operation type plus target sensitivity, then runs Identify / Confirm / Verify before action. Do NOT use for deciding what code to write, executing already-classified work, reactive post-execution guardrails, or defining upstream governance policy.
6dependency-architecture
Use when designing or auditing dependency structure: package boundaries, runtime vs build dependencies, adapter layers, duplicate-purpose libraries, supply-chain risk, upgrade policy, lock-in, and dependency graph health. Do NOT use for choosing a major framework (use `framework-fit-analysis`), vulnerability-only review (use `owasp-security`), or routine refactoring without dependency boundary changes (use `refactor`).
6information-architecture
Use when structuring information for findability: navigation, page hierarchy, docs architecture, sitemap shape, labeling systems, wayfinding, and content grouping. Do NOT use for formal category-governance work (use `taxonomy-design`), responsive page composition (use `layout-composition`), component/token architecture (use `design-system-architecture`), or sentence-level UI text (use `microcopy`).
6design-thinking
Use when orchestrating a full human-centered design process across discovery, definition, ideation, prototyping, and testing — when uncertain which stage of the arc a team is in, when deciding whether to loop back, or when routing to the right stage-specific sibling skill. Do NOT use for single-stage execution (go directly to problem-framing, user-research, research-synthesis, journey-mapping, ideation, prototyping, or usability-testing) or for engineering domain discovery (use event-storming).
6knowledge-modeling
Use when deciding *which representation paradigm* fits a piece of domain knowledge — knowledge graph vs frames vs production rules vs semantic network vs concept map vs procedural ontology vs hybrid — when designing AI-agent context systems, building a knowledge base, structuring a skill or reference library, or planning a GraphRAG retrieval pipeline. Covers the seven paradigms with structure / best-for / weakness tables, the tacit-to-explicit knowledge acquisition pipeline (elicitation → articulation → formalization → validation → encoding), knowledge graph design principles (reify when needed, separate schema from instance, label precisely, bidirectional naming, minimal redundancy), the four knowledge-validation types (completeness / consistency / relevance / currency) plus expert walkthrough, the seven-phase knowledge lifecycle (Create / Validate / Publish / Use / Monitor / Update / Retire), the application to AI-agent systems (skills as frames, routing as rules, memory as graph), and a full GraphRAG section covering the five patterns (entity-anchored retrieval, relationship-aware context, path-based reasoning, subgraph summarization, hybrid vector+graph) with rules for when graph-grounded retrieval beats plain RAG. Do NOT use for the *human-readable* domain analysis layer (use `conceptual-modeling`), for the database / ER design layer (a logical-modeling skill), for pure classification hierarchies (a taxonomy skill), for formal ontology axioms (an ontology skill), or for the live skill-library tooling that consumes modeled knowledge (use `skill-infrastructure`).
6