tool-call-flow
Tool-Call Flow
Coverage
The protocol-level cycle by which a language model uses external capabilities. Covers the four phases (declaration, request, execution, continuation), the message-history state model that ties them together, the structural differences between vendor protocols (Anthropic tool-use, OpenAI function-calling, Model Context Protocol, Gemini function calling), parallel tool calls, streaming tool calls, the runtime's role as orchestrator, error encoding inside the cycle, and the boundary between model-side intent and runtime-side execution.
Philosophy
A tool-call flow is the smallest unit of agentic capability. Strip away orchestration patterns, multi-agent coordination, evaluation harnesses — what remains is a single language model alternating turns with a runtime that executes capabilities on its behalf. Understanding this cycle precisely is the foundation for understanding everything that builds on it.
The cycle's defining property is the separation of planning from execution. The model produces structured intent; the runtime carries it out. This separation is not a workaround for current model capabilities; it is a deliberate design choice that makes the system auditable, composable, and recoverable. A system that fuses the two — by letting the model execute code directly, or by letting the runtime make decisions — gains expressiveness and loses every benefit the separation provides.
The four-phase structure is identical across every current vendor protocol. The names differ, the message shapes differ, the encoding of parallelism differs, but the cycle — declare, request, execute, continue — is the same. A practitioner who understands the cycle can move between Anthropic, OpenAI, MCP, and Gemini at the cost of a translation layer; a practitioner who understands only one vendor's encoding cannot.
The Four Phases
More from jacob-balslev/skills
layout-composition
Use when deciding responsive page or screen structure: section order, scan pattern, grid/flex composition, breakpoints, viewport hierarchy, responsive media, and density. Do NOT use for user-goal decomposition (use `task-analysis`), navigation taxonomy (use `information-architecture`), visual polish (use `visual-design-foundations`), or component/token contracts (use `design-system-architecture`).
8context-graph
Use when designing or auditing the multi-graph context architecture of an AI-coding workspace: skill graph, document routing graph, memory index, script registry, and the cross-graph edges between them. Covers edge typing, orphan detection, connectivity health, deterministic graph synthesis signals, change-propagation checks, and drift or hub-and-spoke anti-patterns. Do NOT use for authoring one SKILL.md (use `skill-scaffold`), validating one skill (use `graph-audit`), live routing decisions (use `skill-router`), context-window budgeting (use `context-window`), or session load/drop choices (use `context-management`).
8visual-design-foundations
Use when designing or auditing visual craft: color palette, typography, spacing, elevation, rhythm, density, visual hierarchy, brand fit, contrast intent, and motion feel. Do NOT use for sign-system meaning (use `semiotics`), token/component architecture (use `design-system-architecture`), responsive structure (use `layout-composition`), or accessibility compliance (use `a11y`).
7project-knowledge-extraction
Use when extracting durable project knowledge from code, docs, issues, incidents, reports, screenshots, or conversations into reusable context such as skills, ADRs, glossaries, context docs, or memory. Do NOT use for writing a new skill contract (use `skill-scaffold`), maintaining library tooling (use `skill-infrastructure`), or generic documentation polish (use `documentation`).
6problem-framing
Use when a team is converging on solutions before agreeing on the problem, when a brief reads as a feature request, when symptoms and root needs are tangled, or when assumptions need surfacing before design work proceeds. Do NOT use for code-level bug triage, runtime failure diagnosis, or root-cause analysis of system errors — those are engineering investigation tasks, not design problem framing.
6ai-native-development
Use when reasoning about agent autonomy levels, designing auto-improve loops, evaluating AI-generated code quality, or measuring agent productivity in an LLM-assisted codebase. Covers Karpathy's three eras of software (1.0 explicit / 2.0 learned / 3.0 natural-language), the vibe-coding-vs-agentic-engineering distinction, the 0–5 autonomy slider with task-type recommendations, the one-asset / one-metric / one-time-box AutoResearch loop, Software 3.0 productivity metrics, and the documented quality regressions of ungated AI-generated code (the 'vibe hangover'). Do NOT use for choosing a specific autonomy-loop topology (use `agent-engineering`), for the per-prompt authoring discipline (use `prompt-craft`), or for reviewing the AI-generated code that comes out of a Software 3.0 workflow (use `code-review`).
6