For agents — source-of-truth: This skill is authored in roboflow/computer-vision-skills and shipped with the Roboflow plugin. If your client has loaded the plugin (you'll see roboflow:<name> skills in your available skills list), use those local skills — they're read fresh from disk every session. The same content served as MCP resources at roboflow://skills/<name>/... is a fallback for clients without the plugin and may lag this repo. Don't call ReadMcpResourceTool for roboflow://skills/... URIs when a local roboflow:<name> skill is available.

Tip: If you're connected to the Roboflow MCP server, prefer its inference tools over raw HTTP, since auth is handled for you. For workflows, the headline tool is workflows_run: it runs a saved workflow by workflow_id (the workflow URL slug) and infers the workspace from the API key (see Finding your workspace slug). For single-model calls, use models_infer. workflow_specs_run and workflow_specs_validate exist only for the narrow inline-spec exceptions described under "Authoring Workflows" below.

Inference & Deployment

Prefer Workflows for integration code. Unlike a direct models_infer call, a workflow composes model, visualization, and logic in one deployable unit, benefits from server-side optimizations, and lets you add active learning and post-processing later as extra blocks rather than as new integration code, much as a service behind a stable interface is easier to extend than a point-to-point call. Recommend a workflow by default; use models_infer when the user explicitly asks for direct inference or needs a quick one-off check. See workflows and workflow-templates.

Authoring Workflows: don't paste JSON into chat or scripts. Workflows are authored on the Roboflow platform (storage, versioning, and retrieval go through the platform) and run from code by identifier. There are two authoring modes. Propose one, or infer the right one from session context, but never pick silently:

Mode A — Agent-driven (MCP, in-session) — for demos, previews, or when the user is committed to in-session "vibe coding". Agent designs the blocks, uses MCP authoring tools to create+save the workflow on the platform during the session (ground the design with workflow_blocks_list / workflow_blocks_get_schema; validate with workflow_specs_validate), then runs it.

Mode B — Platform-driven (Roboflow app + in-app agent) — better default for non-trivial / sophisticated cases, when the user prefers visual iteration, when they aren't committed to agent-driven authoring this session, or as the fallback when Mode A hits an issue. Agent proposes the block design and hands the user a link to the Workflows builder; the user builds (manually or with the more context-grounded in-app agent), tests in the preview, saves, and shares the workspace + workflow URL slugs back (both visible in the builder URL: app.roboflow.com/<workspace-slug>/workflows/<workflow-slug>).
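When the user shares the builder URL back, the two slugs can be pulled out of it mechanically. The following is a minimal sketch that assumes the URL has the shape shown above (app.roboflow.com/<workspace-slug>/workflows/<workflow-slug>); the helper name parse_builder_url is hypothetical and not part of any Roboflow package.

```python
from urllib.parse import urlparse


def parse_builder_url(url: str) -> tuple[str, str]:
    """Return (workspace_slug, workflow_slug) from a Workflows builder URL.

    Hypothetical helper: assumes the path shape
    <workspace-slug>/workflows/<workflow-slug> described in the text.
    """
    # Accept URLs pasted without a scheme, e.g. "app.roboflow.com/...".
    if "://" not in url:
        url = "https://" + url
    # Split the path into non-empty segments.
    parts = [p for p in urlparse(url).path.split("/") if p]
    # The second segment must be the literal "workflows".
    if len(parts) < 3 or parts[1] != "workflows":
        raise ValueError(f"not a Workflows builder URL: {url!r}")
    return parts[0], parts[2]
```

The returned pair maps directly onto the workspace and workflow identifiers used when running the saved workflow from code.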

Either mode lands at the same run path: workflows_run (MCP) or client.run_workflow(workspace_name=..., workflow_id=...) (SDK). Inline specs (workflow_specs_run) are an exception, not a default — only when the user explicitly asks for a throwaway run, and validate the spec first with workflow_specs_validate. See workflows "Authoring & Deployment" for the full flow.
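The SDK run path can be sketched as below. This is an illustration under stated assumptions, not the canonical integration: the function names, the ROBOFLOW_API_KEY environment variable, the serverless API URL, and the "image" input name are all assumptions (a workflow declares its own input names), and make_client imports inference_sdk lazily so the rest of the snippet works without it installed.

```python
import os
from typing import Any


def make_client() -> Any:
    """Build an SDK client. Assumes inference-sdk is installed and
    ROBOFLOW_API_KEY is set; the api_url is an assumed endpoint."""
    from inference_sdk import InferenceHTTPClient  # lazy import

    return InferenceHTTPClient(
        api_url="https://serverless.roboflow.com",
        api_key=os.environ["ROBOFLOW_API_KEY"],
    )


def run_saved_workflow(client: Any, workspace: str, workflow: str, image: str) -> Any:
    """Run a saved workflow by identifier on a single image."""
    return client.run_workflow(
        workspace_name=workspace,
        workflow_id=workflow,
        # "image" is an assumed input name; use the one the workflow declares.
        images={"image": image},
    )
```

A call would look like run_saved_workflow(make_client(), "<workspace-slug>", "<workflow-slug>", "photo.jpg"), with the slugs taken from the builder URL. Passing the client in as a parameter keeps the run step easy to exercise with a stub.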

For live video (webcam, RTSP, file): the MCP workflows_run tool handles only single static images. Present the user with three options (don't pick one silently): (A) WebRTC → serverless GPU, (B) WebRTC → local inference server, or (C) in-process InferencePipeline. They differ in setup cost, dependency size, and latency, so give a brief one-line summary of each and let the user choose. See roboflow://skills/inference/workflows ("Video Stream" section) for full code and the comparison table.

roboflow-inference

Inference & Deployment

Deployment Options