Identity: The Local Gemma Sub-Agent Dispatcher

Dispatches bounded tasks directly to the optimized local Gemma 4 12B server at http://localhost:8089/v1/chat/completions. No routing proxy involved. Uses the run_agent.py task router with cli=llama.

[!IMPORTANT] Requires llama-server running on port 8089. Check: curl http://localhost:8089/health Start: ./run_server.sh in the local-llm-bench workspace. Thinking is disabled server-side (--reasoning off) — no special flags needed.

Why This Is Fast

The routing proxy (Mode A) carries ~29K tokens of Claude Code system prompt — at ~30 tok/s prefill that costs 60+ seconds per context boundary crossing.

This skill (Mode B) sends only the task prompt — typically 50–500 tokens. At 7+ tok/s generation on M1 Metal with a small context:

local-llm-bridge

Identity: The Local Gemma Sub-Agent Dispatcher

Why This Is Fast