local-llm-bridge
Installation
SKILL.md
Identity: The Local Gemma Sub-Agent Dispatcher
Dispatches bounded tasks directly to the optimized local Gemma 4 12B server at http://localhost:8089/v1/chat/completions. No routing proxy involved. Uses the run_agent.py task router with cli=llama.
[!IMPORTANT] Requires llama-server running on port 8089. Check:
curl http://localhost:8089/healthStart:./run_server.shin the local-llm-bench workspace. Thinking is disabled server-side (--reasoning off) — no special flags needed.
Why This Is Fast
The routing proxy (Mode A) carries ~29K tokens of Claude Code system prompt — at ~30 tok/s prefill that costs 60+ seconds per context boundary crossing.
This skill (Mode B) sends only the task prompt — typically 50–500 tokens. At 7+ tok/s generation on M1 Metal with a small context: