llama.cpp - Secondary Inference Engine
Direct access to llama.cpp for faster inference, LoRA adapter loading, and benchmarking on Apple Silicon. Ollama remains primary for RLAMA and general use; llama.cpp is the power tool.
Prerequisites
brew install llama.cpp
Binaries: llama-cli, llama-server, llama-embedding, llama-quantize
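A quick way to confirm the install is to check that each binary listed above is on the PATH. The following is a minimal sketch; `missing_bins` is a hypothetical helper name, not part of llama.cpp or Homebrew.

```shell
# Print the name of every binary passed in that is NOT found on PATH.
# Prints nothing when all of them are available.
missing_bins() {
    for bin in "$@"; do
        command -v "$bin" >/dev/null 2>&1 || printf '%s\n' "$bin"
    done
}
```

Usage: `missing_bins llama-cli llama-server llama-embedding llama-quantize`. Empty output means all four were found.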
Quick Reference
Resolve Ollama Model to GGUF Path
To avoid duplicating model files, resolve an Ollama model name to its GGUF blob path:
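One way to do this is sketched below, assuming Ollama's default on-disk layout: a JSON manifest at `manifests/registry.ollama.ai/library/<name>/<tag>` under the models directory, whose layer with media type `application/vnd.ollama.image.model` carries the digest of the GGUF blob in `blobs/`. The function name `resolve_gguf` is hypothetical, and the sketch only covers models from the default library namespace; the layout may differ between Ollama versions.

```shell
# Sketch: resolve an Ollama model name to its GGUF blob path.
# Assumes the default store (~/.ollama/models) unless OLLAMA_MODELS is set.
resolve_gguf() {
    model="$1"
    models_dir="${OLLAMA_MODELS:-$HOME/.ollama/models}"

    # Split "name:tag"; fall back to the "latest" tag when none is given.
    name="${model%%:*}"
    case "$model" in
        *:*) tag="${model#*:}" ;;
        *)   tag="latest" ;;
    esac

    manifest="$models_dir/manifests/registry.ollama.ai/library/$name/$tag"
    [ -f "$manifest" ] || { echo "manifest not found: $manifest" >&2; return 1; }

    # Put each JSON object on its own line, keep the model layer,
    # and pull out its sha256 digest.
    digest=$(tr -d '\n' < "$manifest" | tr '{' '\n' \
        | grep 'application/vnd.ollama.image.model"' \
        | grep -o 'sha256:[0-9a-f]*' | head -n 1)
    [ -n "$digest" ] || { echo "no model layer in $manifest" >&2; return 1; }

    # Blob files are named with a dash instead of a colon: sha256-<hex>.
    printf '%s/blobs/%s\n' "$models_dir" "$(printf '%s' "$digest" | tr ':' '-')"
}
```

The result can be passed straight to llama.cpp, for example `llama-cli -m "$(resolve_gguf llama3.2:3b)" -p "Hello"`, where the model tag is only an illustration. The text-based parsing avoids a dependency on `jq`; a JSON-aware tool would be more robust if one is available.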