llama-cpp

llama.cpp + GGUF

Use this skill for local GGUF inference, quant selection, or Hugging Face repo discovery for llama.cpp.

When to use

  • Run local models on CPU, Apple Silicon, CUDA, ROCm, or Intel GPUs
  • Find the right GGUF for a specific Hugging Face repo
  • Build a llama-server or llama-cli command from the Hub (see the launch sketch after this list)
  • Search the Hub for models that already support llama.cpp
  • Enumerate available .gguf files and sizes for a repo
  • Decide between Q4/Q5/Q6/IQ quant variants based on the user's available RAM or VRAM
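
For the command-building case, here is a minimal sketch that launches llama-server against a Hub repo. It assumes a reasonably recent llama.cpp build whose `-hf <repo>[:<quant>]` shorthand downloads and caches the GGUF automatically; the repo name and quant tag below are illustrative, not prescribed.

```python
import shutil
import subprocess

# Illustrative values; substitute the user's actual repo and quant tag.
repo = "bartowski/Llama-3.2-1B-Instruct-GGUF"
quant = "Q4_K_M"

# Fail early with a clear message if llama.cpp is not installed.
if shutil.which("llama-server") is None:
    raise SystemExit("llama-server not found on PATH; install or build llama.cpp first")

# Recent llama.cpp builds accept -hf <repo>[:<quant>] and fetch the GGUF
# from the Hugging Face Hub, caching it locally before serving.
subprocess.run(
    ["llama-server", "-hf", f"{repo}:{quant}", "--port", "8080"],
    check=True,
)
```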

Model discovery workflow

Prefer URL-based workflows (fetching Hugging Face pages or its public API endpoints directly) before reaching for the hf CLI, Python libraries, or custom scripts.
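
The URLs do the discovery work themselves. For instance, the Hub's public tree endpoint, https://huggingface.co/api/models/REPO/tree/main, lists every file in a repo with its byte size, which is enough to enumerate GGUF quants without any tooling. A stdlib-only sketch (the repo name is illustrative):

```python
import json
from urllib.request import urlopen

# Illustrative repo; the tree endpoint is public for public repos.
repo = "bartowski/Llama-3.2-1B-Instruct-GGUF"
url = f"https://huggingface.co/api/models/{repo}/tree/main"

# The endpoint returns a JSON array; file entries carry "path" and "size" (bytes).
with urlopen(url) as resp:
    files = json.load(resp)

# Keep only the GGUF files and report their sizes in GiB.
for f in files:
    if f["path"].endswith(".gguf"):
        print(f'{f["path"]}: {f["size"] / 2**30:.1f} GiB')
```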
