# llama.cpp + GGUF
Use this skill for local GGUF inference, quantization selection, or Hugging Face repo discovery for llama.cpp.
## When to use
- Run local models on CPU, Apple Silicon, CUDA, ROCm, or Intel GPUs
- Find the right GGUF for a specific Hugging Face repo
- Build a `llama-server` or `llama-cli` command from the Hub
- Search the Hub for models that already support llama.cpp
- Enumerate available `.gguf` files and their sizes for a repo
- Decide between Q4/Q5/Q6/IQ variants for the user's RAM or VRAM
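When choosing between quant variants, a quick size estimate against available RAM or VRAM is usually enough. The sketch below uses the rule of thumb that file size is roughly parameters × bits-per-weight ÷ 8, plus a little overhead for tensors kept at higher precision; the bits-per-weight figures are approximate assumptions, not exact values for any given model.

```python
def estimate_gguf_gb(n_params_b: float, bits_per_weight: float,
                     overhead: float = 1.1) -> float:
    """Rough GGUF file size in GiB: params * bpw / 8, with ~10% overhead
    for embeddings/metadata stored at higher precision (rule of thumb)."""
    bytes_total = n_params_b * 1e9 * bits_per_weight / 8 * overhead
    return bytes_total / 2**30

# Approximate effective bits-per-weight per quant type (varies per model).
BPW = {"IQ4_XS": 4.3, "Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q6_K": 6.6, "Q8_0": 8.5}

for quant, bpw in BPW.items():
    print(f"7B @ {quant}: ~{estimate_gguf_gb(7, bpw):.1f} GiB")
```

A 7B model at Q4_K_M lands around 4–5 GiB, so it fits comfortably on an 8 GiB GPU with room left for the KV cache; Q8_0 of the same model would be tight.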
## Model discovery workflow
Prefer URL-based workflows before reaching for the `hf` CLI, Python, or custom scripts.
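The URL workflow can be reduced to constructing two Hub links: the repo file tree (to enumerate `.gguf` files and sizes) and the model search filtered to the GGUF library tag. A minimal sketch, assuming the standard `huggingface.co` URL scheme:

```python
from urllib.parse import quote

def gguf_tree_url(repo_id: str) -> str:
    # Repo file browser; lists every .gguf file with its size.
    return f"https://huggingface.co/{quote(repo_id)}/tree/main"

def gguf_search_url(query: str) -> str:
    # Hub model search restricted to repos tagged with the GGUF library.
    return f"https://huggingface.co/models?library=gguf&search={quote(query)}"

print(gguf_tree_url("ggml-org/gemma-3-1b-it-GGUF"))
print(gguf_search_url("qwen2.5 7b"))
```

Opening the tree URL in a browser (or fetching it) shows the available quant files directly, which is usually faster than scripting against the API.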