llama-cpp

llama.cpp + GGUF

Use this skill for local GGUF inference, quant selection, or Hugging Face repo discovery for llama.cpp.

When to use

  • Run local models on CPU, Apple Silicon, CUDA, ROCm, or Intel GPUs
  • Find the right GGUF for a specific Hugging Face repo
  • Build a llama-server or llama-cli command from the Hub (see the launch sketch after this list)
  • Search the Hub for models that already support llama.cpp
  • Enumerate available .gguf files and sizes for a repo
  • Decide between Q4/Q5/Q6/IQ quant variants based on the user's available RAM or VRAM
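
For the command-building case, here is a minimal sketch that launches llama-server against a Hub repo. It assumes a reasonably recent llama.cpp build whose `-hf <repo>[:<quant>]` shorthand downloads and caches the GGUF automatically; the repo name and quant tag below are illustrative, not prescribed.

```python
import shutil
import subprocess

# Illustrative values; substitute the user's actual repo and quant tag.
repo = "bartowski/Llama-3.2-1B-Instruct-GGUF"
quant = "Q4_K_M"

# Fail early with a clear message if llama.cpp is not installed.
if shutil.which("llama-server") is None:
    raise SystemExit("llama-server not found on PATH; install or build llama.cpp first")

# Recent llama.cpp builds accept -hf <repo>[:<quant>] and fetch the GGUF
# from the Hugging Face Hub, caching it locally before serving.
subprocess.run(
    ["llama-server", "-hf", f"{repo}:{quant}", "--port", "8080"],
    check=True,
)
```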

Model discovery workflow

Prefer URL-based workflows (fetching Hugging Face pages or its public API endpoints directly) before reaching for the hf CLI, Python libraries, or custom scripts.
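
The URLs do the discovery work themselves. For instance, the Hub's public tree endpoint, https://huggingface.co/api/models/REPO/tree/main, lists every file in a repo with its byte size, which is enough to enumerate GGUF quants without any tooling. A stdlib-only sketch (the repo name is illustrative):

```python
import json
from urllib.request import urlopen

# Illustrative repo; the tree endpoint is public for public repos.
repo = "bartowski/Llama-3.2-1B-Instruct-GGUF"
url = f"https://huggingface.co/api/models/{repo}/tree/main"

# The endpoint returns a JSON array; file entries carry "path" and "size" (bytes).
with urlopen(url) as resp:
    files = json.load(resp)

# Keep only the GGUF files and report their sizes in GiB.
for f in files:
    if f["path"].endswith(".gguf"):
        print(f'{f["path"]}: {f["size"] / 2**30:.1f} GiB')
```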
