pull-llamacpp-model
Pull a llamacpp Model
This machine runs llamacpp inference on the kyuz0/amd-strix-halo-toolboxes:rocm-7.2 image, because AMD Strix Halo (gfx1151) is not supported by the official ROCm build. Harbor's pull mechanism starts an ephemeral container with --n-gpu-layers 0, and the custom image fails in that context without ROCm device access. Switch to the standard CPU image for the pull only, then restore the custom image afterwards.
Steps
1. Switch to the standard CPU image
harbor config set llamacpp.image.rocm ghcr.io/ggml-org/llama.cpp:server
2. Pull the model
harbor pull <hf-owner/model-repo:quantization>
# Examples:
harbor pull bartowski/Qwen2.5-7B-Instruct-GGUF:Q4_K_M
harbor pull unsloth/Mistral-Small-3.1-24B-Instruct-2503-GGUF:UD-Q4_K_XL
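The switch, pull, and restore sequence described above can be wrapped in one helper so the custom image is put back even when the pull fails. This is a minimal sketch, not part of Harbor itself. The function name pull_with_cpu_image is hypothetical, and it assumes that `harbor config get <key>` prints the current value of the key, so that the previous image can be restored without hard-coding it.

```shell
# Hypothetical helper: pull a model with the CPU image, then restore the
# image that was configured before. Assumes `harbor config get` prints the
# current value of a config key.
pull_with_cpu_image() {
  model="$1"
  key="llamacpp.image.rocm"
  cpu_image="ghcr.io/ggml-org/llama.cpp:server"

  # Remember the currently configured image (the custom Strix Halo image).
  previous_image="$(harbor config get "$key")" || return 1

  # 1. Switch to the standard CPU image.
  harbor config set "$key" "$cpu_image" || return 1

  # 2. Pull the model and keep the exit status for later.
  harbor pull "$model"
  status=$?

  # 3. Restore the previous image whether or not the pull succeeded.
  harbor config set "$key" "$previous_image" || return 1

  return "$status"
}
```

Usage: `pull_with_cpu_image bartowski/Qwen2.5-7B-Instruct-GGUF:Q4_K_M`. To restore by hand instead, run the same `harbor config set llamacpp.image.rocm` command with the custom image named at the top of this page as the value.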