Pull a llamacpp Model

This machine uses kyuz0/amd-strix-halo-toolboxes:rocm-7.2 for llamacpp inference, because its GPU (AMD Strix Halo, gfx1151) is not supported by the official ROCm build. Harbor's pull mechanism starts an ephemeral container with --n-gpu-layers 0, and the custom image fails in that context without ROCm device access. To work around this, switch to the standard CPU image just for the pull, then restore the custom image afterwards.

Steps

1. Switch to the standard CPU image

harbor config set llamacpp.image.rocm ghcr.io/ggml-org/llama.cpp:server

2. Pull the model

harbor pull <hf-owner/model-repo:quantization>
# Examples:
harbor pull bartowski/Qwen2.5-7B-Instruct-GGUF:Q4_K_M
harbor pull unsloth/Mistral-Small-3.1-24B-Instruct-2503-GGUF:UD-Q4_K_XL
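
The introduction says to restore the custom image after pulling, but no restore command is listed. A minimal sketch of the whole sequence follows, assuming the custom image is configured under the same llamacpp.image.rocm key that step 1 overwrites. The helper name pull_llamacpp_model is hypothetical and not part of Harbor.

```shell
#!/usr/bin/env bash
# Sketch: pull with the standard CPU image, then restore the custom image.
# Assumption: the custom image is stored under the same config key as in step 1.

CPU_IMAGE="ghcr.io/ggml-org/llama.cpp:server"
CUSTOM_IMAGE="kyuz0/amd-strix-halo-toolboxes:rocm-7.2"

# Hypothetical helper, not a Harbor command.
pull_llamacpp_model() {
  local model="$1"
  local status=0

  # Step 1: switch to the standard CPU image.
  harbor config set llamacpp.image.rocm "$CPU_IMAGE" || return 1

  # Step 2: pull the model, remembering a failure instead of exiting early.
  harbor pull "$model" || status=$?

  # Restore the custom image even if the pull failed.
  harbor config set llamacpp.image.rocm "$CUSTOM_IMAGE" || return 1

  return "$status"
}
```

After sourcing the snippet, a call such as pull_llamacpp_model bartowski/Qwen2.5-7B-Instruct-GGUF:Q4_K_M runs both steps and the restore. Restoring unconditionally means a failed pull does not leave inference pointed at the CPU image.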
