hf-mem


hf_mem estimates the memory required for inference, including model weights and an optional KV cache, for Safetensors and GGUF models on the Hugging Face Hub. It does so using HTTP Range requests, i.e., without downloading or loading any weights locally.
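As a rough illustration of why HTTP Range requests are enough: a Safetensors file starts with an 8-byte little-endian header length, followed by a JSON header listing each tensor's dtype, shape, and byte offsets. The sketch below reads only that header and sums the tensor sizes. It is not hf_mem's actual code; `http_fetch`, `read_header`, `weight_bytes`, and the `fetch` callable are hypothetical names chosen for this example.

```python
import json
import struct
import urllib.request
from typing import Callable, Optional

# fetch(start, end) returns bytes start..end inclusive, like an HTTP Range request.
Fetch = Callable[[int, int], bytes]


def http_fetch(url: str, token: Optional[str] = None) -> Fetch:
    """Build a fetcher that issues `Range: bytes=start-end` requests against url."""
    def fetch(start: int, end: int) -> bytes:
        headers = {"Range": f"bytes={start}-{end}"}
        if token:  # only needed for gated or private repositories
            headers["Authorization"] = f"Bearer {token}"
        req = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(req) as resp:
            return resp.read()
    return fetch


def read_header(fetch: Fetch) -> dict:
    """Read only the Safetensors JSON header, never the tensor data."""
    (header_len,) = struct.unpack("<Q", fetch(0, 7))
    return json.loads(fetch(8, 8 + header_len - 1))


def weight_bytes(header: dict) -> int:
    """Sum the byte size of every tensor described in the header."""
    total = 0
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata, not a tensor
            continue
        start, end = info["data_offsets"]
        total += end - start
    return total
```

With a real file, `fetch` would come from `http_fetch(url)` pointed at the file's download URL. Sharded or GGUF weights would need different handling, which this sketch does not cover.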

When to use?

  • User asks how much VRAM or memory a model needs to run
  • User wants to know if a model fits on their GPU or a given instance
  • User references a Hugging Face model ID or URL and asks about inference requirements
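For the "does it fit" questions above, a back-of-the-envelope check compares weights plus KV cache against available memory. The sketch below uses the standard KV cache size formula for a decoder-only transformer (one key tensor and one value tensor per layer). hf_mem's own accounting may differ, and every function and parameter name here is hypothetical.

```python
def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch_size: int = 1,
    bytes_per_elem: int = 2,  # assumes a 16-bit cache dtype such as fp16/bf16
) -> int:
    """Bytes needed to cache keys and values for every layer and token."""
    # The leading 2 accounts for storing both keys and values.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


def fits(weight_bytes: int, kv_bytes: int, available_bytes: int) -> bool:
    """Rough check that ignores activations and runtime overhead."""
    return weight_bytes + kv_bytes <= available_bytes


# Illustrative numbers only: 32 layers, 8 KV heads, head_dim 128, 8192 tokens.
kv = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=8192)
print(kv / 1024**3)  # 1.0 (GiB)
```

Because the check leaves out activations and framework overhead, a result close to the limit should be read as "probably does not fit".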

What are the requirements?

  • uv installed (for uvx)
  • HF_TOKEN env var or --hf-token flag (for gated or private models only)
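The two requirements above map onto a command line roughly as follows. This is a minimal sketch that assumes the tool is invoked as `uvx hf-mem`; `build_command` is a hypothetical helper, and `org/model` is a placeholder repository ID.

```python
from typing import List, Optional


def build_command(model_id: str, hf_token: Optional[str] = None) -> List[str]:
    """Assemble the argument list; assumes the executable is `uvx hf-mem`."""
    cmd = ["uvx", "hf-mem", "--model-id", model_id]
    if hf_token:
        # Only needed for gated or private models. If HF_TOKEN is already
        # set in the environment, the flag can be left out.
        cmd += ["--hf-token", hf_token]
    return cmd


print(build_command("org/model"))
# ['uvx', 'hf-mem', '--model-id', 'org/model']
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would run the tool. Passing the token through the `HF_TOKEN` environment variable instead of the flag keeps it out of the process argument list.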

How to run?

Run with --model-id set to the Hugging Face Hub repository ID. The tool checks that the repository contains either Safetensors weights (via model.safetensors, model.safetensors.index.json if sharded, or model_index.json for Diffusers) or GGUF weights.
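The repository check described above can be pictured as a lookup over the repository's file names. The sketch below is an assumption about how such a check might look, not hf_mem's implementation; `detect_weight_format`, its return labels, the order of the checks, and matching GGUF by the `.gguf` extension are all choices made for illustration.

```python
from typing import Iterable, Optional


def detect_weight_format(filenames: Iterable[str]) -> Optional[str]:
    """Return a label for the first recognized weight layout, or None."""
    names = set(filenames)
    if "model.safetensors" in names:
        return "safetensors"
    if "model.safetensors.index.json" in names:
        return "safetensors-sharded"
    if "model_index.json" in names:
        return "diffusers"
    if any(name.endswith(".gguf") for name in names):
        return "gguf"
    return None


print(detect_weight_format(["config.json", "model.safetensors.index.json"]))
# safetensors-sharded
```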

hf-mem — huggingface/skills