huggingface-zerogpu

Hugging Face ZeroGPU

Rules and patterns for ML demos on Hugging Face Spaces with ZeroGPU hardware. Covers @spaces.GPU, duration and quota tuning, process isolation, the CUDA availability model, concurrency safety, and CUDA build constraints.
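This is a minimal sketch of the decorator with duration tuning. The handler requests a GPU slot sized to its inputs instead of relying on the default 60 s. It assumes the `spaces` package's `GPU` decorator accepts a callable for `duration` that receives the same arguments as the handler; check the ZeroGPU docs for the `spaces` version that supports this. The local no-op fallback, the cost constants, and the `MAX_DURATION` cap are hypothetical placeholders, not values from the docs.

```python
"""Sketch: a ZeroGPU handler that requests a duration sized to its inputs."""

try:
    import spaces  # available on Hugging Face Spaces (`pip install spaces`)

    GPU = spaces.GPU
except ImportError:
    # Hypothetical local fallback so the sketch runs off-Spaces: a no-op
    # decorator supporting both the `@GPU` and `@GPU(duration=...)` forms.
    def GPU(func=None, **kwargs):
        if func is None:
            return lambda f: f
        return func


# Hypothetical cost model; measure your own model instead of trusting these.
OVERHEAD_SECONDS = 5      # fixed cost per call (e.g. moving inputs to the GPU)
SECONDS_PER_STEP = 0.5    # marginal cost per inference step
MAX_DURATION = 120        # hypothetical cap; check current limits in the docs


def estimate_duration(prompt: str, steps: int) -> int:
    """Assumed to be called with the same arguments as the decorated handler."""
    estimate = int(OVERHEAD_SECONDS + SECONDS_PER_STEP * steps)
    return min(estimate, MAX_DURATION)


@GPU(duration=estimate_duration)
def generate(prompt: str, steps: int) -> str:
    # Real code would run the model on CUDA here and return CPU-side
    # results (plain Python objects, NumPy arrays, or PIL images).
    return f"{prompt!r} rendered in {steps} steps"
```

Keeping the estimate tight is the point of the callable: as the quota reference notes, leaving the default 60 s in place can block short tasks.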

Scope

This skill is for Gradio SDK Spaces running on ZeroGPU hardware. Docker and Static Spaces cannot be scheduled onto ZeroGPU, and Streamlit apps now run as Docker Spaces, so this skill applies only to Gradio. For general Gradio coding (components, layouts, event listeners), see the huggingface-gradio skill in this repo. The authoritative ZeroGPU docs live at https://huggingface.co/docs/hub/spaces-zerogpu; refer to them for the current backing GPU, the lists of supported runtime versions, and the tier thresholds, all of which change over time.

Reference Files

references/concurrency.md: always read alongside SKILL.md when writing ZeroGPU code, because handlers run in parallel by default.
references/how-zerogpu-works.md: read when reasoning about cold starts, worker reuse, why module-scope warmup does not carry over to requests, or why returning CUDA tensors hangs.
references/how-quota-works.md: read when choosing duration values, when telling an illegal-duration error apart from a quota-exceeded error, or when explaining why the default 60 s duration can block short tasks.
references/cuda-and-deps.md: read when installing CUDA-dependent packages (e.g. flash-attn), pinning torch side-car packages, or reading wheel filename tags.
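Wheel filenames follow the `{distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl` pattern from the wheel specification (PEP 427), and CUDA-dependent projects often encode build details in the version's local segment. The sketch below splits a filename into those tags and decodes a local segment of the form `cu<major>torch<X.Y>cxx11abi<TRUE|FALSE>`. That layout is an assumption based on the naming flash-attn uses for its release wheels, and the filename in the test is illustrative. For production code, `packaging.utils.parse_wheel_filename` handles edge cases this sketch ignores.

```python
"""Sketch: split a wheel filename into its tags and decode a CUDA local version."""
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class WheelTags:
    distribution: str
    version: str
    build: Optional[str]
    python: str
    abi: str
    platform: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Parse {dist}-{version}(-{build})?-{python}-{abi}-{platform}.whl."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel name: {filename}")
    return WheelTags(dist, version, build, py, abi, plat)


# Assumed local-version layout, e.g. "+cu12torch2.6cxx11abiFALSE".
_CUDA_LOCAL = re.compile(r"^cu(\d+)torch(\d+\.\d+)cxx11abi(TRUE|FALSE)$")


def parse_cuda_local(version: str) -> Optional[dict]:
    """Decode the local segment after '+', or return None if absent/unknown."""
    _, sep, local = version.partition("+")
    if not sep:
        return None
    match = _CUDA_LOCAL.match(local)
    if match is None:
        return None
    cuda_major, torch_version, abi = match.groups()
    return {
        "cuda_major": cuda_major,
        "torch": torch_version,
        "cxx11_abi": abi == "TRUE",
    }
```

To pick a wheel, compare the decoded fields with the target environment: the `cp` tag with the Space's Python version, and the CUDA, torch, and ABI fields with `torch.version.cuda`, `torch.__version__`, and `torch._C._GLIBCXX_USE_CXX11_ABI`.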

Hardware

huggingface-zerogpu — huggingface/skills