vllm-deploy-docker
vLLM Docker Deployment
A Claude skill describing how to deploy vLLM with Docker, either using the official pre-built images or building the image from source, on NVIDIA GPUs with CUDA. Instructions include an example `docker run` command, a minimal `docker-compose` snippet, recommended flags, and troubleshooting notes. For AMD, Intel, or other accelerators, refer to the vLLM documentation for alternative deployment methods.
What this skill does
- Deploy vLLM with Docker using pre-built images (recommended for most users) or build from source for custom configurations
- Provide example commands for running the OpenAI-compatible server with GPU access and mounted Hugging Face cache
- Point to build-from-source instructions when a custom image or optional dependencies are needed
- Explain common flags: `--ipc=host`, shared cache mounts, and `HF_TOKEN` handling (see the quickstart example below)
Prerequisites
- Docker Engine installed (Docker 20.10+ recommended)
- NVIDIA GPU(s) with a recent driver and the NVIDIA Container Toolkit installed (the vLLM image bundles its own CUDA runtime, so the host CUDA toolkit is not required)
- Optional: `curl` for API tests
- A Hugging Face token (`HF_TOKEN`) if pulling private models or to avoid rate limits
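Before pulling the vLLM image, it is worth verifying that Docker can see the GPU at all. A minimal check, assuming the NVIDIA Container Toolkit is configured (the CUDA base image tag is illustrative; any tag compatible with your driver works):

```bash
# Sanity check: run nvidia-smi inside a throwaway CUDA container.
# If this prints your GPU table, Docker GPU passthrough is working.
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```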
Quickstart using the pre-built image (recommended)
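A minimal invocation of the official `vllm/vllm-openai` image, following the pattern in the vLLM documentation. The model name is an example; substitute any model your token can access. Arguments after the image name are forwarded to the vLLM server:

```bash
# Run the OpenAI-compatible server from the official pre-built image.
# --gpus all   expose all NVIDIA GPUs to the container
# --ipc=host   share the host IPC namespace; PyTorch needs the extra shared
#              memory, especially for tensor parallelism
# -v ...       mount the Hugging Face cache so downloaded weights persist
# --env ...    pass HF_TOKEN for gated or private models
docker run --gpus all \
    --ipc=host \
    -p 8000:8000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=${HF_TOKEN}" \
    vllm/vllm-openai:latest \
    --model Qwen/Qwen2.5-1.5B-Instruct
```

The server listens on port 8000 and exposes the standard OpenAI-compatible endpoints (`/v1/models`, `/v1/completions`, `/v1/chat/completions`).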
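Once the logs show the server is ready, a quick smoke test with `curl` (the `model` field must match the model passed above):

```bash
# List the models the server exposes
curl http://localhost:8000/v1/models

# Send a minimal chat completion request
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen2.5-1.5B-Instruct",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
    }'
```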
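The same configuration as a minimal `docker-compose` snippet, sketched assuming Docker Compose v2 with the `deploy.resources` GPU reservation syntax:

```yaml
# docker-compose.yaml -- equivalent to the docker run example above
services:
  vllm:
    image: vllm/vllm-openai:latest
    command: ["--model", "Qwen/Qwen2.5-1.5B-Instruct"]
    ports:
      - "8000:8000"
    ipc: host
    environment:
      - HF_TOKEN=${HF_TOKEN}
    volumes:
      - ~/.cache/huggingface:/root/.cache/huggingface
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

Start it with `docker compose up -d` and test with the same `curl` commands as above.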