vllm-deploy-docker


vLLM Docker Deployment

A Claude skill describing how to deploy vLLM with Docker on NVIDIA GPUs with CUDA, either using the official pre-built images or building the image from source. It covers example docker run commands, a minimal docker-compose snippet, recommended flags, and troubleshooting notes. For AMD, Intel, or other accelerators, refer to the vLLM documentation for alternative deployment methods.

What this skill does

  • Deploy vLLM with Docker using pre-built images (recommended for most users) or build from source for custom configurations
  • Provide example commands for running the OpenAI-compatible server with GPU access and mounted Hugging Face cache
  • Point to build-from-source instructions when a custom image or optional dependencies are needed
  • Explain common flags: --ipc=host, shared cache mounts, and HF_TOKEN handling

Prerequisites

  • Docker Engine installed (Docker 20.10+ recommended)
  • NVIDIA GPU(s) with appropriate drivers and the NVIDIA Container Toolkit installed (the CUDA toolkit itself ships inside the image)
  • Optional: curl for API tests
  • A Hugging Face token (HF_TOKEN) if pulling gated or private models, or to avoid rate limits

Quickstart using Pre-built Image (recommended)
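Based on the docker run example in the vLLM documentation, a typical invocation of the pre-built OpenAI-compatible server image looks like the following. The model name is illustrative, and HF_TOKEN is assumed to already be set in your host environment:

```shell
# Run the official OpenAI-compatible server image.
# --ipc=host lets vLLM's worker processes share memory;
# the volume mount reuses the host Hugging Face cache so
# model weights are not re-downloaded on every container start.
docker run --runtime nvidia --gpus all \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=$HF_TOKEN" \
    -p 8000:8000 \
    --ipc=host \
    vllm/vllm-openai:latest \
    --model Qwen/Qwen2.5-0.5B-Instruct
```

Once the server logs show it is ready, a quick smoke test is `curl http://localhost:8000/v1/models`, which should list the served model.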

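The minimal docker-compose snippet mentioned in the overview might be sketched as follows. The service name, port, and model are illustrative, and the NVIDIA Container Toolkit is assumed to be configured for Docker Compose GPU reservations:

```yaml
services:
  vllm:
    image: vllm/vllm-openai:latest
    ipc: host                      # same effect as --ipc=host
    ports:
      - "8000:8000"
    volumes:
      - ~/.cache/huggingface:/root/.cache/huggingface
    environment:
      - HF_TOKEN=${HF_TOKEN}       # forwarded from the host shell
    command: ["--model", "Qwen/Qwen2.5-0.5B-Instruct"]
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

Start it with `docker compose up -d` and tail logs with `docker compose logs -f vllm`.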