# vllm-deploy-k8s: vLLM Kubernetes Deployment
A Claude skill for deploying vLLM to Kubernetes using YAML templates. Deploys a vLLM OpenAI-compatible server as a Kubernetes Deployment with a ClusterIP Service, GPU resources, and health probes.
## What this skill does
- Deploy vLLM as a Kubernetes Deployment + Service with NVIDIA GPU support
- Check if a vLLM deployment already exists before deploying
- Check if the Hugging Face token secret exists, and ask the user for their token if not
- Use the `vllm/vllm-openai:latest` image by default (the user can specify a different version)
- Provide sensible default configuration that users can customize (model, replicas, GPU count, extra vLLM flags, etc.)
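The Deployment + Service described above might look roughly like the sketch below. The resource names, example model, secret name/key, and probe timings are illustrative assumptions, not the skill's actual template:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-server            # assumed name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-server
  template:
    metadata:
      labels:
        app: vllm-server
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest
          args: ["--model", "Qwen/Qwen2.5-1.5B-Instruct"]  # example model
          env:
            - name: HF_TOKEN                # read by vLLM for gated models
              valueFrom:
                secretKeyRef:
                  name: hf-token-secret     # assumed secret name
                  key: token                # assumed key
          ports:
            - containerPort: 8000           # vLLM's OpenAI-compatible API port
          resources:
            limits:
              nvidia.com/gpu: 1             # requires GPU Operator / device plugin
          readinessProbe:
            httpGet:
              path: /health                 # vLLM health endpoint
              port: 8000
            initialDelaySeconds: 60         # model loading can take a while
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 120
---
apiVersion: v1
kind: Service
metadata:
  name: vllm-server
spec:
  type: ClusterIP
  selector:
    app: vllm-server
  ports:
    - port: 8000
      targetPort: 8000
```

The probe delays are deliberately generous: large models can take minutes to download and load, and an aggressive liveness probe would kill the pod mid-load.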
## Prerequisites
- `kubectl` configured with access to a Kubernetes cluster
- NVIDIA GPU Operator or device plugin installed on cluster nodes
- Hugging Face token (required for gated models like Llama, optional for public models)
## Deployment Steps
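A typical flow is: create the Hugging Face token secret (if it does not already exist), apply the Deployment and Service manifests, then wait for the pod to pass its readiness probe. As an illustration, the token secret the skill checks for could be defined like this (the secret name and key are assumptions and must match whatever the Deployment's `secretKeyRef` references):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: hf-token-secret        # assumed name
type: Opaque
stringData:
  token: <your-hf-token>       # placeholder; never commit a real token
```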
## More from vllm-project/vllm-skills

- **vllm-deploy-docker**: Deploy vLLM using Docker (pre-built images or build-from-source) with NVIDIA GPU support and run the OpenAI-compatible server.
- **vllm-deploy-simple**: Quick install and deploy vLLM, start serving with a simple LLM, and test the OpenAI API.
- **vllm-bench-serve**: Benchmark vLLM or OpenAI-compatible serving endpoints using `vllm bench serve`. Supports multiple datasets (random, sharegpt, sonnet, HF), backends (openai, openai-chat, vllm-pooling, embeddings), throughput/latency testing with request-rate control, and result saving. Use when benchmarking LLM serving performance, measuring TTFT/TPOT, or load testing inference APIs.
- **vllm-bench-random-synthetic**: Run a vLLM performance benchmark using synthetic random data to measure throughput, TTFT (Time to First Token), TPOT (Time per Output Token), and other key performance metrics. Use when the user wants to quickly test vLLM serving performance without downloading external datasets.
- **vllm-prefix-cache-bench**: A skill for benchmarking the efficiency of automatic prefix caching in vLLM using fixed prompts, real-world datasets, or synthetic prefix/suffix patterns. Use when the user asks to benchmark prefix caching hit rate, caching efficiency, or repeated-prompt performance in vLLM.