vllm-deploy-k8s


vLLM Kubernetes Deployment

A Claude skill for deploying vLLM to Kubernetes using YAML templates. Deploys a vLLM OpenAI-compatible server as a Kubernetes Deployment with a ClusterIP Service, GPU resources, and health probes.

What this skill does

  • Deploy vLLM as a Kubernetes Deployment + Service with NVIDIA GPU support
  • Check if a vLLM deployment already exists before deploying
  • Check if the Hugging Face token secret exists, and ask the user for their token if not
  • Use the vllm/vllm-openai:latest image by default (user can specify a different version)
  • Provide sensible default configuration that users can customize (model, replicas, GPU count, extra vLLM flags, etc.)
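The manifest the skill deploys can be sketched as a Deployment plus ClusterIP Service. This is an illustrative sketch, not the skill's exact template: the names (`vllm-server`, `hf-token-secret`), the default model, and the probe timings are assumptions; only the image, GPU resource key, and `/health` endpoint come from vLLM's documented behavior.

```yaml
# Sketch of a vLLM Deployment + Service; names and defaults are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-server
  template:
    metadata:
      labels:
        app: vllm-server
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest
          # Extra vLLM flags (tensor parallelism, max model len, ...) go here
          args: ["--model", "meta-llama/Llama-3.1-8B-Instruct"]
          ports:
            - containerPort: 8000
          env:
            - name: HUGGING_FACE_HUB_TOKEN
              valueFrom:
                secretKeyRef:
                  name: hf-token-secret   # assumed secret name/key
                  key: token
          resources:
            limits:
              nvidia.com/gpu: 1
          # vLLM loads model weights at startup, so allow a generous delay
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 60
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 120
            periodSeconds: 10
          volumeMounts:
            - name: shm
              mountPath: /dev/shm
      volumes:
        # vLLM workers communicate over shared memory; the default /dev/shm
        # in containers is often too small
        - name: shm
          emptyDir:
            medium: Memory
            sizeLimit: 2Gi
---
apiVersion: v1
kind: Service
metadata:
  name: vllm-server
spec:
  type: ClusterIP
  selector:
    app: vllm-server
  ports:
    - port: 8000
      targetPort: 8000
```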

Prerequisites

  • kubectl configured with access to a Kubernetes cluster
  • NVIDIA GPU Operator or device plugin installed on cluster nodes
  • Hugging Face token (required for gated models like Llama, optional for public models)
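The prerequisites above can be checked from the command line. A minimal sketch, assuming the secret is named `hf-token-secret` with key `token` and the token is in the `HF_TOKEN` environment variable:

```shell
# Confirm the cluster advertises NVIDIA GPUs (device plugin / GPU Operator)
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'

# Create the Hugging Face token secret only if it does not exist yet
kubectl get secret hf-token-secret >/dev/null 2>&1 || \
  kubectl create secret generic hf-token-secret \
    --from-literal=token="$HF_TOKEN"
```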

Deployment Steps
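The deployment flow the skill follows can be sketched as a short `kubectl` sequence. The deployment name and manifest filename are assumptions; the `/v1/models` endpoint is vLLM's standard OpenAI-compatible listing route.

```shell
# 1. Check whether a vLLM deployment already exists
kubectl get deployment vllm-server 2>/dev/null && echo "vLLM already deployed"

# 2. Apply the generated manifests
kubectl apply -f vllm-deployment.yaml

# 3. Wait for the image pull and model weight loading to finish
kubectl rollout status deployment/vllm-server --timeout=15m

# 4. Smoke-test the OpenAI-compatible API from the local machine
kubectl port-forward service/vllm-server 8000:8000 &
curl http://localhost:8000/v1/models
```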
