
vLLM Bench Serve

Benchmark vLLM or any OpenAI-compatible serving endpoint using the vllm bench serve CLI. It measures throughput, latency (time to first token, TTFT; time per output token, TPOT), and goodput under a configurable request load.

Reference: vLLM Bench Serve Documentation

Prerequisites

  • vLLM installed in the client environment (the benchmark runs through the vllm CLI)
  • A vLLM server, or any other OpenAI-compatible endpoint, already serving a model
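If no server is running yet, one can be started with vllm serve. This is a minimal sketch; the model name below is only an example placeholder, and the port shown is vLLM's default:

```shell
# Start an OpenAI-compatible vLLM server on the default port 8000.
# Substitute the model you actually want to benchmark.
vllm serve Qwen/Qwen2.5-1.5B-Instruct --port 8000
```

The benchmark client then targets this server's base URL (http://localhost:8000 by default).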

Quick Start

Basic benchmark against a local vLLM server (the random dataset and 1000 prompts are the defaults). The command below was truncated in the original; the remaining flags are standard vllm bench serve options, with the model name left as a placeholder for whatever the server is serving:

vllm bench serve \
  --backend openai-chat \
  --endpoint /v1/chat/completions \
  --model <served-model-name> \
  --dataset-name random \
  --num-prompts 1000
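By default the client fires all requests at t=0; to benchmark under a controlled load, rate and concurrency can be set explicitly. A sketch using vllm bench serve's load-control and result flags (the model name and the SLO thresholds here are example assumptions, not values from this document):

```shell
# Fixed-rate benchmark sketch:
# --request-rate sets request arrivals per second (default inf, i.e. all at once);
# --max-concurrency caps in-flight requests;
# --goodput counts only requests meeting the given SLOs (milliseconds);
# --save-result writes a JSON report of the run.
vllm bench serve \
  --backend openai-chat \
  --endpoint /v1/chat/completions \
  --model <served-model-name> \
  --dataset-name random \
  --num-prompts 500 \
  --request-rate 10 \
  --max-concurrency 64 \
  --goodput ttft:500 tpot:50 \
  --save-result
```

Lower request rates approximate steady production traffic, while the default infinite rate measures peak saturation throughput.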