serving-llms-vllm

Originally from ovachiever/droid-tings

vLLM - High-Performance LLM Serving

Quick start

vLLM achieves up to 24x higher throughput than HuggingFace Transformers through PagedAttention (a block-based KV cache that allocates GPU memory on demand instead of preallocating for the maximum sequence length) and continuous batching (new requests join a running batch, interleaving prefill and decode work).
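To see why block-based allocation saves memory, here is a toy sketch. The block size of 16 tokens matches vLLM's default; the helper function is illustrative, not part of the vLLM API.

```python
import math

def blocks_needed(num_tokens: int, block_size: int = 16) -> int:
    """KV-cache blocks required for one sequence under paged allocation."""
    return math.ceil(num_tokens / block_size)

# A 100-token sequence occupies 7 blocks: 6 full blocks plus one
# holding the last 4 tokens. Waste is bounded by one partial block
# per sequence, rather than a full max-length preallocation.
print(blocks_needed(100))  # 7
```

Because blocks are allocated lazily as a sequence grows, thousands of requests of different lengths can share GPU memory with minimal fragmentation.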

Installation:

pip install vllm

Basic offline inference:

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")
sampling = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(["Explain quantum computing"], sampling)
for output in outputs:
    # Each RequestOutput holds one or more completions; print the first
    print(output.outputs[0].text)
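For online serving, vLLM also ships an OpenAI-compatible HTTP server. A minimal sketch (flags and the `vllm serve` entry point may vary by version; older releases use `python -m vllm.entrypoints.openai.api_server`):

```shell
# Start an OpenAI-compatible server (downloads the model on first run)
vllm serve meta-llama/Meta-Llama-3-8B-Instruct --port 8000

# Query it with the standard chat completions endpoint
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Meta-Llama-3-8B-Instruct",
       "messages": [{"role": "user", "content": "Explain quantum computing"}]}'
```

Any OpenAI client library can point at this endpoint by overriding its base URL, so existing applications need no code changes beyond configuration.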
Installs: 71 | GitHub Stars: 8.3K | First Seen: Jan 21, 2026