serving-llms-vllm

Originally from ovachiever/droid-tings

vLLM - High-Performance LLM Serving

Quick start

vLLM achieves up to 24x higher throughput than HuggingFace Transformers through PagedAttention (a block-based KV cache that allocates GPU memory on demand instead of preallocating for the maximum sequence length) and continuous batching (new requests join a running batch, interleaving prefill and decode work).
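To see why block-based allocation saves memory, here is a toy sketch. The block size of 16 tokens matches vLLM's default; the helper function is illustrative, not part of the vLLM API.

```python
import math

def blocks_needed(num_tokens: int, block_size: int = 16) -> int:
    """KV-cache blocks required for one sequence under paged allocation."""
    return math.ceil(num_tokens / block_size)

# A 100-token sequence occupies 7 blocks: 6 full blocks plus one
# holding the last 4 tokens. Waste is bounded by one partial block
# per sequence, rather than a full max-length preallocation.
print(blocks_needed(100))  # 7
```

Because blocks are allocated lazily as a sequence grows, thousands of requests of different lengths can share GPU memory with minimal fragmentation.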

Installation:

pip install vllm

Basic offline inference:

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")
sampling = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(["Explain quantum computing"], sampling)
for output in outputs:
    # Each RequestOutput holds one or more completions; print the first
    print(output.outputs[0].text)
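For online serving, vLLM also ships an OpenAI-compatible HTTP server. A minimal sketch (flags and the `vllm serve` entry point may vary by version; older releases use `python -m vllm.entrypoints.openai.api_server`):

```shell
# Start an OpenAI-compatible server (downloads the model on first run)
vllm serve meta-llama/Meta-Llama-3-8B-Instruct --port 8000

# Query it with the standard chat completions endpoint
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Meta-Llama-3-8B-Instruct",
       "messages": [{"role": "user", "content": "Explain quantum computing"}]}'
```

Any OpenAI client library can point at this endpoint by overriding its base URL, so existing applications need no code changes beyond configuration.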
Installs: 71 | GitHub Stars: 8.3K | First Seen: Jan 21, 2026