serving-llms-vllm

vLLM - High-Performance LLM Serving

Quick start

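vLLM achieves up to 24x higher throughput than Hugging Face Transformers by combining PagedAttention (a block-based KV cache that allocates memory in fixed-size blocks rather than one contiguous region per sequence) with continuous batching (scheduling at every decoding step, so prefill and decode requests are mixed in the same batch).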
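To make the block-based KV cache idea concrete, here is a minimal pure-Python sketch of a block table. It is not vLLM's implementation: `ToyBlockAllocator`, `contiguous_slots`, `paged_slots`, and the block size of 16 tokens are all hypothetical names and values chosen for illustration. It only shows the bookkeeping idea that a sequence receives fixed-size blocks on demand instead of a reservation sized for the maximum length.

```python
import math

BLOCK_SIZE = 16  # tokens per KV block (illustrative value, an assumption)


class ToyBlockAllocator:
    """Toy block table: hands out fixed-size KV blocks from a shared pool."""

    def __init__(self, num_blocks: int) -> None:
        self.free_blocks = list(range(num_blocks))
        self.block_tables: dict[str, list[int]] = {}  # seq id -> physical block ids
        self.lengths: dict[str, int] = {}             # seq id -> tokens stored

    def append_tokens(self, seq_id: str, num_tokens: int) -> None:
        """Grow a sequence, allocating new blocks only when the last one is full."""
        table = self.block_tables.setdefault(seq_id, [])
        new_len = self.lengths.get(seq_id, 0) + num_tokens
        needed = math.ceil(new_len / BLOCK_SIZE) - len(table)
        if needed > len(self.free_blocks):
            raise MemoryError("KV cache pool exhausted")
        for _ in range(needed):
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = new_len

    def free(self, seq_id: str) -> None:
        """Return all blocks of a finished sequence to the pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


def contiguous_slots(seq_lens: list[int], max_len: int) -> int:
    """Token slots reserved if every sequence gets max_len slots up front."""
    return len(seq_lens) * max_len


def paged_slots(seq_lens: list[int]) -> int:
    """Token slots reserved if each sequence only holds the blocks it needs."""
    return sum(math.ceil(n / BLOCK_SIZE) * BLOCK_SIZE for n in seq_lens)
```

In this toy model, three sequences of 20, 100, and 500 tokens with a 2048-token maximum length reserve 6144 slots under contiguous allocation but 656 slots under paged allocation. Waste is confined to the unfilled tail of each sequence's last block, which is what leaves room for more concurrent sequences in a batch.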

Installation:

pip install vllm

Basic offline inference:

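from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")
sampling = SamplingParams(temperature=0.7, max_tokens=256)

# generate() takes a list of prompts and returns one RequestOutput per prompt.
outputs = llm.generate(["Explain PagedAttention in one sentence."], sampling)
print(outputs[0].outputs[0].text)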

Installs: 445
Repository: orchestra-resea…h-skills
GitHub Stars: 10.4K
First Seen: Feb 7, 2026
Security Audits: Gen Agent Trust Hub (Warn), Socket (Pass), Snyk (Warn)

serving-llms-vllm — orchestra-research/ai-research-skills