vllm-deployment
vLLM Model Serving and Inference
Quick Start
Docker (CPU)
docker run --rm -p 8000:8000 \
  --shm-size=4g \
  --cap-add SYS_NICE \
  --security-opt seccomp=unconfined \
  -e VLLM_CPU_KVCACHE_SPACE=4 \
  <vllm-cpu-image> \
  --model <model-name> \
  --dtype float32
# Access: the OpenAI-compatible API is served at http://localhost:8000 (for example, http://localhost:8000/v1/models)
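Once the container has finished loading the model, a quick request confirms that the server is answering. The following is a minimal sketch, assuming the image starts vLLM's OpenAI-compatible server on port 8000 as mapped above. BASE_URL, MODEL, and the helper function names are illustrative and not part of vLLM; MODEL must match the value passed to --model.

```shell
#!/usr/bin/env bash
# Sketch: query the OpenAI-compatible API exposed by the container above.
# BASE_URL and MODEL are assumptions; MODEL must match the value given to --model.
BASE_URL="${BASE_URL:-http://localhost:8000}"
MODEL="${MODEL:-<model-name>}"

# Build the JSON body for a /v1/completions request.
# No JSON escaping is done, so keep quotes and backslashes out of the prompt.
completion_payload() {
  printf '{"model": "%s", "prompt": "%s", "max_tokens": %d}' "$1" "$2" "$3"
}

# List the models the server is currently serving.
list_models() {
  curl -sS "$BASE_URL/v1/models"
}

# Request a short completion for the prompt given as the first argument.
complete() {
  curl -sS "$BASE_URL/v1/completions" \
    -H "Content-Type: application/json" \
    -d "$(completion_payload "$MODEL" "$1" 32)"
}

# Usage, once the server is up:
#   list_models
#   complete "Hello, my name is"
```

CPU inference with float32 can be slow, so the first completion may take a while to return; list_models is the cheaper check that the server is reachable.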