Model Deployment
Deploy LLMs to production with optimal performance.
Quick Start
vLLM Server
# Install
pip install vllm
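# Start the OpenAI-compatible API server on port 8000
# (--tensor-parallel-size 1 keeps the model on a single GPU)
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-2-7b-chat-hf \
    --port 8000 \
    --tensor-parallel-size 1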
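
Once the server is up, it can be queried over HTTP. The following is a minimal sketch, not part of the original skill, that uses only the Python standard library to call the server's OpenAI-compatible `/v1/completions` route. It assumes the server is reachable at `http://localhost:8000` and was started with the same `--model` value as above. The helper names `build_completion_request` and `complete` are hypothetical, chosen for illustration.

```python
import json
import urllib.request

# Assumptions: the server runs locally on the port passed via --port,
# and MODEL matches the --model value used to start it.
BASE_URL = "http://localhost:8000"
MODEL = "meta-llama/Llama-2-7b-chat-hf"


def build_completion_request(prompt, max_tokens=64, base_url=BASE_URL, model=MODEL):
    """Build a POST request for the OpenAI-compatible completions route."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, max_tokens=64, timeout=60):
    """Send the prompt to the server and return the generated text."""
    request = build_completion_request(prompt, max_tokens=max_tokens)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # The completions response carries generated text in choices[i]["text"].
    return body["choices"][0]["text"]
```

With the server running, `print(complete("The capital of France is"))` should print the model's continuation. Building the request separately from sending it makes the payload easy to inspect without a live server.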