llm-inference
LLM Inference
High-performance inference engines for serving large language models (LLMs), compared by use case, hardware, relative throughput, and setup effort.
Engine Comparison
| Engine | Best For | Hardware | Relative Throughput | Setup Difficulty |
|---|---|---|---|---|
| vLLM | Production serving | GPU | Highest | Medium |
| llama.cpp | Local/edge, CPU | CPU/GPU | Good | Easy |
| TGI (Text Generation Inference) | Hugging Face models | GPU | High | Easy |
| Ollama | Local desktop | CPU/GPU | Good | Easiest |
| TensorRT-LLM | NVIDIA production | NVIDIA GPU | Highest | Complex |
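
As a rough illustration of how the table might be used to shortlist an engine, the sketch below encodes only the Hardware and Setup columns as data and filters on them. The names `Engine`, `SETUP_RANK`, and `candidates` are hypothetical helpers invented for this example and are not part of any of the engines listed. The ordering of setup labels and the rule that an NVIDIA GPU also counts as a generic GPU are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    """One row of the comparison table (hardware and setup columns only)."""
    name: str
    hardware: frozenset
    setup: str


# Values copied from the table above.
ENGINES = [
    Engine("vLLM", frozenset({"GPU"}), "Medium"),
    Engine("llama.cpp", frozenset({"CPU", "GPU"}), "Easy"),
    Engine("TGI", frozenset({"GPU"}), "Easy"),
    Engine("Ollama", frozenset({"CPU", "GPU"}), "Easiest"),
    Engine("TensorRT-LLM", frozenset({"NVIDIA GPU"}), "Complex"),
]

# Assumed ordering of the setup labels, from least to most effort.
SETUP_RANK = {"Easiest": 0, "Easy": 1, "Medium": 2, "Complex": 3}


def supports(engine: Engine, hardware: str) -> bool:
    """Return True if the engine lists the hardware.

    Assumption: an NVIDIA GPU also satisfies a generic "GPU" requirement.
    """
    if hardware in engine.hardware:
        return True
    return hardware == "NVIDIA GPU" and "GPU" in engine.hardware


def candidates(hardware: str, max_setup: str = "Complex") -> list:
    """List engine names that run on `hardware` with setup effort <= `max_setup`."""
    limit = SETUP_RANK[max_setup]
    return [
        e.name
        for e in ENGINES
        if supports(e, hardware) and SETUP_RANK[e.setup] <= limit
    ]


if __name__ == "__main__":
    print(candidates("CPU"))
    print(candidates("GPU", max_setup="Easy"))
```

The filter deliberately ignores throughput, since the table gives only qualitative labels for it; a real decision would also weigh model format, batching needs, and deployment constraints.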