vLLM-Ascend Server Launcher
Overview
This skill deploys vLLM inference services on Ascend NPU servers with automatic model detection, quantization handling, and performance optimization.
Key Features:
- Automatic model discovery and detection
- Quantization auto-detection (quant_model_description.json)
- Graph mode / Eager mode guidance
- Container deployment support
- Multi-card tensor parallelism
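
As a rough illustration of how quantization auto-detection and the launch options listed above might fit together, the sketch below checks a model directory for quant_model_description.json and assembles a `vllm serve` command line. The helper names `detect_quantization` and `build_serve_command` are hypothetical and not part of this skill. Passing `--quantization ascend` for quantized weights is an assumption about the vLLM-Ascend plugin; `--tensor-parallel-size` and `--enforce-eager` are standard vLLM options.

```python
from pathlib import Path


def detect_quantization(model_dir):
    """Return True if the model directory contains quant_model_description.json."""
    return (Path(model_dir) / "quant_model_description.json").is_file()


def build_serve_command(model_dir, tensor_parallel_size=1, eager=False):
    """Assemble a `vllm serve` argument list (hypothetical helper)."""
    cmd = [
        "vllm", "serve", str(model_dir),
        "--tensor-parallel-size", str(tensor_parallel_size),
    ]
    # Assumption: quantized weights are served with `--quantization ascend`.
    if detect_quantization(model_dir):
        cmd += ["--quantization", "ascend"]
    # Eager mode disables graph capture; graph mode is left as the default.
    if eager:
        cmd.append("--enforce-eager")
    return cmd
```

The function returns an argument list rather than launching the server, so the command can be inspected or passed to `subprocess.run` inside a container as needed.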