vllm-ascend-server

vLLM-Ascend Server Launcher

Overview

This skill deploys vLLM inference services on Ascend NPU servers with automatic model detection, quantization handling, and performance optimization.

Key Features:

  • Automatic model discovery and detection
  • Quantization auto-detection (quant_model_description.json)
  • Graph mode / Eager mode guidance
  • Container deployment support
  • Multi-card tensor parallelism
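
To make the quantization auto-detection and tensor-parallel features concrete, here is a minimal Python sketch of how a launcher might assemble a server command. It is not taken from the skill itself. The helper names (`is_quantized`, `build_serve_command`) are hypothetical, and the assumption is that the service is started with the `vllm serve` CLI, that quantized weights are signalled by a `quant_model_description.json` file in the model directory, and that such weights are loaded with `--quantization ascend`.

```python
from pathlib import Path

# File named in the feature list above as the quantization marker.
QUANT_DESCRIPTION_FILE = "quant_model_description.json"


def is_quantized(model_dir):
    """Return True if the model directory contains the quantization marker file."""
    return (Path(model_dir) / QUANT_DESCRIPTION_FILE).is_file()


def build_serve_command(model_dir, tensor_parallel_size=1, eager=False):
    """Assemble a `vllm serve` argument list (hypothetical helper, not the skill's code).

    tensor_parallel_size: number of NPU cards to shard the model across.
    eager: if True, add --enforce-eager to disable graph mode.
    """
    if tensor_parallel_size < 1:
        raise ValueError("tensor_parallel_size must be at least 1")

    cmd = [
        "vllm", "serve", str(model_dir),
        "--tensor-parallel-size", str(tensor_parallel_size),
    ]
    # Assumption: quantized Ascend weights are loaded with --quantization ascend.
    if is_quantized(model_dir):
        cmd += ["--quantization", "ascend"]
    if eager:
        cmd.append("--enforce-eager")
    return cmd
```

The function only builds the argument list; passing the result to `subprocess.run` would start the server. Keeping command construction separate from execution makes the detection logic easy to check without NPU hardware.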

Workflow Summary

Installs: 9
GitHub Stars: 93
First Seen: Apr 18, 2026
vllm-ascend-server — ascend-ai-coding/awesome-ascend-skills