vllm-ascend-server

vLLM-Ascend Server Launcher

Overview

This skill deploys vLLM inference services on Ascend NPU servers with automatic model detection, quantization handling, and performance optimization.

Key Features:

Automatic model discovery and detection
Quantization auto-detection (quant_model_description.json)
Graph mode / Eager mode guidance
Container deployment support
Multi-card tensor parallelism
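
As a rough illustration of how these features could fit together, the sketch below builds a `vllm serve` command line for a local model directory. It is not the skill's actual launcher. The helper name `build_serve_command` and its parameters are hypothetical, and the use of `--quantization ascend` when `quant_model_description.json` is present is an assumption based on typical vLLM-Ascend usage. Eager mode is mapped to vLLM's `--enforce-eager` flag, and multi-card tensor parallelism to `--tensor-parallel-size`.

```python
from pathlib import Path

# File named in the feature list above; its presence is treated as the
# marker of a quantized checkpoint.
QUANT_DESCRIPTION_FILE = "quant_model_description.json"


def build_serve_command(model_dir, tensor_parallel_size=1, eager=False, port=8000):
    """Return a `vllm serve` argument list for a local model directory.

    Hypothetical helper for illustration only; not part of the skill.
    """
    model_path = Path(model_dir)
    cmd = [
        "vllm", "serve", str(model_path),
        "--port", str(port),
        # Multi-card tensor parallelism: one shard per NPU.
        "--tensor-parallel-size", str(tensor_parallel_size),
    ]
    # Quantization auto-detection (assumption: quantized weights are served
    # with `--quantization ascend`).
    if (model_path / QUANT_DESCRIPTION_FILE).is_file():
        cmd += ["--quantization", "ascend"]
    # Eager mode disables graph capture; graph mode is left as the default.
    if eager:
        cmd.append("--enforce-eager")
    return cmd


if __name__ == "__main__":
    # Example: print the command for a 4-card deployment in eager mode.
    print(" ".join(build_serve_command("/path/to/model", tensor_parallel_size=4, eager=True)))
```

The function only assembles arguments, so the resulting command can be inspected before it is run, for example inside a container.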

Workflow Summary

Installs: 18
Repository: ascend-ai-coding/awesome-ascend-skills
GitHub Stars: 143
First Seen: Apr 18, 2026
Security Audits: Gen Agent Trust Hub (Fail)