vllm-bench-serve
vllm-bench-serve — Online Benchmark Orchestrator
1. Scope & Boundaries
This skill does:
- Execute single or batch `vllm bench serve` online benchmarks against running inference services
- Aggregate and compare results across multiple test cases
- Auto-optimize: search for optimal concurrency/throughput given SLO constraints
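
The auto-optimize item can be pictured as a search over concurrency levels. The sketch below is only an illustration, not this skill's actual implementation: it assumes the tracked latency metric does not decrease as concurrency rises, and it takes a hypothetical `measure_latency_ms` callable that stands in for running one benchmark at a given concurrency and returning the metric of interest (for example a P99 latency). The function name, the bounds, and `fake_measure` are all assumptions made for the example.

```python
from typing import Callable, Optional


def max_concurrency_within_slo(
    measure_latency_ms: Callable[[int], float],
    slo_ms: float,
    low: int = 1,
    high: int = 256,
) -> Optional[int]:
    """Binary-search the largest concurrency in [low, high] that meets the SLO.

    Assumes latency is non-decreasing in concurrency. Returns None when even
    the lowest concurrency violates the SLO.
    """
    best: Optional[int] = None
    while low <= high:
        mid = (low + high) // 2
        if measure_latency_ms(mid) <= slo_ms:
            best = mid        # mid satisfies the SLO; try a higher load
            low = mid + 1
        else:
            high = mid - 1    # mid violates the SLO; back off
    return best


def fake_measure(concurrency: int) -> float:
    # Hypothetical stand-in for a real benchmark run:
    # latency grows linearly with the number of concurrent requests.
    return 50.0 + 10.0 * concurrency


if __name__ == "__main__":
    print(max_concurrency_within_slo(fake_measure, slo_ms=500.0))  # 45
```

In practice each call to `measure_latency_ms` would be a full benchmark run, so a binary search keeps the number of runs logarithmic in the search range. Real measurements are noisy, so repeated runs or a safety margin on the SLO may be needed.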
This skill does NOT do (defer to other skills or decline):
- Start/deploy vLLM services → use `vllm-ascend-server`
- Offline batch inference throughput tests → use `vllm-ascend`
- Profiling / tracing (torch profiler, perfetto, NPU profiling) → out of scope
- Health checks only (just checking `/v1/models`) → simple curl, no skill needed
- Download/clean/convert datasets without running benchmarks → out of scope
- Analyze existing benchmark results without running new tests → out of scope