sglang-sota-performance

SGLang SOTA Performance

Overview

Use this skill as the top-level optimization loop for one model at a time. It composes two lower-level skills:

  • llm-serving-auto-benchmark: search and compare best deployment commands across SGLang, vLLM, and TensorRT-LLM.
  • llm-torch-profiler-analysis: capture or analyze torch-profiler traces and produce kernel, overlap-opportunity, and fuse-pattern tables.

This skill's goal is not "run one benchmark." Its goal is a reproducible SGLang improvement loop: tune every framework fairly, prove whether SGLang is behind, explain the gap with profiler evidence, patch SGLang, and re-run the same model workload until the result is SOTA for the target environment.
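The loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the skill's actual implementation: `optimization_loop`, the two callables it takes, the framework keys, and the single "higher is better" throughput number are all hypothetical stand-ins for the work the two lower-level skills do.

```python
from typing import Callable, Dict


def optimization_loop(
    benchmark: Callable[[], Dict[str, float]],
    profile_and_patch: Callable[[Dict[str, float]], bool],
    max_rounds: int = 5,
) -> Dict[str, float]:
    """Re-run the same workload until SGLang leads or no patch is left.

    benchmark: hypothetical callable that tunes every framework fairly and
        returns the best observed throughput per framework (higher is better).
    profile_and_patch: hypothetical callable that explains the gap with
        profiler evidence and applies an SGLang patch; it returns False
        when there is nothing left to try.
    """
    results = benchmark()
    for _ in range(max_rounds):
        # Best result among the other frameworks on the same workload.
        best_other = max(v for k, v in results.items() if k != "sglang")
        if results["sglang"] >= best_other:
            break  # SGLang is not behind; stop patching.
        if not profile_and_patch(results):
            break  # No further patch available.
        # Same model, same workload, after the patch.
        results = benchmark()
    return results
```

The loop stops on whichever comes first: SGLang matching the best competitor, no remaining patch, or the round limit.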

Treat "SOTA" as "best observed, reproducible performance under the recorded model, workload, hardware, framework commits, precision, and SLA." Do not claim global SOTA without enough external evidence.
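One way to keep a claim scoped this way is to store the recorded context next to the numbers. The sketch below is only an assumption about how such a record could look: `BenchmarkContext`, `sota_claim`, and every field name are hypothetical and do not come from the skill itself.

```python
from dataclasses import dataclass, asdict
from typing import Dict


@dataclass(frozen=True)
class BenchmarkContext:
    """Hypothetical record of the conditions a result was measured under."""
    model: str
    workload: str
    hardware: str
    framework_commits: Dict[str, str]
    precision: str
    sla: str


def sota_claim(ctx: BenchmarkContext, throughput: Dict[str, float]) -> dict:
    """Build a claim that is explicitly limited to the recorded context.

    throughput: best reproducible result per framework (higher is better).
    """
    best = max(throughput, key=throughput.get)
    return {
        "best_observed": best,
        "sglang_is_best": best == "sglang",
        "scope": "recorded environment only",  # never a global claim
        "context": asdict(ctx),
        "results": dict(throughput),
    }
```

Keeping the context inside the claim means a result cannot be quoted without the model, workload, hardware, commits, precision, and SLA it was measured under.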
