sglang-sota-performance
# SGLang SOTA Performance

## Overview
Use this skill as the top-level optimization loop for one model at a time. It composes two lower-level skills:
- `llm-serving-auto-benchmark`: search and compare best deployment commands across SGLang, vLLM, and TensorRT-LLM.
- `llm-torch-profiler-analysis`: capture or analyze torch-profiler traces and produce kernel, overlap-opportunity, and fuse-pattern tables.
This skill's goal is not "run one benchmark." Its goal is a reproducible SGLang improvement loop: tune every framework fairly, prove whether SGLang is behind, explain the gap with profiler evidence, patch SGLang, and re-run the same model workload until the result is SOTA for the target environment.
Treat "SOTA" as "best observed, reproducible performance under the recorded model, workload, hardware, framework commits, precision, and SLA." Do not claim global SOTA without enough external evidence.
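One way to make that claim checkable is to record every listed condition next to the result. The sketch below is a hypothetical manifest; all field names and values are illustrative placeholders, not a required schema.

```python
import json

# Hypothetical record pinning down what "best observed, reproducible"
# means for one run: model, workload, hardware, commits, precision, SLA.
sota_record = {
    "model": "example-org/example-model",
    "workload": {"input_len": 1024, "output_len": 256, "concurrency": 64},
    "hardware": "8x H100 80GB",
    "framework_commits": {"sglang": "abc1234", "vllm": "def5678"},  # placeholders
    "precision": "fp8",
    "sla": {"p99_ttft_ms": 500},
    "result_tokens_per_s": None,  # filled in after the final run
}

manifest = json.dumps(sota_record, indent=2, sort_keys=True)
```

Storing such a manifest alongside each benchmark run is what lets "SOTA for the target environment" be re-verified later.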
## More from bbuf/sglang-auto-driven-skills
h100
SSH into host `h100_sglang`, enter Docker container `sglang_bbuf`, work in `/sgl-workspace/sglang`, and use the ready H100 remote environment for SGLang development and validation. Use when a task needs remote CUDA work, GPU-backed smoke tests, diffusion checks, or a safe remote copy instead of local-only execution.
h100-sglang-diffusion
SSH into host `h100_sglang`, enter Docker container `sglang_bbuf`, work in `/data/bbuf/repos/sglang`, and use the ready H100 remote environment for SGLang **diffusion** development and validation. Use when a task needs diffusion model smoke tests, Triton/CUDA kernel validation, torch.compile diffusion checks, or a safe remote copy for diffusion-specific SGLang changes.
sglang-prod-incident-triage
Replay-first debug flow for SGLang serving problems. Use when a live or recent server shows health-check failures, latency or throughput regressions, queue growth, timeouts, distributed stalls, crash dumps, wrong outputs after deploys, or PD/EP/HiCache issues, and the job is to turn the problem into a replay plus the right next debug tool.
llm-serving-auto-benchmark
Framework-independent LLM serving benchmark skill for comparing SGLang, vLLM, TensorRT-LLM, or another serving framework. Use when a user wants to find the best deployment command for one model across multiple serving frameworks under the same workload, GPU budget, and latency SLA.
llm-torch-profiler-analysis
Unified LLM torch-profiler triage skill for `sglang`, `vllm`, and `TensorRT-LLM`. Use it to inspect an existing `trace.json(.gz)` or profile directory, or to drive live profiling against a running server and return one three-table report with kernel, overlap-opportunity, and fuse-pattern tables.
sglang-minimax-m2-series-optimization
PR-backed and current-main optimization manual for the `MiniMaxAI/MiniMax-M2` series, including M2, M2.1, M2.5, M2.7, and M2.7-highspeed. Use when Codex needs to recover, extend, or audit MiniMax-specific optimizations, TP QK norm/all-reduce behavior, parser contracts, distributed runtime behavior, quantized loading, or backend-specific validation.