sglang-deepseek-v32-optimization
SGLang DeepSeek V3.2 Optimization
Overview
This skill covers DeepSeek V3.2 support and the optimization ladder active on SGLang main. V3.2 shares the DeepSeek V3/R1 model backbone, but it is a separate optimization problem because it activates DeepSeek Sparse Attention (called DSA in DeepSeek's docs and NSA in SGLang code).
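The core idea behind DSA/NSA is that a lightweight indexer scores the KV cache per query, and attention then runs only over the top-k selected positions. A minimal PyTorch sketch of that selection pattern follows; it is purely illustrative, not the kernel in nsa_backend.py, and the shapes, the score source, and top_k are all assumptions:

import torch

def topk_sparse_attention(q, k, v, index_scores, top_k):
    """Attend each query only to its top_k highest-scoring KV positions.

    q:            (num_q, d)       query vectors
    k, v:         (num_kv, d)      full KV cache
    index_scores: (num_q, num_kv)  relevance scores from a cheap indexer
    """
    idx = index_scores.topk(top_k, dim=-1).indices   # (num_q, top_k)
    k_sel, v_sel = k[idx], v[idx]                    # gather selected KV rows
    logits = torch.einsum("qd,qkd->qk", q, k_sel) / q.shape[-1] ** 0.5
    weights = logits.softmax(dim=-1)
    return torch.einsum("qk,qkd->qd", weights, v_sel)

# Toy usage: 4 queries over a 128-token cache, each attending to 16 tokens.
q = torch.randn(4, 64)
k = torch.randn(128, 64)
v = torch.randn(128, 64)
scores = q @ k.T                                     # stand-in for indexer scores
out = topk_sparse_attention(q, k, v, scores, top_k=16)
print(out.shape)                                     # torch.Size([4, 64])

The real indexer (nsa_indexer.py) computes its scores far more cheaply than this dense q @ k.T stand-in; the sketch only shows the gather-then-attend structure.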
Current-main snapshot:
- SGLang origin/main: 929e00eea on 2026-04-21
- sgl-cookbook origin/main: 8ec4d03 on 2026-04-21
- V3.2 runtime entry: DeepseekV32ForCausalLM in python/sglang/srt/models/deepseek_v2.py
- NSA backend: python/sglang/srt/layers/attention/nsa_backend.py
- NSA indexer: python/sglang/srt/layers/attention/nsa/nsa_indexer.py
- V3.2 tool parser: python/sglang/srt/function_call/deepseekv32_detector.py
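To audit a local checkout against this snapshot, a quick sketch under stated assumptions: the repo path is hypothetical (point REPO at your own clone), and checking file presence plus HEAD this way is a convenience, not part of the skill's tooling.

from pathlib import Path
import subprocess

# Hypothetical audit helper: REPO is an assumption, not a fixed path.
REPO = Path("/sgl-workspace/sglang")

expected = [
    "python/sglang/srt/models/deepseek_v2.py",
    "python/sglang/srt/layers/attention/nsa_backend.py",
    "python/sglang/srt/layers/attention/nsa/nsa_indexer.py",
    "python/sglang/srt/function_call/deepseekv32_detector.py",
]

# Print the checkout's short HEAD so it can be compared to the snapshot hash.
head = subprocess.run(
    ["git", "-C", str(REPO), "rev-parse", "--short", "HEAD"],
    capture_output=True, text=True, check=True,
).stdout.strip()
print(f"checkout HEAD: {head}")

# Verify every V3.2 entry point named in the snapshot exists on disk.
for rel in expected:
    status = "ok" if (REPO / rel).is_file() else "MISSING"
    print(f"{status:7s} {rel}")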
The historical evidence lives in:
- references/pr-history.md: chronological PR evidence and code-level notes
- references/playbook.md: investigation order, symptom mapping, validation commands
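references/playbook.md owns the real validation commands. As a hedged stand-in for a first smoke test, the sketch below hits a running SGLang server through its OpenAI-compatible endpoint; the port, model name, and prompt are assumptions, and it presumes a server was already launched against a V3.2 checkpoint.

import requests

# Hypothetical smoke test: assumes an SGLang server is already serving a
# DeepSeek V3.2 checkpoint on the default local port.
resp = requests.post(
    "http://localhost:30000/v1/chat/completions",
    json={
        "model": "deepseek-v3.2",  # assumed served-model name
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 16,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])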