h100-sglang-diffusion

H100 — SGLang Diffusion

Overview

Use this skill for SGLang diffusion development on the H100 machine, reached through the SSH host h100_sglang. The default container is sglang_bbuf, and the repo lives at /data/bbuf/repos/sglang.
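
As a minimal sketch of how a command could be routed into that environment, the helper below builds an `ssh` + `docker exec` invocation from the host, container, and repo path named above. The helper name `build_remote_command` is hypothetical, and the sketch assumes `h100_sglang` is an alias in the local SSH config and that `bash` is available in the container; it only constructs the argument list and does not run anything.

```python
from __future__ import annotations

import shlex

# Names taken from the overview; the SSH alias is assumed to be defined
# in the local ~/.ssh/config.
SSH_HOST = "h100_sglang"
CONTAINER = "sglang_bbuf"
REPO_DIR = "/data/bbuf/repos/sglang"


def build_remote_command(command: str) -> list[str]:
    """Return an argv list that runs `command` inside the container's repo dir."""
    docker_cmd = [
        "docker", "exec",
        "-w", REPO_DIR,       # start in the repo checkout
        CONTAINER,
        "bash", "-lc", command,
    ]
    # ssh hands its trailing argument to the remote shell as one string,
    # so the docker invocation is quoted once for that shell.
    return ["ssh", SSH_HOST, shlex.join(docker_cmd)]


if __name__ == "__main__":
    # Print the argv; pass it to subprocess.run(...) to actually execute it.
    print(build_remote_command("git status --short"))
```

Building an argv list rather than a single shell string keeps local quoting out of the picture; only the remote side needs the quoted form.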

Prefer this skill when:

Validating diffusion Triton / CUDA JIT kernels
Running diffusion model smoke tests (DiffGenerator, flux, etc.)
Comparing eager vs torch.compile diffusion performance
Verifying python[diffusion] editable install changes
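
For the eager vs torch.compile comparison, a timing harness along the following lines may be enough for a first look. It is a framework-agnostic sketch: `time_callable` and `compare` are hypothetical helpers, not part of SGLang, and the two callables are assumed to wrap the same diffusion step in eager and compiled form.

```python
import time
from statistics import median


def time_callable(fn, warmup=3, iters=10, sync=None):
    """Return the median wall-clock seconds per call of `fn`.

    `sync` is an optional zero-argument callable that blocks until queued
    device work has finished, so asynchronous GPU launches are timed fully.
    """
    # Warm-up calls absorb one-time costs such as compilation and caching.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        samples.append(time.perf_counter() - start)
    return median(samples)


def compare(eager_fn, compiled_fn, **kwargs):
    """Return (eager_seconds, compiled_seconds, speedup) for two callables."""
    eager_s = time_callable(eager_fn, **kwargs)
    compiled_s = time_callable(compiled_fn, **kwargs)
    return eager_s, compiled_s, eager_s / compiled_s
```

On the GPU, passing `sync=torch.cuda.synchronize` is the usual choice, and the warm-up count should be large enough to cover the first compiled call, which triggers compilation.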

This environment is already prepared:

sglang_bbuf is running from the lmsysorg/sglang:dev image
the repo is cloned at /data/bbuf/repos/sglang
editable installs for python[all] and python[diffusion] are already done
/data/.cache is mounted to /root/.cache
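
Before starting work, the assumptions listed above can be sanity-checked from inside the container. The sketch below is illustrative only: `check_environment` is a hypothetical helper, and it checks for directory presence and package importability, not for the mount itself or for whether the install is editable.

```python
import importlib.util
from pathlib import Path


def check_environment(
    repo_dir="/data/bbuf/repos/sglang",
    cache_dir="/root/.cache",
    package="sglang",
):
    """Return a dict of basic presence checks for the prepared environment."""
    return {
        # The repo checkout exists at the expected path.
        "repo_present": Path(repo_dir).is_dir(),
        # The cache directory exists (this does not prove it is a mount).
        "cache_dir_present": Path(cache_dir).is_dir(),
        # The package can be located by the import system.
        "package_importable": importlib.util.find_spec(package) is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

If any check fails, the environment has probably drifted from the prepared state described here and should be inspected before running kernels or smoke tests.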

Related skills

More from bbuf/sglang-auto-driven-skills

h100
SSH into host `h100_sglang`, enter Docker container `sglang_bbuf`, work in `/sgl-workspace/sglang`, and use the ready H100 remote environment for SGLang development and validation. Use when a task needs remote CUDA work, GPU-backed smoke tests, diffusion checks, or a safe remote copy instead of local-only execution.
34
sglang-prod-incident-triage
Replay-first debug flow for SGLang serving problems. Use when a live or recent server shows health-check failures, latency or throughput regressions, queue growth, timeouts, distributed stalls, crash dumps, wrong outputs after deploys, or PD/EP/HiCache issues, and the job is to turn the problem into a replay plus the right next debug tool.
33
llm-serving-auto-benchmark
Framework-independent LLM serving benchmark skill for comparing SGLang, vLLM, TensorRT-LLM, or another serving framework. Use when a user wants to find the best deployment command for one model across multiple serving frameworks under the same workload, GPU budget, and latency SLA.
19
llm-torch-profiler-analysis
Unified LLM torch-profiler triage skill for `sglang`, `vllm`, and `TensorRT-LLM`. Use it to inspect an existing `trace.json(.gz)` or profile directory, or to drive live profiling against a running server and return one three-table report with kernel, overlap-opportunity, and fuse-pattern tables.
18
sglang-sota-performance
End-to-end SGLang SOTA performance workflow. Use when a user names an LLM model and wants SGLang to match or beat the best observed vLLM and TensorRT-LLM serving performance by searching each framework's best deployment command, benchmarking them fairly, profiling SGLang if it is slower, identifying kernel/overlap/fusion bottlenecks, patching SGLang code, and revalidating with real model runs.
15
sglang-minimax-m2-series-optimization
PR-backed and current-main optimization manual for the `MiniMaxAI/MiniMax-M2` series, including M2, M2.1, M2.5, M2.7, and M2.7-highspeed. Use when Codex needs to recover, extend, or audit MiniMax-specific optimizations, TP QK norm/all-reduce behavior, parser contracts, distributed runtime behavior, quantized loading, or backend-specific validation.
15

Repository

bbuf/sglang-aut…n-skills

GitHub Stars

272

First Seen

Apr 21, 2026

Security Audits

Gen Agent Trust Hub: Pass

Socket: Pass

Snyk: Fail
