StepFun stepaudio-2.5-asr

Transcribe audio with StepFun's stepaudio-2.5-asr (released 2026-04, verified 2026-04-23). Long audio in one call, no chunking — but only if the request hits the right endpoint with the right body shape. The wrong endpoint returns an error that looks identical to "model doesn't exist", which is the #1 reason this skill exists.

Companion: for TTS with stepaudio-2.5-tts (the sibling model), use the stepfun-tts skill — they share an API key but live on different endpoints with different body shapes.

Why this skill exists — three traps that cost hours

1. Wrong endpoint, wrong error. stepaudio-2.5-asr does not live on /v1/audio/transcriptions (that endpoint serves the older step-asr family). It lives on /v1/audio/asr/sse, which expects SSE streaming, a JSON body, and base64 audio. Sending the request to the wrong endpoint returns {"error":{"message":"model stepaudio-2.5-asr not supported"}}, which is identical in structure to the error for a genuinely nonexistent model name. People waste hours filing whitelist tickets.
2. Plan key vs Normal key, silent failure. StepFun's "Plan" subscription keys (cheap, text-only) cannot call audio endpoints, but the failure shows up as a 4xx whose message does not look like an auth error. If your account has a Plan subscription, you need a separate "Normal" key from the same console.
3. SSE error events are real. Censorship can fire on the ASR side too (rarely). Don't assume only transcript.text.delta and transcript.text.done events arrive; handle type: error events in the stream or they will be silently dropped.
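The first and third traps can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not StepFun's documented schema: only the endpoint path, the "JSON body, base64 audio" shape, and the three event types come from the text above. The body keys ("model", "audio"), the event payload keys ("delta", "text"), and the helper names are hypothetical and should be checked against the official API reference.

```python
import base64
import json
from typing import Iterable, Iterator

# From the text above: stepaudio-2.5-asr lives here, not on /v1/audio/transcriptions.
ASR_SSE_PATH = "/v1/audio/asr/sse"


class AsrStreamError(RuntimeError):
    """Raised when the stream carries an event whose type is "error"."""


def build_body(audio: bytes, model: str = "stepaudio-2.5-asr") -> str:
    # ASSUMPTION: the keys "model" and "audio" are placeholders. The source
    # only says the body is JSON and the audio is base64-encoded.
    encoded = base64.b64encode(audio).decode("ascii")
    return json.dumps({"model": model, "audio": encoded})


def iter_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each `data:` line in an SSE stream."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        # ASSUMPTION: some SSE APIs end with a "[DONE]" sentinel; skipping it
        # is harmless if this endpoint never sends one.
        if not payload or payload == "[DONE]":
            continue
        yield json.loads(payload)


def collect_transcript(lines: Iterable[str]) -> str:
    """Accumulate the transcript and fail loudly on error events."""
    parts = []
    for event in iter_events(lines):
        kind = event.get("type")
        if kind == "error":
            # Surface the whole event instead of silently dropping it.
            raise AsrStreamError(json.dumps(event, ensure_ascii=False))
        if kind == "transcript.text.delta":
            parts.append(event.get("delta", ""))  # ASSUMPTION: key "delta"
        elif kind == "transcript.text.done":
            # ASSUMPTION: key "text"; fall back to the accumulated deltas.
            return event.get("text") or "".join(parts)
    return "".join(parts)
```

In practice `lines` would be the decoded line iterator of a streaming HTTP response to ASR_SSE_PATH. Keeping the parser separate from the transport makes the error path easy to exercise with canned lines before spending a real request.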

Config and auth

API key resolves in this order (fail-fast, no defaults):
