stepfun-tts


StepFun stepaudio-2.5-tts

Generate Chinese or Japanese speech with stepaudio-2.5-tts (released 2026-04; verified 2026-04-23). It is a contextual TTS model: emotion and prosody are controlled through natural-language descriptions rather than fixed labels.

Companion: for transcription with the sibling model stepaudio-2.5-asr, use the stepfun-asr skill. The two models share an API key but use different endpoints and different request-body shapes.

Why this skill exists: StepAudio 2.5 has two non-obvious pitfalls that can cost hours of debugging if you don't know about them.

  1. stepaudio-2.5-tts rejects voice_label, the parameter step-tts-2 used for this. Emotion and prosody now go through the instruction field (a natural-language description of at most 200 characters) and through inline () parentheses inside the text itself.
  2. Censorship is stricter: any text containing 死 ("death"), 消失 ("disappear"), or sensitive political terms returns censorship_block. Options for rewording such text are documented in references/migration_from_v2.md.
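
A minimal Python sketch of a request body that respects the first pitfall, assuming a JSON body. The model name, the instruction field, its 200-character limit, and the rejection of voice_label come from the notes above; the helper build_tts_body and the "input" field name for the text are hypothetical, and the cue inside the inline parentheses is illustrative only.

```python
from typing import Any, Dict, Optional

MODEL = "stepaudio-2.5-tts"
MAX_INSTRUCTION_CHARS = 200  # limit stated in the skill notes


def build_tts_body(text: str, instruction: Optional[str] = None, **extra: Any) -> Dict[str, Any]:
    """Assemble a request body for stepaudio-2.5-tts (hypothetical helper).

    From the notes: the model name, the `instruction` field and its 200-char
    limit, and the rejection of `voice_label`. Assumed: the text goes in a
    field named "input".
    """
    if "voice_label" in extra:
        # step-tts-2 style labels are rejected by stepaudio-2.5-tts.
        raise ValueError("voice_label is rejected; describe emotion in `instruction` instead")
    body: Dict[str, Any] = {"model": MODEL, "input": text, **extra}
    if instruction is not None:
        # len() counts code points, so each CJK character counts as one
        # (assumed to match how the API counts characters).
        if len(instruction) > MAX_INSTRUCTION_CHARS:
            raise ValueError(
                f"instruction has {len(instruction)} chars; limit is {MAX_INSTRUCTION_CHARS}"
            )
        body["instruction"] = instruction
    return body


# Inline cue inside the text plus a natural-language instruction.
# Text: "(softly) You worked hard today." Instruction: "gentle, slower pace".
body = build_tts_body("(轻声)今天辛苦了。", instruction="温柔、放慢语速")
```

Failing locally on voice_label or an over-long instruction surfaces the mistake before a network round trip.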

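For the second pitfall, a local pre-flight check can flag the known trigger terms before a request is sent. This is a hypothetical helper, not part of the StepFun API, and it covers only the two terms named above: the sensitive political terms are not enumerated, so an empty result does not guarantee the request will pass.

```python
from typing import Iterable, List

# The two trigger terms named in the skill notes. Sensitive political terms
# also trigger censorship_block but are not listed, so this is incomplete.
KNOWN_BLOCKED_TERMS = ("死", "消失")


def find_blocked_terms(text: str, terms: Iterable[str] = KNOWN_BLOCKED_TERMS) -> List[str]:
    """Return the known trigger terms that occur anywhere in text, in list order."""
    return [term for term in terms if term in text]


hits = find_blocked_terms("他突然消失了")  # "He suddenly disappeared."
if hits:
    print("likely censorship_block; reword first:", hits)
```

Plain substring matching follows the "anything containing" wording, so compounds that merely include 死 are flagged as well.
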
Config and auth

The API key is read from $STEPFUN_API_KEY (preferred) or, as a fallback that persists across sessions, from ${CLAUDE_PLUGIN_DATA}/config.json. All bundled scripts check the environment variable first, then the config file.
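
The lookup order above can be sketched as follows. This assumes ${CLAUDE_PLUGIN_DATA} is exposed as an environment variable and that config.json stores the key under a field named api_key; both that field name and the helper resolve_api_key are hypothetical, so check the bundled scripts for the names they actually use.

```python
import json
import os
from pathlib import Path
from typing import Optional


def resolve_api_key(config_key: str = "api_key") -> Optional[str]:
    """Return the StepFun API key: $STEPFUN_API_KEY first, then config.json.

    config_key ("api_key") is a hypothetical field name.
    """
    key = os.environ.get("STEPFUN_API_KEY")
    if key:
        return key  # environment variable wins when set and non-empty
    data_dir = os.environ.get("CLAUDE_PLUGIN_DATA")
    if not data_dir:
        return None
    config_path = Path(data_dir) / "config.json"
    try:
        config = json.loads(config_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return None  # missing or unreadable config: no key available
    value = config.get(config_key) if isinstance(config, dict) else None
    return value or None
```

Returning None instead of raising lets each calling script decide how to prompt for first-time setup.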

First-time setup (one-liner):


Source: daymade/claude-code-skills

First seen: Apr 27, 2026