tts
Summary
Convert text to natural-sounding speech with single or multi-speaker audio generation.
- Two modes: Quick mode for instant single-voice MP3 output, and Script mode for multi-speaker dialogue with per-character voice assignment
- Automatic mode detection based on input structure; supports both plain text and structured scripts with character markers
- Built-in speaker selection with language support (Chinese and English) and preference saving to local config
- Configurable output modes: inline playback, file download, or both; all audio saved to the `.listenhub/tts/` directory with timestamped organization
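As an illustration of the timestamped organization, here is a minimal Python sketch. The source does not specify the naming scheme, so the `YYYYMMDD-HHMMSS` file name, the `output_path` helper, and its parameters are all assumptions; only the `.listenhub/tts/` directory and the MP3 output come from the summary.

```python
from datetime import datetime
from pathlib import Path


def output_path(base_dir: Path, when: datetime, ext: str = "mp3") -> Path:
    """Build a timestamped audio path under .listenhub/tts/ (hypothetical layout)."""
    # Assumed naming scheme: one file per request, named by its creation time.
    stamp = when.strftime("%Y%m%d-%H%M%S")
    return base_dir / ".listenhub" / "tts" / f"{stamp}.{ext}"


if __name__ == "__main__":
    path = output_path(Path.cwd(), datetime.now())
    # Create the directory before writing audio bytes to the path.
    path.parent.mkdir(parents=True, exist_ok=True)
    print(path)
```

Sorting files by name then also sorts them by creation time, which is one reason a timestamp-first name is a common choice.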
SKILL.md
When to Use
- User wants to convert text to spoken audio
- User asks for "read aloud", "TTS", "text to speech", "voice narration"
- User says "朗读" (read aloud), "配音" (voice-over), "语音合成" (speech synthesis)
- User wants multi-speaker scripted audio or dialogue
When NOT to Use
- User wants a podcast-style discussion with topic exploration (use `/podcast`)
- User wants an explainer video with visuals (use `/explainer`)
- User wants to generate an image (use `/image-gen`)
Purpose
Convert text into natural-sounding speech audio. Two paths:
- Quick mode (`--mode direct`): Single voice, low-latency, sync. For casual chat, reading snippets, instant audio.
- Script mode (`--mode smart`): Multi-speaker, per-segment voice assignment. For dialogue, audiobooks, scripted content.
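The automatic mode detection mentioned in the summary could work roughly as follows. This is a minimal sketch, not the skill's actual logic: the `Name: text` marker format, the two-speaker threshold, and the `detect_mode` helper are assumptions; only the `direct` and `smart` mode names come from the text above.

```python
import re

# Assumed character-marker format: "Speaker: text" at the start of a line.
# Both ASCII and full-width colons are accepted, since Chinese input is supported.
SPEAKER_LINE = re.compile(r"^\s*([^\s:：][^:：]{0,30})[:：]\s*\S")


def detect_mode(text: str) -> str:
    """Return "smart" if the input looks like a multi-speaker script, else "direct"."""
    speakers = set()
    for line in text.splitlines():
        match = SPEAKER_LINE.match(line)
        if match:
            speakers.add(match.group(1).strip())
    # Hypothetical rule: two or more distinct speaker labels imply a script.
    return "smart" if len(speakers) >= 2 else "direct"


if __name__ == "__main__":
    print(detect_mode("Please read this paragraph aloud."))  # direct
    print(detect_mode("Alice: Hi there.\nBob: Hello!"))       # smart
```

Requiring at least two distinct labels keeps a single line such as `Note: remember the keys` in Quick mode instead of treating it as a one-character script.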
Related skills