tts

Summary

Convert text to natural-sounding speech with single or multi-speaker audio generation.

  • Two modes: Quick mode for instant single-voice MP3 output, and Script mode for multi-speaker dialogue with per-character voice assignment
  • Automatic mode detection based on input structure; supports both plain text and structured scripts with character markers
  • Built-in speaker selection with language support (Chinese and English) and preference saving to local config
  • Configurable output modes: inline playback, file download, or both; all audio is saved under the .listenhub/tts/ directory, organized by timestamp
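
As a rough illustration of the timestamped output described above, the following Python sketch builds a file path under .listenhub/tts/. The exact folder layout is not documented here, so the date-folder and time-named file scheme, as well as the function name build_output_path, are assumptions made for illustration only.

```python
from datetime import datetime
from pathlib import Path


def build_output_path(base_dir, when, ext="mp3"):
    """Return a timestamp-based path for a generated audio file.

    Assumed (hypothetical) layout: <base_dir>/<YYYYMMDD>/<HHMMSS>.<ext>
    The real skill may organize files differently.
    """
    day_folder = Path(base_dir) / when.strftime("%Y%m%d")
    return day_folder / f"{when.strftime('%H%M%S')}.{ext}"


# Example with a fixed timestamp so the result is reproducible.
path = build_output_path(".listenhub/tts", datetime(2026, 3, 13, 9, 30, 0))
print(path.as_posix())  # .listenhub/tts/20260313/093000.mp3
```

Grouping files by day keeps a long-lived output directory browsable; a caller would still need to create the folder (for example with path.parent.mkdir(parents=True, exist_ok=True)) before writing audio to it.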
SKILL.md

When to Use

  • User wants to convert text to spoken audio
  • User asks for "read aloud", "TTS", "text to speech", "voice narration"
  • User says "朗读" (read aloud), "配音" (voice-over), "语音合成" (speech synthesis)
  • User wants multi-speaker scripted audio or dialogue

When NOT to Use

  • User wants a podcast-style discussion with topic exploration (use /podcast)
  • User wants an explainer video with visuals (use /explainer)
  • User wants to generate an image (use /image-gen)

Purpose

Convert text into natural-sounding speech audio. Two paths:

  1. Quick mode (--mode direct): Single voice, low-latency, synchronous. For casual chat, reading snippets, instant audio.
  2. Script mode (--mode smart): Multi-speaker, per-segment voice assignment. For dialogue, audiobooks, scripted content.
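
The choice between the two modes can be sketched as a small heuristic. The skill's actual detection logic is not described here, so this is only a guess at one possible approach: it assumes character markers take the form "Name: line" (with either an ASCII or full-width colon) and picks Script mode when at least two distinct speakers appear. The names SPEAKER_LINE and detect_mode are hypothetical.

```python
import re

# Assumed marker format: a short speaker name, a colon (ASCII or full-width),
# then the spoken text. This format is an illustration, not the skill's spec.
SPEAKER_LINE = re.compile(r"^\s*([^\s:：][^:：]{0,30})[:：]\s*\S")


def detect_mode(text):
    """Return "smart" for multi-speaker scripts, otherwise "direct"."""
    speakers = set()
    for line in text.splitlines():
        match = SPEAKER_LINE.match(line)
        if match:
            speakers.add(match.group(1).strip())
    # A single "Label: text" line is treated as plain text, not a dialogue.
    return "smart" if len(speakers) >= 2 else "direct"


print(detect_mode("Please read this paragraph aloud."))
print(detect_mode("Alice: Did you hear the news?\nBob: Not yet, tell me."))
```

Requiring two distinct speakers avoids misclassifying ordinary prose that happens to contain one colon, such as "Note: the meeting moved to Friday".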

Installs: 1.0K
GitHub Stars: 53
First Seen: Mar 13, 2026