Voice Skill - Local TTS and STT via Voicebox

Scheduling

Goal

Drive the local Voicebox app through its MCP server so that any MCP-aware agent can speak (TTS) or listen (STT) without calling cloud vendors. The skill standardizes intent routing, voice profile resolution, output layout, and guardrails, while Voicebox itself owns the speech engines, the voice-cloning UI, the captures archive, and the stories editor.
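Because the skill talks to Voicebox only through MCP, every action reduces to an MCP `tools/call` request sent as a JSON-RPC 2.0 message. The sketch below shows that envelope. The tool name `generate_speech` and the `text` / `profile_id` argument keys are hypothetical placeholders, not Voicebox's documented API; the real names should be taken from whatever the server advertises through `tools/list`.

```python
import json
from itertools import count

# Monotonic JSON-RPC request ids for this client session.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool and argument names: confirm them against the
# Voicebox MCP server's `tools/list` response before relying on them.
request = build_tool_call(
    "generate_speech",
    {"text": "Build finished.", "profile_id": "default"},
)
print(request)
```

Keeping the envelope separate from the tool name means the skill's routing logic stays the same even if Voicebox renames or adds tools.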

Intent signature

User asks to generate speech, narrate text, produce a voiceover, or create an mp3 or wav file from text.
User wants an audio file transcribed into text, meeting notes, or a transcript.
User asks for a voice notification when a long task completes or a workflow step is blocked.
Another skill needs local audio generation infrastructure.
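The intent signature above amounts to a small router that sends a request down the TTS, STT, or notification path. A minimal keyword-based sketch follows. The trigger phrases and the `Intent` enum are hypothetical illustrations distilled from the signature, not the skill's actual matching rules, and a production router would likely use richer matching than substring checks.

```python
from enum import Enum


class Intent(Enum):
    TTS = "tts"
    STT = "stt"
    NOTIFY = "notify"
    NONE = "none"


# Hypothetical trigger phrases derived from the intent signature.
_NOTIFY_HINTS = ("notify me", "voice notification", "when it finishes", "when blocked")
_STT_HINTS = ("transcribe", "transcript", "meeting notes")
_TTS_HINTS = ("generate speech", "narrate", "voiceover", "read aloud", "text to speech")


def route_intent(request: str) -> Intent:
    text = request.lower()
    # Notifications also produce speech, so check them before plain TTS.
    if any(hint in text for hint in _NOTIFY_HINTS):
        return Intent.NOTIFY
    # Check STT before the file-format rule so "transcribe this mp3" is not TTS.
    if any(hint in text for hint in _STT_HINTS):
        return Intent.STT
    if any(hint in text for hint in _TTS_HINTS):
        return Intent.TTS
    # "Make an mp3/wav from ..." requests also map to TTS.
    if ("mp3" in text or "wav" in text) and "from" in text:
        return Intent.TTS
    return Intent.NONE


print(route_intent("Please transcribe standup.m4a"))
```

Ordering the checks from most to least specific keeps overlapping requests, such as a spoken notification or a transcription of an mp3, from being misrouted to plain TTS.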

When to use

Generating short notification audio for agent task completion or blockers.
Producing voiceover, narration, or audio assets (mp3 or wav) for apps and content.
Transcribing local audio files (mp3, wav, m4a, webm, flac) to Markdown.
Comparing voice profiles by re-running the same text against different profile ids.
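Two of these uses imply simple path rules: transcription accepts only the listed audio formats and yields a Markdown file, and profile comparison needs one output per profile id so that the results do not overwrite each other. The sketch below assumes a hypothetical layout of `<out_dir>/<stem>.md` for transcripts and `<out_dir>/<stem>.<profile_id>.wav` for comparisons. The skill's real output layout may differ.

```python
from pathlib import Path

# Audio formats listed as transcribable by this skill.
SUPPORTED_STT_FORMATS = {".mp3", ".wav", ".m4a", ".webm", ".flac"}


def transcript_path(audio: Path, out_dir: Path) -> Path:
    """Map an input audio file to the Markdown transcript it should produce."""
    if audio.suffix.lower() not in SUPPORTED_STT_FORMATS:
        raise ValueError(f"unsupported audio format: {audio.suffix or '(none)'}")
    return out_dir / f"{audio.stem}.md"


def comparison_paths(
    stem: str, profile_ids: list[str], out_dir: Path, ext: str = ".wav"
) -> dict[str, Path]:
    """One output file per voice profile, so the same text can be compared side by side."""
    return {pid: out_dir / f"{stem}.{pid}{ext}" for pid in profile_ids}


print(transcript_path(Path("recordings/standup.M4A"), Path("out")))
print(comparison_paths("intro", ["alice", "bob"], Path("out")))
```

Rejecting unsupported extensions before any MCP call keeps the failure local and cheap instead of surfacing as an engine error.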

oma-voice