videoagent-audio-studio
Summary
Unified audio generation dispatcher routing TTS, music, sound effects, and voice cloning to optimal models.
- Routes requests to ElevenLabs (TTS, voice cloning, SFX) or fal.ai (music) based on request type, with latencies ranging from <1s to ~15s
- Supports five audio capabilities: multilingual text-to-speech with voice selection, low-latency turbo TTS, background music composition, sound effect generation (up to 22 seconds), and voice cloning from audio samples
- Requires only `ELEVENLABS_API_KEY` to start; optional `FAL_KEY` enables music generation via fal.ai
- Includes a self-hosting option via Vercel proxy for regional access or custom domain requirements
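As a rough illustration of the key requirements above, the following sketch checks which capabilities a given environment would unlock. The function name `available_capabilities` and the capability labels are hypothetical, and raising an error when `ELEVENLABS_API_KEY` is missing is an assumption; the skill itself may handle a missing key differently.

```python
import os


def available_capabilities(env=None):
    """Return capability labels enabled by the configured API keys.

    The labels are illustrative only; just the two variable names
    (ELEVENLABS_API_KEY, FAL_KEY) come from the skill description.
    """
    env = os.environ if env is None else env

    # Assumption: without the ElevenLabs key nothing can run.
    if not env.get("ELEVENLABS_API_KEY"):
        raise RuntimeError("ELEVENLABS_API_KEY is required")

    # ElevenLabs covers TTS, turbo TTS, sound effects, and voice cloning.
    capabilities = ["tts", "turbo_tts", "sfx", "voice_clone"]

    # Music goes through fal.ai, so it needs the optional FAL_KEY.
    if env.get("FAL_KEY"):
        capabilities.append("music")
    return capabilities
```

Passing a plain dictionary instead of relying on `os.environ` keeps the check easy to try without touching real credentials.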
SKILL.md
🎙️ VideoAgent Audio Studio
Use when: User asks to generate speech, narrate text, create a voice-over, compose music, or produce a sound effect.
VideoAgent Audio Studio is a smart audio dispatcher. It analyzes your request, routes it to the best available model (ElevenLabs for speech, sound effects, and voice cloning; fal.ai for music), and returns a ready-to-use audio URL.
Quick Reference
| Request Type | Best Model | Latency |
|---|---|---|
| Narrate text / Voice-over | `elevenlabs-tts-v3` | ~3s |
| Low-latency TTS (real-time) | `elevenlabs-tts-turbo` | <1s |
| Background music | `cassetteai-music` | ~15s |
| Sound effect | `elevenlabs-sfx` | ~5s |
| Clone a voice from audio | `elevenlabs-voice-clone` | ~10s |
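The routing in the table can be pictured as a simple lookup. The sketch below is not the skill's actual dispatcher: the request-type keys, the `Route` class, and `pick_route` are hypothetical names, and only the model identifiers and latencies are taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str     # model identifier from the Quick Reference table
    provider: str  # "elevenlabs" or "fal.ai"
    latency: str   # typical latency as listed in the table


# Hypothetical request-type keys mapped to the table's rows.
ROUTES = {
    "narration": Route("elevenlabs-tts-v3", "elevenlabs", "~3s"),
    "realtime_tts": Route("elevenlabs-tts-turbo", "elevenlabs", "<1s"),
    "music": Route("cassetteai-music", "fal.ai", "~15s"),
    "sfx": Route("elevenlabs-sfx", "elevenlabs", "~5s"),
    "voice_clone": Route("elevenlabs-voice-clone", "elevenlabs", "~10s"),
}


def pick_route(request_type: str) -> Route:
    """Look up the route for a request type, failing loudly on unknown types."""
    try:
        return ROUTES[request_type]
    except KeyError:
        raise ValueError(f"unsupported request type: {request_type!r}") from None
```

For example, `pick_route("music").provider` evaluates to `"fal.ai"`, which is the one case that depends on the optional `FAL_KEY`.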
Related skills