videoagent-audio-studio

Summary

Unified audio generation dispatcher routing TTS, music, sound effects, and voice cloning to optimal models.

  • Routes requests to ElevenLabs (TTS, voice cloning, SFX) or fal.ai (music) based on request type, with latencies ranging from <1s to ~15s
  • Supports five audio capabilities: multilingual text-to-speech with voice selection, low-latency turbo TTS, background music composition, sound effect generation (up to 22 seconds), and voice cloning from audio samples
  • Requires only ELEVENLABS_API_KEY to start; optional FAL_KEY enables music generation via fal.ai
  • Includes self-hosting option via Vercel proxy for regional access or custom domain requirements
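The key requirements above can be illustrated with a minimal sketch. It assumes the two environment variables named in the summary gate capabilities as described; the function name and the capability labels are hypothetical and not part of the skill itself.

```python
import os
from typing import Mapping


def available_capabilities(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the capability labels usable with the keys found in `env`.

    The labels are illustrative names, not identifiers defined by the skill.
    """
    caps: list[str] = []
    # ElevenLabs covers TTS, turbo TTS, sound effects, and voice cloning.
    if env.get("ELEVENLABS_API_KEY"):
        caps += ["tts", "turbo_tts", "sfx", "voice_clone"]
    # Music generation goes through fal.ai and needs the optional FAL_KEY.
    if env.get("FAL_KEY"):
        caps.append("music")
    return caps


if __name__ == "__main__":
    print(available_capabilities())
```

Passing the environment as a parameter keeps the check easy to exercise without touching real credentials.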
SKILL.md

🎙️ VideoAgent Audio Studio

Use when: User asks to generate speech, narrate text, create a voice-over, compose music, or produce a sound effect.

VideoAgent Audio Studio is a smart audio dispatcher. It analyzes your request, routes it to the best available model (ElevenLabs for speech, sound effects, and voice cloning; fal.ai for music), and returns a ready-to-use audio URL.


Quick Reference

| Request Type | Best Model | Latency |
| --- | --- | --- |
| Narrate text / Voice-over | elevenlabs-tts-v3 | ~3s |
| Low-latency TTS (real-time) | elevenlabs-tts-turbo | <1s |
| Background music | cassetteai-music | ~15s |
| Sound effect | elevenlabs-sfx | ~5s |
| Clone a voice from audio | elevenlabs-voice-clone | ~10s |
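The routing in this table can be pictured as a simple lookup. The sketch below is only an illustration: the model names and latencies are copied from the table, while the request-type keys, the `Route` type, and `pick_route` are hypothetical names, and the real dispatcher may analyze requests quite differently.

```python
from typing import NamedTuple


class Route(NamedTuple):
    model: str
    provider: str
    typical_latency: str


# Model names and latencies come from the Quick Reference table.
# The dictionary keys are hypothetical labels chosen for this sketch.
ROUTES = {
    "tts": Route("elevenlabs-tts-v3", "ElevenLabs", "~3s"),
    "tts_realtime": Route("elevenlabs-tts-turbo", "ElevenLabs", "<1s"),
    "music": Route("cassetteai-music", "fal.ai", "~15s"),
    "sfx": Route("elevenlabs-sfx", "ElevenLabs", "~5s"),
    "voice_clone": Route("elevenlabs-voice-clone", "ElevenLabs", "~10s"),
}


def pick_route(request_type: str) -> Route:
    """Look up the route for a request type, rejecting unknown types."""
    try:
        return ROUTES[request_type]
    except KeyError:
        raise ValueError(f"unsupported request type: {request_type!r}") from None


if __name__ == "__main__":
    route = pick_route("music")
    print(f"{route.model} via {route.provider} ({route.typical_latency})")
```

A static table like this keeps the mapping auditable; classifying a free-form user request into one of these types is a separate step not shown here.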

Installs: 3.6K
GitHub Stars: 732
First Seen: Mar 6, 2026