voice-ai-engine-development

Voice AI Engine Development

Goal: Build low-latency, conversational Voice AI agents capable of full-duplex communication.

1. The Voice Pipeline (Latency is King)

Total loop latency (voice-to-ear) should stay under 1000 ms, ideally under 500 ms.
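
One way to reason about this target is to give each pipeline stage a share of the budget and check the sum. The sketch below does that arithmetic. The per-stage numbers and the stage names are hypothetical placeholders chosen for illustration, not measurements of any tool; only the 1000 ms and 500 ms thresholds come from the text above.

```python
# Hypothetical per-stage latencies in milliseconds.
# These are illustrative placeholders, not benchmarks of any real service.
STAGE_LATENCY_MS = {
    "transport": 50,
    "vad_endpointing": 200,
    "stt": 150,
    "llm_first_token": 250,
    "tts_first_audio": 150,
}

TARGET_MS = 1000  # upper bound stated in this section
IDEAL_MS = 500    # ideal bound stated in this section


def classify_latency(total_ms: int) -> str:
    """Compare a total voice-to-ear latency against the two thresholds."""
    if total_ms < IDEAL_MS:
        return "ideal"
    if total_ms < TARGET_MS:
        return "acceptable"
    return "too slow"


total_ms = sum(STAGE_LATENCY_MS.values())
print(f"total: {total_ms} ms -> {classify_latency(total_ms)}")
```

Replacing the placeholder values with measured numbers from a real deployment shows which stage consumes most of the budget.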

  1. Transport: WebRTC (preferred for browser) or WebSocket (server-to-server).
  2. VAD (Voice Activity Detection): Detect when user starts/stops speaking.
    • Tools: Silero VAD, WebRTC VAD.
  3. STT (Speech-to-Text): Transcribe audio to text.
    • Tools: Deepgram (fastest), Whisper (high accuracy but slower), AssemblyAI.
  4. LLM (Brain): Process text and generate response.
    • Tools: Groq (Llama 3), GPT-4o, Claude 3.5 Sonnet.
  5. TTS (Text-to-Speech): Convert response to audio.
    • Tools: ElevenLabs (quality), Cartesia (speed), OpenAI TTS.
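
The stages above can be sketched as a chain of functions. This is a minimal, non-streaming illustration under stated assumptions: the VAD is a naive energy threshold standing in for a real detector such as Silero VAD, and `fake_stt`, `fake_llm`, and `fake_tts` are hypothetical stubs, not the API of any vendor listed above. The frame size, threshold, and hangover values are also assumptions. Transport is left out, so frames arrive as an in-memory list.

```python
import math
import struct
from typing import Callable, Iterable, Iterator

FRAME_SAMPLES = 160  # assumed frame size: 10 ms of 16 kHz mono audio


def rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a 16-bit little-endian mono PCM frame."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_speech(frame: bytes, threshold: float = 500.0) -> bool:
    """Naive energy-based VAD; the threshold is an arbitrary assumption."""
    return rms(frame) >= threshold


def collect_utterances(frames: Iterable[bytes], hangover_frames: int = 3) -> Iterator[bytes]:
    """Group speech frames into utterances.

    An utterance ends after `hangover_frames` consecutive silent frames.
    Silent frames are dropped, including short pauses inside an utterance.
    """
    buffer: list[bytes] = []
    silent = 0
    for frame in frames:
        if is_speech(frame):
            buffer.append(frame)
            silent = 0
        elif buffer:
            silent += 1
            if silent >= hangover_frames:
                yield b"".join(buffer)
                buffer, silent = [], 0
    if buffer:  # flush speech still buffered when the input ends
        yield b"".join(buffer)


def run_turn(audio: bytes,
             stt: Callable[[bytes], str],
             llm: Callable[[str], str],
             tts: Callable[[str], bytes]) -> bytes:
    """One conversational turn: STT, then LLM, then TTS."""
    return tts(llm(stt(audio)))


# Hypothetical stubs so the sketch runs without any external service.
def fake_stt(audio: bytes) -> str:
    return f"<{len(audio)} bytes of speech>"


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


loud = struct.pack(f"<{FRAME_SAMPLES}h", *([2000] * FRAME_SAMPLES))
quiet = bytes(FRAME_SAMPLES * 2)
frames = [quiet, loud, loud, quiet, quiet, quiet, loud]

utterances = list(collect_utterances(frames))
replies = [run_turn(u, fake_stt, fake_llm, fake_tts) for u in utterances]
for reply in replies:
    print(reply.decode("utf-8"))
```

Because each stage here waits for the previous one to finish, the stage latencies add up. A production pipeline would more likely stream partial results between stages to stay within the latency target.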

2. Architecture Patterns

First seen: Feb 10, 2026