Voice AI Engine Development
Goal: Build low-latency, conversational Voice AI agents capable of full-duplex communication.
1. The Voice Pipeline (Latency is King)
Total loop latency (voice-to-ear) should stay under 1000 ms, ideally under 500 ms.
- Transport: WebRTC (preferred for browser) or WebSocket (server-to-server).
- VAD (Voice Activity Detection): Detect when the user starts and stops speaking.
  - Tools: Silero VAD, WebRTC VAD.
- STT (Speech-to-Text): Transcribe audio to text.
  - Tools: Deepgram (fastest), Whisper (high accuracy but slower), AssemblyAI.
- LLM (Brain): Process text and generate a response.
  - Tools: Groq (Llama 3), GPT-4o, Claude 3.5 Sonnet.
- TTS (Text-to-Speech): Convert the response to audio.
  - Tools: ElevenLabs (quality), Cartesia (speed), OpenAI TTS.
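One way to keep the latency target visible during development is to time each stage of a turn and compare the sum against the budget. The sketch below is a minimal illustration, not a reference implementation: `fake_stt`, `fake_llm`, and `fake_tts` are hypothetical stand-ins for whichever providers are chosen, `LatencyReport` and `timed` are names invented here, and the stages run strictly one after another. It also leaves out transport and VAD time, so a real voice-to-ear measurement would be higher.

```python
from __future__ import annotations

import asyncio
import time
from dataclasses import dataclass, field

TARGET_MS = 1000.0  # upper bound for the full loop, from the text above
IDEAL_MS = 500.0    # ideal bound, from the text above


@dataclass
class LatencyReport:
    """Per-stage timings in milliseconds (hypothetical helper)."""
    stages: dict[str, float] = field(default_factory=dict)

    @property
    def total_ms(self) -> float:
        return sum(self.stages.values())

    def verdict(self) -> str:
        if self.total_ms < IDEAL_MS:
            return "ideal"
        if self.total_ms < TARGET_MS:
            return "acceptable"
        return "too slow"


async def timed(report: LatencyReport, name: str, coro):
    """Await a coroutine and record how long it took under `name`."""
    start = time.perf_counter()
    result = await coro
    report.stages[name] = (time.perf_counter() - start) * 1000.0
    return result


# Hypothetical stand-ins: replace with real STT / LLM / TTS client calls.
async def fake_stt(audio: bytes) -> str:
    await asyncio.sleep(0.01)
    return "hello"


async def fake_llm(text: str) -> str:
    await asyncio.sleep(0.01)
    return f"You said: {text}"


async def fake_tts(text: str) -> bytes:
    await asyncio.sleep(0.01)
    return text.encode("utf-8")


async def run_turn(audio: bytes) -> tuple[bytes, LatencyReport]:
    """Run one STT -> LLM -> TTS turn and collect per-stage timings."""
    report = LatencyReport()
    text = await timed(report, "stt", fake_stt(audio))
    reply = await timed(report, "llm", fake_llm(text))
    speech = await timed(report, "tts", fake_tts(reply))
    return speech, report


if __name__ == "__main__":
    speech, report = asyncio.run(run_turn(b"\x00" * 320))
    for name, ms in report.stages.items():
        print(f"{name}: {ms:.1f} ms")
    print(f"total: {report.total_ms:.1f} ms -> {report.verdict()}")
```

Because the stages here are awaited sequentially, the total is the plain sum of the three timings. A streaming design that starts TTS before the LLM has finished would need a different measurement, such as time to first audio chunk.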
2. Architecture Patterns