gemini-live-api-dev
Installation
SKILL.md
Gemini Live API Development Skill
Overview
The Live API enables low-latency, real-time voice and video interactions with Gemini over WebSockets. It processes continuous streams of audio, video, or text to deliver immediate, human-like spoken responses.
Key capabilities:
- Bidirectional audio streaming — real-time mic-to-speaker conversations
- Video streaming — send camera/screen frames alongside audio
- Text input/output — send and receive text within a live session
- Audio transcriptions — get text transcripts of both input and output audio
- Voice Activity Detection (VAD) — automatic interruption handling
- Native audio — affective dialog, proactive audio, thinking
- Function calling — synchronous and asynchronous tool use
- Google Search grounding — ground responses in real-time search results
- Session management — context compression, session resumption, GoAway signals
- Ephemeral tokens — secure client-side authentication