Gemini Live API Development Skill

Use this skill when implementing low-latency Gemini Live experiences such as real-time voice, audio/video streaming, session orchestration, or client-side authentication with ephemeral tokens.

Overview

The Live API enables low-latency, real-time voice and video interactions with Gemini over WebSockets. It processes continuous streams of audio, video, or text to deliver immediate, human-like spoken responses.
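To make the WebSocket framing concrete, here is a minimal sketch of the two kinds of client frames a session typically starts with: a one-time `setup` frame and repeated `realtimeInput` audio frames. It assumes the JSON field names of the `BidiGenerateContent` WebSocket protocol (`setup`, `generationConfig.responseModalities`, `realtimeInput.audio`) and raw 16-bit little-endian PCM microphone audio at 16 kHz. `MODEL`, `setup_message`, and `audio_chunk_message` are hypothetical names for illustration, not part of any SDK; verify the field names against the current Live API reference before relying on them.

```python
import base64
import json
import struct

# Hypothetical placeholder; substitute a model that supports the Live API.
MODEL = "models/YOUR_LIVE_MODEL"


def setup_message(model: str = MODEL) -> str:
    """First frame on the socket: configures the session before any input."""
    return json.dumps({
        "setup": {
            "model": model,
            "generationConfig": {"responseModalities": ["AUDIO"]},
        }
    })


def audio_chunk_message(samples, rate: int = 16000) -> str:
    """Wrap float samples in [-1.0, 1.0] as one realtimeInput audio frame.

    Samples are clipped, scaled to 16-bit signed integers, packed
    little-endian, and base64-encoded, since JSON frames carry text.
    """
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    pcm = struct.pack(f"<{len(clipped)}h", *(int(s * 32767) for s in clipped))
    return json.dumps({
        "realtimeInput": {
            "audio": {
                "data": base64.b64encode(pcm).decode("ascii"),
                "mimeType": f"audio/pcm;rate={rate}",
            }
        }
    })
```

A client would send `setup_message()` once, wait for the server's `setupComplete` reply, and then send `audio_chunk_message(...)` frames as microphone buffers arrive. The transport itself (WebSocket library, API key or token) is deliberately left out.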

Key capabilities:

Bidirectional audio streaming — real-time mic-to-speaker conversations
Video streaming — send camera/screen frames alongside audio
Text input/output — send and receive text within a live session
Audio transcriptions — get text transcripts of both input and output audio
Voice Activity Detection (VAD) — automatic interruption handling
Native audio with thinking (configurable via thinkingLevel)
Function calling — synchronous tool use
Google Search grounding — ground responses in real-time search results
Session management — context compression, session resumption, GoAway signals
Ephemeral tokens — secure client-side authentication
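Several of these capabilities (transcriptions, VAD interruptions, function calling, and session management) show up as distinct server messages. The sketch below shows one way a client might enable them in the setup frame and dispatch incoming messages. The JSON field names (`inputAudioTranscription`, `outputAudioTranscription`, `contextWindowCompression.slidingWindow`, `sessionResumption`, `serverContent.interrupted`, `toolCall`, `toolResponse`, `goAway`, `sessionResumptionUpdate`) are assumptions based on the `BidiGenerateContent` protocol and should be checked against the current reference; `SessionState`, `session_setup`, `handle_server_message`, and `tool_response_message` are hypothetical helpers.

```python
import base64
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SessionState:
    """Client-side state for one live session (hypothetical helper)."""
    audio_out: bytearray = field(default_factory=bytearray)  # queued model audio (typically 24 kHz PCM)
    input_transcript: list = field(default_factory=list)
    output_transcript: list = field(default_factory=list)
    pending_tool_calls: list = field(default_factory=list)
    resumption_handle: Optional[str] = None
    go_away_time_left: Optional[str] = None


def session_setup(model: str, handle: Optional[str] = None) -> str:
    """Setup frame enabling transcripts, context compression, and resumption."""
    return json.dumps({
        "setup": {
            "model": model,
            "generationConfig": {"responseModalities": ["AUDIO"]},
            "inputAudioTranscription": {},   # transcribe the user's speech
            "outputAudioTranscription": {},  # transcribe the model's speech
            "contextWindowCompression": {"slidingWindow": {}},
            # Passing a previous handle resumes that session; empty starts fresh.
            "sessionResumption": {"handle": handle} if handle else {},
        }
    })


def handle_server_message(raw: str, state: SessionState) -> None:
    """Route one server frame into the session state."""
    msg = json.loads(raw)
    content = msg.get("serverContent")
    if content:
        if content.get("interrupted"):
            # The user spoke over the model: drop audio not yet played.
            state.audio_out.clear()
        for part in content.get("modelTurn", {}).get("parts", []):
            inline = part.get("inlineData")
            if inline and inline.get("mimeType", "").startswith("audio/pcm"):
                state.audio_out.extend(base64.b64decode(inline["data"]))
        if "inputTranscription" in content:
            state.input_transcript.append(content["inputTranscription"].get("text", ""))
        if "outputTranscription" in content:
            state.output_transcript.append(content["outputTranscription"].get("text", ""))
    if "toolCall" in msg:
        # The client must answer each call itself with a toolResponse frame.
        state.pending_tool_calls.extend(msg["toolCall"].get("functionCalls", []))
    update = msg.get("sessionResumptionUpdate")
    if update and update.get("resumable") and update.get("newHandle"):
        state.resumption_handle = update["newHandle"]
    if "goAway" in msg:
        # The server will close the connection soon; reconnect with resumption_handle.
        state.go_away_time_left = msg["goAway"].get("timeLeft")


def tool_response_message(call: dict, result: dict) -> str:
    """Answer one function call taken from a toolCall message."""
    return json.dumps({
        "toolResponse": {
            "functionResponses": [
                {"id": call.get("id"), "name": call.get("name"), "response": result}
            ]
        }
    })
```

Clearing buffered playback on `interrupted` is what lets barge-in feel immediate, and keeping only the latest resumable handle means a client can reconnect with `session_setup(model, handle=state.resumption_handle)` after a `goAway`.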
