gemini-live-api-dev

Summary

Real-time bidirectional streaming with Gemini over WebSockets for audio, video, and text conversations.

  • Supports audio input/output (16 kHz PCM), video frames, text, and automatic transcriptions with voice activity detection for interruption handling
  • Includes native audio features: affective dialog, proactive audio, and thinking mode; function calling for synchronous and asynchronous tool use; and Google Search grounding
  • Offers session management with context compression, resumption, and ephemeral tokens for secure client-side authentication
  • Available in Python (google-genai) and JavaScript/TypeScript (@google/genai) SDKs; integrates with LiveKit, Pipecat, Fishjam, and other platforms via WebRTC adapters
SKILL.md

Gemini Live API Development Skill

Overview

The Live API enables low-latency, real-time voice and video interactions with Gemini over WebSockets. It processes continuous streams of audio, video, or text to deliver immediate, human-like spoken responses.
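Since audio input is streamed as raw 16 kHz, 16-bit PCM, a client must split its capture buffer into small fixed-duration chunks before sending. A minimal framing sketch (the 20 ms chunk duration is an illustrative choice, not a requirement of the API):

```python
# Sketch: framing raw PCM audio for streaming.
# Assumptions: 16 kHz sample rate, 16-bit little-endian mono input,
# as described for Live API audio input.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2   # 16-bit PCM
CHUNK_MS = 20          # illustrative chunk duration

def chunk_size_bytes(ms: int = CHUNK_MS) -> int:
    """Bytes in one mono PCM chunk of the given duration."""
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * ms // 1000

def frame_pcm(pcm: bytes, ms: int = CHUNK_MS) -> list[bytes]:
    """Split a raw PCM byte buffer into fixed-duration chunks."""
    size = chunk_size_bytes(ms)
    return [pcm[i:i + size] for i in range(0, len(pcm), size)]

chunks = frame_pcm(b"\x00" * chunk_size_bytes(100))  # 100 ms of silence
# 100 ms at 20 ms per chunk -> 5 chunks of 640 bytes each
```

Each chunk would then be sent over the open WebSocket session as it is produced, which is what keeps end-to-end latency low.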

Key capabilities:

  • Bidirectional audio streaming — real-time mic-to-speaker conversations
  • Video streaming — send camera/screen frames alongside audio
  • Text input/output — send and receive text within a live session
  • Audio transcriptions — get text transcripts of both input and output audio
  • Voice Activity Detection (VAD) — automatic interruption handling
  • Native audio — affective dialog, proactive audio, and thinking (with configurable thinkingLevel)
  • Function calling — synchronous and asynchronous tool use
  • Google Search grounding — ground responses in real-time search results
  • Session management — context compression, session resumption, GoAway signals
  • Ephemeral tokens — secure client-side authentication
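As a rough sketch of how several of these capabilities map to session configuration, the options can be expressed as a plain dict in the camelCase wire format the Live API uses (the specific voice name and VAD settings below are illustrative assumptions, not defaults):

```python
# Sketch of a Live API session config as a plain dict.
# The voice name and VAD values are assumptions for illustration.
live_config = {
    "responseModalities": ["AUDIO"],       # or ["TEXT"]
    "inputAudioTranscription": {},         # transcribe user audio
    "outputAudioTranscription": {},        # transcribe model audio
    "speechConfig": {
        "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": "Puck"}}
    },
    "realtimeInputConfig": {
        # Automatic VAD lets the model detect interruptions on its own;
        # disabling it means the client sends explicit activity signals.
        "automaticActivityDetection": {"disabled": False}
    },
}
```

In the Python and JS SDKs the same structure is typically passed as the `config` argument when opening a live session.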

Installs: 3.0K · GitHub Stars: 3.5K · First Seen: Mar 3, 2026