gemini-audio


Gemini Audio API Skill

Transcribe, analyze, and understand audio, and generate natural-sounding speech, using Google's Gemini API. A single request can carry up to 9.5 hours of audio, and multiple audio formats are supported.
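As a rough illustration of the 9.5-hour limit, the sketch below checks a recording's length before it is sent. The helper names are hypothetical and not part of the skill, and the duration reader only handles WAV files through Python's standard `wave` module; other formats would need a different reader.

```python
import wave

# 9.5 hours expressed in seconds (the per-request limit quoted above).
MAX_AUDIO_SECONDS = 9.5 * 60 * 60


def wav_duration_seconds(path: str) -> float:
    """Return the playing time of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_in_one_request(duration_seconds: float) -> bool:
    """True if audio of this length stays within the 9.5-hour limit."""
    return duration_seconds <= MAX_AUDIO_SECONDS
```

Longer recordings would have to be split into chunks that each pass `fits_in_one_request` before processing.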

When to Use This Skill

Use this skill when you need to:

  • Transcribe audio files to text with timestamps
  • Summarize audio content and extract key points
  • Analyze speech, music, or environmental sounds
  • Generate speech from text with controllable voice and style
  • Process podcasts, interviews, meetings, or any audio content
  • Understand non-speech audio (birdsong, sirens, music)
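The first use case in the list above, transcription with timestamps, might look like the following minimal sketch. It assumes the `google-genai` Python SDK, whose client exposes `files.upload` and `models.generate_content`; the prompt wording, the `transcribe` helper, and the default model name are illustrative assumptions, not taken from the skill itself.

```python
# Hypothetical prompt; adjust the wording to the output format you need.
TRANSCRIBE_PROMPT = (
    "Transcribe this audio. Start each line with a timestamp in MM:SS format."
)


def transcribe(client, audio_path: str, model: str = "gemini-2.5-flash") -> str:
    """Upload an audio file and ask the model for a timestamped transcript.

    `client` is expected to behave like `google.genai.Client`; the model
    name is an assumption and may need to change.
    """
    # Upload the file first so the request can reference it by handle.
    uploaded = client.files.upload(file=audio_path)
    response = client.models.generate_content(
        model=model,
        contents=[TRANSCRIBE_PROMPT, uploaded],
    )
    return response.text
```

With the SDK installed and an API key configured, this could be called as `transcribe(genai.Client(), "meeting.mp3")` after `from google import genai`. Swapping the prompt for a summarization or sound-description request would cover the other items in the list in the same way.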

Prerequisites

API Key Setup

The skill supports both Google AI Studio and Vertex AI endpoints.
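One way to switch between the two endpoints is sketched below. It assumes the environment variable names used by the `google-genai` SDK (`GEMINI_API_KEY`, `GOOGLE_GENAI_USE_VERTEXAI`, `GOOGLE_CLOUD_PROJECT`, `GOOGLE_CLOUD_LOCATION`); the skill's own configuration is not shown here and may differ. The `client_kwargs` helper and the `us-central1` fallback are hypothetical.

```python
import os
from typing import Mapping


def client_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """Choose google.genai.Client keyword arguments from environment variables."""
    if env.get("GOOGLE_GENAI_USE_VERTEXAI", "").lower() in ("1", "true"):
        # Vertex AI authenticates with Google Cloud credentials, not an API key.
        return {
            "vertexai": True,
            "project": env["GOOGLE_CLOUD_PROJECT"],
            # Fallback region is an assumption for this sketch.
            "location": env.get("GOOGLE_CLOUD_LOCATION", "us-central1"),
        }
    # Google AI Studio uses a plain API key.
    return {"api_key": env["GEMINI_API_KEY"]}
```

The result can be passed straight to the SDK, for example `genai.Client(**client_kwargs())`. A missing key or project raises `KeyError`, which surfaces a misconfigured environment early.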


First Seen
Feb 20, 2026