gemini-audio
Gemini Audio API Skill
Transcribe, analyze, and understand audio, and generate natural-sounding speech from text, using Google's Gemini API. A single request can carry up to 9.5 hours of audio, in any of several supported formats.
When to Use This Skill
Use this skill when you need to:
- Transcribe audio files to text with timestamps
- Summarize audio content and extract key points
- Analyze speech, music, or environmental sounds
- Generate speech from text with controllable voice and style
- Process podcasts, interviews, meetings, or any audio content
- Understand non-speech audio (birdsong, sirens, music)
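For long recordings such as podcasts or meetings, it may help to check the duration against the 9.5-hour per-request limit before uploading. The sketch below is one way to do that for WAV files, using only Python's standard `wave` module. The helper names `wav_duration_seconds` and `fits_in_one_request` are hypothetical and not part of this skill or the Gemini API. Other formats would need a different way of measuring duration.

```python
import wave

# Per-request limit taken from this skill's description (9.5 hours).
MAX_AUDIO_SECONDS = 9.5 * 60 * 60


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds (hypothetical helper)."""
    with wave.open(path, "rb") as wav_file:
        # Duration = number of frames / frames per second.
        return wav_file.getnframes() / wav_file.getframerate()


def fits_in_one_request(duration_seconds: float) -> bool:
    """Return True if the audio is non-empty and within the 9.5-hour limit."""
    return 0 < duration_seconds <= MAX_AUDIO_SECONDS
```

A caller could run `fits_in_one_request(wav_duration_seconds("meeting.wav"))` and split the recording into smaller pieces when it returns `False`; the file name here is only a placeholder.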
Prerequisites
API Key Setup
The skill automatically detects your GEMINI_API_KEY in this order: