gemini-audio


Gemini Audio API Skill

Transcribe, analyze, and understand audio, and generate natural speech, using Google's Gemini API. A single request supports up to 9.5 hours of audio in multiple formats.

When to Use This Skill

Use this skill when you need to:

  • Transcribe audio files to text with timestamps
  • Summarize audio content and extract key points
  • Analyze speech, music, or environmental sounds
  • Generate speech from text with controllable voice and style
  • Process podcasts, interviews, meetings, or any audio content
  • Understand non-speech audio (birdsong, sirens, music)
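As a rough illustration of the transcription use case above, the sketch below builds a request body for the Gemini REST `generateContent` method with the audio sent inline as base64. It is not taken from the skill itself: the helper name `build_audio_request`, the placeholder audio bytes, and the prompt text are assumptions made for this example, and the endpoint pattern in `API_URL` is assumed to be the public `v1beta` one.

```python
import base64
import json

# Assumed endpoint pattern; substitute a model name you have access to.
API_URL = "https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"


def build_audio_request(audio_bytes, mime_type, prompt):
    """Return a generateContent JSON body with a text prompt and inline audio.

    Hypothetical helper for illustration; not part of the skill.
    """
    # Inline audio must be base64-encoded text inside the JSON body.
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                ]
            }
        ]
    }


# Placeholder bytes stand in for a real audio file read from disk.
body = build_audio_request(
    b"\x00\x01fake-audio",
    "audio/mp3",
    "Transcribe this clip with timestamps.",
)
payload = json.dumps(body)  # ready to POST to API_URL
```

The payload would be sent as a POST with the API key in an `x-goog-api-key` header. Inline data only suits short clips; long recordings would normally go through a file upload step instead, which this sketch does not cover.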

Prerequisites

API Key Setup

The skill automatically detects your GEMINI_API_KEY in this order:
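The detection order itself is not reproduced here. As a minimal sketch covering only the process environment, a lookup might resemble the following. The function name `find_gemini_api_key` is hypothetical, and any further locations the skill checks are deliberately left out rather than guessed.

```python
import os


def find_gemini_api_key(environ=None):
    """Return GEMINI_API_KEY from the given mapping, or raise if it is unset.

    Hypothetical helper: it checks the environment only and does not
    reproduce the skill's full lookup order.
    """
    env = os.environ if environ is None else environ
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set")
    return key
```

Accepting an optional mapping keeps the function easy to exercise without touching the real environment.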

More from mrgoonie/xxxnaper

First Seen
Mar 1, 2026