gemini-audio

Gemini Audio API Skill

Transcribe, analyze, and understand audio, and generate natural-sounding speech, using Google's Gemini API. A single request can carry up to 9.5 hours of audio, and multiple audio formats are supported.
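As a rough illustration of the 9.5-hour limit, the sketch below checks whether a recording fits in one request and estimates its token cost. The 9.5-hour ceiling comes from the description above. The rate of 32 tokens per second of audio is an assumption taken from Google's public Gemini documentation, not from this skill, and the helper names `fits_in_one_request` and `estimate_audio_tokens` are hypothetical.

```python
import math

# 9.5 hours per request, as stated in the skill description.
MAX_AUDIO_SECONDS = 9.5 * 60 * 60

# Assumption: Gemini represents each second of audio as 32 tokens.
# Verify against the current Gemini documentation before relying on it.
TOKENS_PER_SECOND = 32


def fits_in_one_request(duration_seconds: float) -> bool:
    """Return True if the audio length is within the per-request limit."""
    return 0 <= duration_seconds <= MAX_AUDIO_SECONDS


def estimate_audio_tokens(duration_seconds: float) -> int:
    """Estimate the input tokens consumed by audio of the given length."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return math.ceil(duration_seconds * TOKENS_PER_SECOND)


if __name__ == "__main__":
    one_hour = 60 * 60
    print(fits_in_one_request(one_hour))
    print(estimate_audio_tokens(one_hour))
```

A check like this is only a pre-flight estimate. The API remains the authority on what it accepts, so a recording near the limit may still need to be split.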

When to Use This Skill

Use this skill when you need to:

Transcribe audio files to text with timestamps
Summarize audio content and extract key points
Analyze speech, music, or environmental sounds
Generate speech from text with controllable voice and style
Process podcasts, interviews, meetings, or any audio content
Understand non-speech audio (birdsong, sirens, music)

Prerequisites

API Key Setup

The skill automatically detects your GEMINI_API_KEY in this order:

Repository

mrgoonie/xxxnaper

First Seen

Mar 1, 2026
