gemini-audio

SKILL.md

Gemini Audio API Skill

Transcribe, analyze, and understand audio, and generate natural-sounding speech, using Google's Gemini API. A single request can include up to 9.5 hours of audio, in any of several supported formats.
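The two limits in the summary above can be checked before a file is sent. This is a minimal sketch, not part of the skill. It assumes the figures from Google's Gemini audio documentation at the time of writing: 32 tokens per second of audio, and WAV, MP3, AIFF, AAC, OGG Vorbis, and FLAC input. The helper names are our own.

```python
"""Rough pre-flight check for a Gemini audio request (illustrative sketch)."""

from pathlib import Path

# Assumed figures from Google's Gemini audio docs; re-check them before relying on them.
TOKENS_PER_SECOND = 32          # each second of audio is represented as 32 tokens
MAX_AUDIO_SECONDS = 9.5 * 3600  # 9.5 hours of audio per request

# Extension -> MIME type for the audio formats listed in the docs.
SUPPORTED_AUDIO = {
    ".wav": "audio/wav",
    ".mp3": "audio/mp3",
    ".aiff": "audio/aiff",
    ".aac": "audio/aac",
    ".ogg": "audio/ogg",
    ".flac": "audio/flac",
}


def audio_mime_type(path: str) -> str:
    """Return the MIME type for a supported audio file, or raise ValueError."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_AUDIO:
        raise ValueError(f"unsupported audio format: {suffix or '(none)'}")
    return SUPPORTED_AUDIO[suffix]


def estimate_audio_tokens(duration_seconds: float) -> int:
    """Estimate input tokens for a clip, rejecting clips over the per-request limit."""
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    if duration_seconds > MAX_AUDIO_SECONDS:
        raise ValueError("audio exceeds the 9.5-hour per-request limit")
    return round(duration_seconds * TOKENS_PER_SECOND)
```

For example, `estimate_audio_tokens(60)` returns 1920, which matches the 1,920 tokens the documentation gives for one minute of audio. `audio_mime_type("talk.MP3")` returns `"audio/mp3"`.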

When to Use This Skill

Use this skill when you need to:

Transcribe audio files to text with timestamps
Summarize audio content and extract key points
Analyze speech, music, or environmental sounds
Generate speech from text with controllable voice and style
Process podcasts, interviews, meetings, or any audio content
Understand non-speech audio (birdsong, sirens, music)
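Gemini can refer to positions in audio with `MM:SS` timestamps, but the layout of a returned transcript depends on the prompt. This sketch assumes a hypothetical prompt that asks for one `[MM:SS] text` line per segment, and parses the reply into `(seconds, text)` pairs. The prompt wording, the regex, and the function name are illustrative and not part of the skill. Lines that do not match the assumed format are skipped.

```python
import re
from typing import List, Tuple

# Hypothetical prompt. The line format is our own convention, not an API guarantee.
TRANSCRIBE_PROMPT = (
    "Transcribe this audio. Start every line with a timestamp in the form "
    "[MM:SS] followed by the spoken text."
)

# Minutes may exceed 99 for long recordings, so allow up to three digits.
_LINE = re.compile(r"^\[(\d{1,3}):([0-5]\d)\]\s*(.+)$")


def parse_timestamped_transcript(text: str) -> List[Tuple[int, str]]:
    """Turn '[MM:SS] words' lines into (offset_seconds, words) pairs."""
    segments = []
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match:
            minutes, seconds, words = match.groups()
            segments.append((int(minutes) * 60 + int(seconds), words.strip()))
    return segments
```

For instance, `parse_timestamped_transcript("[00:03] Welcome to the show.\n[01:15] Let's begin.")` returns `[(3, 'Welcome to the show.'), (75, "Let's begin.")]`. With the `google-genai` Python SDK, the reply text would typically come from `client.models.generate_content(model=..., contents=[TRANSCRIBE_PROMPT, uploaded]).text`, where `uploaded` is returned by `client.files.upload(file=...)` in recent SDK versions. The model name depends on what your project can access.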

Prerequisites

API Key Setup

The skill supports two endpoints: Google AI Studio, which authenticates with a Gemini API key, and Vertex AI, which authenticates through a Google Cloud project.
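This is a minimal sketch of choosing between the two endpoints for the `google-genai` Python SDK. It assumes the environment variable names that SDK documents: `GEMINI_API_KEY` or `GOOGLE_API_KEY` for AI Studio, and `GOOGLE_GENAI_USE_VERTEXAI`, `GOOGLE_CLOUD_PROJECT`, and `GOOGLE_CLOUD_LOCATION` for Vertex AI. The helper `gemini_client_kwargs` and its `us-central1` fallback are our own assumptions. The SDK can read these variables itself; spelling the choice out makes a missing setting fail early with a clear message.

```python
import os
from typing import Any, Dict, Mapping


def gemini_client_kwargs(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Return keyword arguments for genai.Client based on environment variables."""
    # Vertex AI is opt-in, mirroring the SDK's GOOGLE_GENAI_USE_VERTEXAI switch.
    if env.get("GOOGLE_GENAI_USE_VERTEXAI", "").strip().lower() in ("1", "true"):
        project = env.get("GOOGLE_CLOUD_PROJECT")
        if not project:
            raise ValueError("GOOGLE_CLOUD_PROJECT must be set for Vertex AI")
        # Assumed fallback region; set GOOGLE_CLOUD_LOCATION to override it.
        location = env.get("GOOGLE_CLOUD_LOCATION", "us-central1")
        return {"vertexai": True, "project": project, "location": location}

    # Otherwise use a Google AI Studio API key.
    api_key = env.get("GEMINI_API_KEY") or env.get("GOOGLE_API_KEY")
    if not api_key:
        raise ValueError("set GEMINI_API_KEY (or GOOGLE_API_KEY) for Google AI Studio")
    return {"api_key": api_key}
```

Typical use would be `from google import genai` followed by `client = genai.Client(**gemini_client_kwargs())`.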

Related skills

More from aia-11-hn-mib/mib-mockinterviewaibot

gemini-video-understanding
Analyze videos using Google's Gemini API: describe content, answer questions, transcribe audio with visual descriptions, reference timestamps, clip videos, and process YouTube URLs. Supports 9 video formats, multiple models (Gemini 2.5/2.0), and context windows up to 2M tokens (6 hours of video).
imagemagick
Guide for using ImageMagick command-line tools to perform advanced image processing tasks including format conversion, resizing, cropping, effects, transformations, and batch operations. Use when manipulating images programmatically via shell commands.
remix-icon
Guide for implementing RemixIcon, an open-source, neutral-style icon library with 3,100+ icons in outlined and filled styles. Use when adding icons to applications, building UI components, or designing interfaces. Supports webfonts, SVG, React, Vue, and direct integration.
obsidian-qa-saver
Save Q&A conversations to Obsidian notes with proper formatting, metadata, and organization. Use this skill when the user explicitly requests to save a conversation, question-answer exchange, or explanation to their Obsidian vault. Automatically formats content as document-style notes with timestamps, tags, and project links.
repomix
Package entire code repositories into single AI-friendly files using Repomix. Capabilities include packing codebases with customizable include/exclude patterns, generating multiple output formats (XML, Markdown, plain text), preserving file structure and context, optimizing for AI consumption with token counting, filtering by file types and directories, and adding custom headers and summaries. Use when packaging codebases for AI analysis, creating repository snapshots for LLM context, analyzing third-party libraries, preparing for security audits, generating documentation context, or evaluating unfamiliar codebases.
gemini-vision
Guide for implementing Google Gemini API image understanding: analyze images with captioning, classification, visual QA, object detection, segmentation, and multi-image comparison. Use when analyzing images, answering visual questions, detecting objects, or processing documents with vision.

Repository: aia-11-hn-mib/m…iewaibot

First Seen: Feb 20, 2026

Security Audits
Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Pass
