ai-multimodal

Installation

SKILL.md

AI Multimodal Processing Skill

Process audio, images, videos, documents, and generate images using Google Gemini's multimodal API. Unified interface for all multimedia content understanding and generation.

Core Capabilities

Audio Processing

Transcription with timestamps (up to 9.5 hours)
Audio summarization and analysis
Speech understanding and speaker identification
Music and environmental sound analysis
Text-to-speech generation with controllable voice

Installs

Repository

zircote/agents

GitHub Stars

First Seen

5 days ago

ai-multimodal — zircote/agents