ai-multimodal

AI Multimodal Processing Skill

Process audio, images, videos, and documents, and generate images, using Google Gemini's multimodal API. The skill provides a unified interface for understanding and generating multimedia content.

Core Capabilities

Audio Processing

  • Transcription with timestamps (up to 9.5 hours)
  • Audio summarization and analysis
  • Speech understanding and speaker identification
  • Music and environmental sound analysis
  • Text-to-speech generation with controllable voice
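
As a rough illustration of the timestamped transcription capability listed above, the sketch below builds a transcription prompt and sends an audio file to Gemini. It is not taken from the skill itself: the helper names (`format_timestamp`, `build_transcription_prompt`, `transcribe`), the prompt wording, and the model name `gemini-2.5-flash` are assumptions, and the network call assumes the `google-genai` Python SDK is installed with an API key available in the environment.

```python
from typing import Optional

# 9.5-hour ceiling taken from the capability list above, in seconds.
MAX_AUDIO_SECONDS = int(9.5 * 3600)


def format_timestamp(seconds: int) -> str:
    """Render a second offset as MM:SS, or H:MM:SS from one hour onward."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def build_transcription_prompt(start: int = 0, end: Optional[int] = None) -> str:
    """Build a prompt asking for a timestamped transcript (hypothetical wording)."""
    if end is not None and not (start < end <= MAX_AUDIO_SECONDS):
        raise ValueError("end must be after start and within the 9.5-hour limit")
    prompt = "Transcribe this audio. Prefix each segment with its start time."
    if end is not None:
        prompt += (
            f" Only cover {format_timestamp(start)} to {format_timestamp(end)}."
        )
    return prompt


def transcribe(audio_path: str, model: str = "gemini-2.5-flash") -> str:
    """Upload an audio file and request a transcript (requires network access)."""
    # Imported lazily so the helpers above work without the SDK installed.
    from google import genai

    client = genai.Client()  # assumes the API key is set in the environment
    uploaded = client.files.upload(file=audio_path)
    response = client.models.generate_content(
        model=model,  # assumed model name; substitute whichever model you use
        contents=[build_transcription_prompt(), uploaded],
    )
    return response.text
```

Calling `transcribe("meeting.mp3")` (a placeholder path) would return the transcript text; the two helper functions can be exercised offline.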

Installs: 2
GitHub Stars: 3
First Seen: Mar 14, 2026