ai-multimodal

Originally from samhvw8/dot-claude

AI Multimodal Processing Skill

Process audio, images, videos, and documents, and generate images, using Google Gemini's multimodal API. The skill provides a unified interface for multimedia content understanding and generation.

Core Capabilities

Audio Processing

  • Transcription with timestamps (up to 9.5 hours)
  • Audio summarization and analysis
  • Speech understanding and speaker identification
  • Music and environmental sound analysis
  • Text-to-speech generation with controllable voice
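As a rough illustration of the 9.5-hour transcription limit and timestamped output listed above, here is a minimal sketch in plain Python. It makes no API calls. The names `MAX_AUDIO_SECONDS`, `fits_in_one_request`, `format_timestamp`, and `split_into_chunks` are hypothetical helpers that are not part of the skill, and the sketch assumes the limit applies to a single request.

```python
# Hypothetical helpers; not part of the skill or of any Gemini SDK.
# Assumption: the 9.5-hour figure is a per-request ceiling on audio length.
MAX_AUDIO_SECONDS = 9.5 * 3600  # 34,200 seconds


def fits_in_one_request(duration_seconds: float) -> bool:
    """Return True if the audio is non-empty and within the assumed limit."""
    return 0 < duration_seconds <= MAX_AUDIO_SECONDS


def format_timestamp(seconds: float) -> str:
    """Format an offset in seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def split_into_chunks(duration_seconds: float,
                      chunk_seconds: float = MAX_AUDIO_SECONDS):
    """Split a recording into consecutive (start, end) ranges in seconds."""
    chunks = []
    start = 0.0
    while start < duration_seconds:
        end = min(start + chunk_seconds, duration_seconds)
        chunks.append((start, end))
        start = end
    return chunks


if __name__ == "__main__":
    ten_hours = 10 * 3600
    print(fits_in_one_request(ten_hours))
    for start, end in split_into_chunks(ten_hours):
        print(format_timestamp(start), "to", format_timestamp(end))
```

A recording longer than the limit would be split into ranges that each fit, and `format_timestamp` could then label transcript segments relative to the start of the original file.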
Installs: 455
GitHub Stars: 2.2K
First Seen: Jan 22, 2026
ai-multimodal — mrgoonie/claudekit-skills