ai-multimodal
AI Multimodal
Process audio, images, videos, and documents, and generate images and videos, using Google Gemini's multimodal API.
Setup
export GEMINI_API_KEY="your-key" # Get from https://aistudio.google.com/apikey
pip install google-genai python-dotenv pillow
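Before running the scripts, it can help to confirm that the key and packages are in place. The following is a hypothetical stand-in for the kind of check `scripts/check_setup.py` might perform; it is not that script's actual code. It assumes the import names `google.genai`, `dotenv`, and `PIL` for the three installed packages. The names `check_setup` and `module_available` are illustrative only.

```python
import importlib.util
import os

# Assumed import names for the packages installed above (illustrative mapping).
REQUIRED_MODULES = {
    "google-genai": "google.genai",
    "python-dotenv": "dotenv",
    "pillow": "PIL",
}


def module_available(name: str) -> bool:
    """Return True if the module can be located without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # find_spec imports parent packages (e.g. "google"); a missing parent raises.
        return False


def check_setup(env=None) -> list:
    """Return a list of problems; an empty list means the setup looks ready."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")
    for package, module in REQUIRED_MODULES.items():
        if not module_available(module):
            problems.append(f"missing package: {package} (pip install {package})")
    return problems


if __name__ == "__main__":
    issues = check_setup()
    for issue in issues:
        print("x", issue)
    print("ready" if not issues else f"{len(issues)} problem(s) found")
```

Passing `env` explicitly keeps the check testable without touching the real environment.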
Quick Start
Verify setup: python scripts/check_setup.py
Analyze media: python scripts/gemini_batch_process.py --files <file> --task <analyze|transcribe|extract>
- TIP: When asked to analyze an image, first check whether the `gemini` command is available. If it is, run `echo "<prompt to analyze image>" | gemini -y -m gemini-2.5-flash`. If it is not, run `python scripts/gemini_batch_process.py --files <file> --task analyze`.
Generate content: python scripts/gemini_batch_process.py --task <generate|generate-video> --prompt "description"
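The image-analysis tip is a simple fallback rule: prefer the Gemini CLI when it is on `PATH`, otherwise use the batch script. The sketch below builds the command without running it. The function name `build_image_analysis_command` and the injectable `which` parameter are hypothetical. Because the tip pipes only the prompt to `gemini`, this sketch assumes the prompt text itself identifies the image, so `image_file` is used only by the fallback.

```python
import shutil
from typing import Callable, Optional

BATCH_SCRIPT = "scripts/gemini_batch_process.py"


def build_image_analysis_command(
    prompt: str,
    image_file: str,
    which: Callable[[str], Optional[str]] = shutil.which,
):
    """Return (argv, stdin_text); stdin_text is None when nothing is piped."""
    if which("gemini"):
        # Mirrors: echo "<prompt>" | gemini -y -m gemini-2.5-flash
        return ["gemini", "-y", "-m", "gemini-2.5-flash"], prompt
    # Mirrors: python scripts/gemini_batch_process.py --files <file> --task analyze
    return ["python", BATCH_SCRIPT, "--files", image_file, "--task", "analyze"], None
```

To execute the result, pass it to `subprocess.run(argv, input=stdin_text, text=True)`.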