AI Multimodal

Process audio, images, videos, documents, and generate images/videos using Google Gemini's multimodal API.

Setup

export GEMINI_API_KEY="your-key"  # Get from https://aistudio.google.com/apikey
pip install google-genai python-dotenv pillow

Quick Start

Verify setup: python scripts/check_setup.py Analyze media: python scripts/gemini_batch_process.py --files <file> --task <analyze|transcribe|extract>

TIP: When you're asked to analyze an image, check if gemini command is available, then use "<prompt to analyze image>" | gemini -y -m gemini-2.5-flash command. If gemini command is not available, use python scripts/gemini_batch_process.py --files <file> --task analyze command. Generate content: python scripts/gemini_batch_process.py --task <generate|generate-video> --prompt "description"

ai-multimodal

AI Multimodal

Setup

Quick Start