gemini-3-multimodal


Gemini 3 Pro Multimodal Input Processing

Comprehensive guide to processing multimodal inputs with Gemini 3 Pro, including image understanding, video analysis, audio processing, and PDF document extraction. This skill covers input processing (analyzing media) only; for output (generating images), see gemini-3-image-generation.

Overview

Gemini 3 Pro provides native multimodal capabilities for understanding and analyzing various media types. This skill covers all input processing operations with granular control over quality, performance, and token consumption.

Key Capabilities

  • Image Understanding: Object detection, OCR, visual Q&A, code from screenshots
  • Video Processing: Up to 1 hour of video, frame analysis, OCR
  • Audio Processing: Up to 9.5 hours of audio, speech understanding
  • PDF Documents: Native PDF support, multi-page analysis, text extraction
  • Media Resolution Control: Low/medium/high resolution for token optimization
  • Token Optimization: Granular control over processing costs
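
To make the resolution and token trade-off above concrete, here is a minimal sketch of how a single media file and a text prompt might be packaged into one request body with a chosen resolution level. It only builds the JSON payload and makes no network call. The helper name `build_request`, the `RESOLUTION_LEVELS` mapping, and the exact field names (`inline_data`, `generationConfig.mediaResolution`, and the `MEDIA_RESOLUTION_*` strings) are assumptions based on the public Gemini REST request shape, not something this skill specifies; check them against the current API reference before relying on them.

```python
import base64
import mimetypes

# Assumed enum strings for the low/medium/high levels named in the list above.
RESOLUTION_LEVELS = {
    "low": "MEDIA_RESOLUTION_LOW",
    "medium": "MEDIA_RESOLUTION_MEDIUM",
    "high": "MEDIA_RESOLUTION_HIGH",
}


def build_request(media_bytes, filename, prompt, resolution="medium"):
    """Build a JSON-serializable body for one media file plus a text prompt.

    Hypothetical helper: the field layout is an assumption, not a confirmed schema.
    """
    # Infer the MIME type from the file extension (e.g. .png, .pdf, .mp4).
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None:
        raise ValueError(f"cannot infer MIME type for {filename!r}")
    if resolution not in RESOLUTION_LEVELS:
        raise ValueError(f"resolution must be one of {sorted(RESOLUTION_LEVELS)}")

    # Inline media is sent as base64 text alongside the prompt.
    encoded = base64.b64encode(media_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                    {"text": prompt},
                ]
            }
        ],
        # Lower resolution generally means fewer tokens spent per image or frame.
        "generationConfig": {"mediaResolution": RESOLUTION_LEVELS[resolution]},
    }
```

A call such as `build_request(pdf_bytes, "report.pdf", "Summarize this document", resolution="low")` would yield a body that could be serialized with `json.dumps` and posted to a generate-content endpoint. Inline base64 suits small files; long videos or audio would more plausibly go through a file-upload step instead, which this sketch does not cover.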

When to Use This Skill

First Seen
Jan 24, 2026
gemini-3-multimodal — adaptationio/skrillz