gemini-document-processing
Gemini Document Processing
Process and analyze PDF documents using Google Gemini's native vision capabilities. Extract structured information, summarize content, answer questions, and understand complex documents with text, images, diagrams, charts, and tables.
Core Capabilities
- PDF Vision Processing: Native understanding of PDFs up to 1,000 pages, with each page counted as 258 tokens
- Multimodal Analysis: Process text, images, diagrams, charts, and tables
- Structured Extraction: Output to JSON with schema validation
- Document Q&A: Answer questions based on document content
- Summarization: Generate summaries preserving context
- Format Conversion: Transcribe to HTML while preserving layout
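The page limit and per-page token figure listed above make it possible to estimate a document's input cost before sending it. The following is a minimal sketch of that arithmetic only; the function name `estimate_pdf_tokens` and the constants are illustrative and not part of any Gemini SDK, and the estimate covers the PDF pages alone, not the prompt text or the model's output.

```python
# Figures taken from the capability list above.
TOKENS_PER_PAGE = 258   # tokens counted per PDF page
MAX_PAGES = 1000        # maximum pages per PDF


def estimate_pdf_tokens(page_count: int) -> int:
    """Estimate the input tokens consumed by a PDF of `page_count` pages.

    Hypothetical helper for budgeting; it does not call any API.
    """
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    if page_count > MAX_PAGES:
        raise ValueError(
            f"PDF has {page_count} pages; the stated limit is {MAX_PAGES}"
        )
    return page_count * TOKENS_PER_PAGE


if __name__ == "__main__":
    print(estimate_pdf_tokens(40))  # 40 * 258 = 10320
```

A document over the limit raises an error here rather than being silently truncated, so the caller can decide how to split it.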