vision-language-models
Installation
SKILL.md
Vision Language Models (2026)
Integrate vision capabilities from leading multimodal models for image understanding, document analysis, and visual reasoning.
Overview
- Image captioning and description generation
- Visual question answering (VQA)
- Document/chart/diagram analysis with OCR
- Multi-image comparison and reasoning
- Bounding box detection and region analysis
- Video frame analysis