vision-language-models

Installation
SKILL.md

Vision Language Models (2026)

Integrate vision capabilities from leading multimodal models for image understanding, document analysis, and visual reasoning.

Overview

  • Image captioning and description generation
  • Visual question answering (VQA)
  • Document/chart/diagram analysis with OCR
  • Multi-image comparison and reasoning
  • Bounding box detection and region analysis
  • Video frame analysis

Model Comparison (January 2026)

Model Context Strengths Vision Input
GPT-5.2 128K Best general reasoning, multimodal Up to 10 images
Claude Opus 4.5 200K Best coding, sustained agent tasks Up to 100 images
Related skills

More from yonatangross/skillforge-claude-plugin

Installs
4
GitHub Stars
170
First Seen
Jan 21, 2026