vision-language-models

Installation
SKILL.md

Vision Language Models ()

Integrate vision capabilities from leading multimodal models for image understanding, document analysis, and visual reasoning.

Overview

  • Image captioning and description generation
  • Visual question answering (VQA)
  • Document/chart/diagram analysis with OCR
  • Multi-image comparison and reasoning
  • Bounding box detection and region analysis
  • Video frame analysis

Model Comparison (January )

Model Context Strengths Vision Input
GPT-5.2 128K Best general reasoning, multimodal Up to 10 images
Claude Opus 4.6 1M Best coding, sustained agent tasks, adaptive thinking Up to 100 images
Related skills

More from yonatangross/orchestkit

Installs
13
GitHub Stars
170
First Seen
Jan 22, 2026