
BLIP-2: Vision-Language Pre-training

Comprehensive guide to using Salesforce's BLIP-2 for vision-language tasks with frozen image encoders and large language models.

When to use BLIP-2

Use BLIP-2 when you:

  • Need high-quality image captioning with natural descriptions (see the sketch after this list)
  • Are building visual question answering (VQA) systems
  • Require zero-shot image-text understanding without task-specific training
  • Want to leverage LLM reasoning for visual tasks
  • Are building multimodal conversational AI
  • Need image-text retrieval or matching
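
A minimal sketch of the first two use cases (captioning and VQA) through the Hugging Face transformers integration of BLIP-2, assuming the Salesforce/blip2-opt-2.7b checkpoint and a local image file; photo.jpg is a placeholder path:

```python
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=dtype
).to(device)

image = Image.open("photo.jpg").convert("RGB")  # placeholder path

# Image captioning: with no text prompt, the model generates a description
inputs = processor(images=image, return_tensors="pt").to(device, dtype)
caption_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(caption_ids[0], skip_special_tokens=True))

# Visual question answering: pass a question in the "Question: ... Answer:" prompt format
prompt = "Question: what is shown in the image? Answer:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(device, dtype)
answer_ids = model.generate(**inputs, max_new_tokens=20)
print(processor.decode(answer_ids[0], skip_special_tokens=True))
```

Because the image encoder and language model are frozen in BLIP-2, the same checkpoint handles both tasks zero-shot; only the prompt changes between captioning and VQA.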