BLIP-2: Vision-Language Pre-training
Comprehensive guide to using Salesforce's BLIP-2 for vision-language tasks with frozen image encoders and large language models.
When to use BLIP-2
Use BLIP-2 when:
- You need high-quality image captioning with natural descriptions (see the sketch after this list)
- You're building visual question answering (VQA) systems
- You require zero-shot image-text understanding without task-specific training
- You want to leverage LLM reasoning for visual tasks
- You're building multimodal conversational AI
- You need image-text retrieval or matching
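For the captioning and VQA cases above, here is a minimal sketch assuming the Hugging Face transformers port of BLIP-2 (`Blip2Processor` / `Blip2ForConditionalGeneration`); `photo.jpg` is a placeholder path, and the `Question: ... Answer:` prompt format follows the published BLIP-2 examples:

```python
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=dtype
).to(device)

image = Image.open("photo.jpg")  # placeholder image path

# Captioning: pass the image alone and let the frozen LLM describe it.
inputs = processor(images=image, return_tensors="pt").to(device, dtype)
ids = model.generate(**inputs, max_new_tokens=30)
print(processor.batch_decode(ids, skip_special_tokens=True)[0].strip())

# VQA: the BLIP-2 checkpoints expect a "Question: ... Answer:" prompt.
prompt = "Question: what is in the photo? Answer:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(device, dtype)
ids = model.generate(**inputs, max_new_tokens=10)
print(processor.batch_decode(ids, skip_special_tokens=True)[0].strip())
```

Swapping the LLM backend is just a checkpoint change (e.g. Salesforce/blip2-opt-6.7b or Salesforce/blip2-flan-t5-xl); the processor and model classes stay the same.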
Key features:
- Q-Former architecture: A lightweight query transformer bridges the frozen vision encoder and the LLM (the sketch after this list inspects its output)
- Frozen backbone efficiency: The image encoder and LLM stay frozen; only the Q-Former is trained, so no large-model fine-tuning is needed
- Multiple LLM backends: OPT (2.7B, 6.7B) and FlanT5 (XL, XXL)
- Zero-shot capabilities: Strong performance without task-specific training
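To make the Q-Former bridging concrete, here is a sketch (again assuming the transformers port, whose `Blip2Model` exposes a `get_qformer_features` helper) that pulls the query embeddings the Q-Former hands to the frozen LLM; `photo.jpg` is again a placeholder:

```python
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2Model

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2Model.from_pretrained("Salesforce/blip2-opt-2.7b", torch_dtype=dtype).to(device)

image = Image.open("photo.jpg")  # placeholder image path
pixel_values = processor(images=image, return_tensors="pt").pixel_values.to(device, dtype)

# The Q-Former compresses the frozen ViT's patch features into a small,
# fixed set of learned query embeddings, independent of image resolution.
with torch.no_grad():
    qformer_out = model.get_qformer_features(pixel_values=pixel_values)

# 32 queries of width 768 for the released checkpoints.
print(qformer_out.last_hidden_state.shape)  # e.g. torch.Size([1, 32, 768])
```

This fixed-size query bottleneck is what keeps the bridge lightweight: the LLM only ever sees a handful of projected query tokens rather than the full patch grid.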