LLaVA - Large Language and Vision Assistant
Open-source vision-language model for conversational image understanding.
When to use LLaVA
Use when:
- Building vision-language chatbots
- Visual question answering (VQA)
- Image description and captioning
- Multi-turn image conversations
- Visual instruction following
- Document understanding with images
Metrics:
- 23,000+ GitHub stars
- Targets GPT-4V-level capabilities
- Apache 2.0 License
- Multiple model sizes (7B-34B params)