glmv-caption


GLM-V Caption Skill

Generate captions for images, videos, and documents using the ZhiPu GLM-V multimodal model.

When to Use

  • Describe, caption, summarize, or interpret image/video/document content
  • User mentions "describe this image", "caption", "summarize this video", "图片描述" (image description), "视频摘要" (video summary), "文档解读" (document interpretation), or "看图说话" (talk about a picture)
  • Extract visual or textual information from media files
  • Compare multiple images
  • User provides an image/video/file and asks what's in it

Supported Input Types

Type   Formats                             Max Size            Max Count  Base64
Image  jpg, png, jpeg                      5MB / 6000×6000px   50
Video  mp4, mkv, mov                       200MB
File   pdf, docx, txt, xlsx, pptx, jsonl                       50
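
The limits in the table above can be checked before anything is sent to the model. The following is a minimal sketch, not part of the skill itself: the names `LIMITS`, `classify`, and `check_inputs` are hypothetical, "MB" is assumed to mean 1024 × 1024 bytes, and the 6000×6000px image limit is not checked because that would need an image library. Where the table gives no value, the sketch applies no limit.

```python
from pathlib import Path

MB = 1024 * 1024  # assumption: the table's "MB" is read as 1024 * 1024 bytes

# Limits copied from the table; None means the table gives no value.
LIMITS = {
    "image": {"formats": {"jpg", "png", "jpeg"}, "max_bytes": 5 * MB, "max_count": 50},
    "video": {"formats": {"mp4", "mkv", "mov"}, "max_bytes": 200 * MB, "max_count": None},
    "file": {
        "formats": {"pdf", "docx", "txt", "xlsx", "pptx", "jsonl"},
        "max_bytes": None,
        "max_count": 50,
    },
}


def classify(name):
    """Return "image", "video", or "file" based on the extension, else None."""
    ext = Path(name).suffix.lower().lstrip(".")
    for kind, rule in LIMITS.items():
        if ext in rule["formats"]:
            return kind
    return None


def check_inputs(items):
    """Check (filename, size_in_bytes) pairs and return a list of problems."""
    problems = []
    counts = {}
    for name, size in items:
        kind = classify(name)
        if kind is None:
            problems.append(f"{name}: unsupported format")
            continue
        counts[kind] = counts.get(kind, 0) + 1
        max_bytes = LIMITS[kind]["max_bytes"]
        if max_bytes is not None and size > max_bytes:
            problems.append(f"{name}: exceeds {kind} size limit")
    # Count limits apply per input type, so they are checked after the loop.
    for kind, n in counts.items():
        max_count = LIMITS[kind]["max_count"]
        if max_count is not None and n > max_count:
            problems.append(f"{n} {kind} inputs exceed the limit of {max_count}")
    return problems
```

An empty list from `check_inputs` means every input passed the checks that the table makes possible; each string in a non-empty list names one input or one input type that should be fixed first.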
First Seen
Apr 2, 2026