glmv-caption
GLM-V Caption Skill
Generate captions for images, videos, and documents using the ZhiPu GLM-V multimodal model.
When to Use
- Describe, caption, summarize, or interpret image/video/document content
- User mentions "describe this image", "caption", "summarize this video", "图片描述" (image description), "视频摘要" (video summary), "文档解读" (document interpretation), or "看图说话" (describe what a picture shows)
- Extract visual or textual information from media files
- Compare multiple images
- User provides an image/video/file and asks what's in it
Supported Input Types
| Type | Formats | Max Size | Max Count | Base64 |
|---|---|---|---|---|
| Image | jpg, png, jpeg | 5MB / 6000×6000px | 50 | ✅ |
| Video | mp4, mkv, mov | 200MB | — | ❌ |
| File | pdf, docx, txt, xlsx, pptx, jsonl | — | 50 | ❌ |
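The limits in the table can be checked locally before anything is sent to the model. The following is a minimal sketch under stated assumptions, not part of the skill itself: `LIMITS`, `classify`, and `check_inputs` are hypothetical names, 1MB is assumed to mean 1024 × 1024 bytes, and a "—" cell is treated as "no limit stated". The 6000×6000px image limit is not checked, since that would require decoding the image.

```python
from pathlib import Path

MB = 1024 * 1024  # assumption: the table's "MB" means 1024 * 1024 bytes

# Values copied from the table above; None means the table states no limit.
LIMITS = {
    "image": {"exts": {"jpg", "jpeg", "png"}, "max_bytes": 5 * MB,
              "max_count": 50, "base64": True},
    "video": {"exts": {"mp4", "mkv", "mov"}, "max_bytes": 200 * MB,
              "max_count": None, "base64": False},
    "file": {"exts": {"pdf", "docx", "txt", "xlsx", "pptx", "jsonl"},
             "max_bytes": None, "max_count": 50, "base64": False},
}


def classify(name):
    """Return 'image', 'video', 'file', or None based on the file extension."""
    ext = Path(name).suffix.lower().lstrip(".")
    for kind, spec in LIMITS.items():
        if ext in spec["exts"]:
            return kind
    return None


def check_inputs(items):
    """Check (filename, size_in_bytes) pairs against the table's limits.

    Returns a list of human-readable problems; an empty list means every
    input fits the format, size, and count limits (pixel size not checked).
    """
    problems = []
    counts = {}
    for name, size in items:
        kind = classify(name)
        if kind is None:
            problems.append(f"{name}: unsupported format")
            continue
        counts[kind] = counts.get(kind, 0) + 1
        max_bytes = LIMITS[kind]["max_bytes"]
        if max_bytes is not None and size > max_bytes:
            problems.append(f"{name}: exceeds {kind} size limit")
    for kind, n in counts.items():
        max_count = LIMITS[kind]["max_count"]
        if max_count is not None and n > max_count:
            problems.append(f"{n} {kind} inputs exceed the limit of {max_count}")
    return problems
```

For example, `check_inputs([("photo.jpg", 1000), ("clip.gif", 10)])` reports only the unsupported `.gif`. The `base64` flag is carried along so a caller can decide whether an input may be sent inline or must be referenced some other way; how non-Base64 inputs are supplied is not specified in this section.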