glmv-grounding
GLMV-Grounding Skill
Extract and visualize grounding results produced by GLM-V. Depending on the user prompt, grounding coordinates in the model output may appear in different forms: 2D bounding boxes, object detection JSON, 2D points, 3D bounding boxes, and target-tracking JSON.
Note: GLM-V outputs relative coordinates in the range 0-1000. They are normalized from the pixel coordinates x_pixel and y_pixel using the image width W and height H, i.e., x = round(x_pixel / W * 1000), y = round(y_pixel / H * 1000). The origin of the pixel coordinate system is the top-left corner.
Note: If the prompt does not explicitly specify a grounding format (for example, "find the location of xxx" or "draw a box around xxx"), treat the request as 2D bounding boxes by default.
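The normalization described in the note can be sketched in Python as follows. This is a minimal illustration, not the skill's own utility code: the function names are hypothetical, and the [x1, y1, x2, y2] box layout is an assumption that the text above does not confirm.

```python
def to_normalized(x_pixel, y_pixel, width, height):
    """Map pixel coordinates to the 0-1000 relative range."""
    return round(x_pixel / width * 1000), round(y_pixel / height * 1000)


def to_pixel(x, y, width, height):
    """Map 0-1000 relative coordinates back to pixel coordinates."""
    return round(x / 1000 * width), round(y / 1000 * height)


def box_to_pixel(box, width, height):
    """De-normalize a 2D box.

    Assumption: the box is laid out as [x1, y1, x2, y2] with the
    top-left corner first; check this against real model output.
    """
    x1, y1, x2, y2 = box
    px1, py1 = to_pixel(x1, y1, width, height)
    px2, py2 = to_pixel(x2, y2, width, height)
    return [px1, py1, px2, py2]


if __name__ == "__main__":
    # A point at the centre of a 640x480 image.
    print(to_normalized(320, 240, 640, 480))
    # A normalized box mapped onto a 1920x1080 image.
    print(box_to_pixel([100, 200, 500, 800], 1920, 1080))
```

Because both directions round to integers, a round trip through the 0-1000 range can shift a coordinate by a pixel or so on large images.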
When to use
- Use GLM-V to ground targets in images: obtain grounding results in an image for any prompt-described target, with output formats such as 2D bounding box (default), 2D points, and 3D bounding box.
- Use GLM-V to track targets in videos: obtain tracking results in a video for any prompt-described target, with an output format such as {"0": [{"label": ..., "bbox_2d": ...}, ...], ...}.
- Use utility functions for extraction, conversion, and visualization: extract coordinates, points, and JSON from natural text; normalize and de-normalize coordinates; visualize boxes, points, 3D boxes, and video tracking results.
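As a rough sketch of the extraction and de-normalization steps listed above, the following Python pulls the first JSON value out of free-form model text and converts tracking boxes to pixel coordinates. The function names are hypothetical and are not the skill's actual utilities. It also assumes that the top-level keys are frame indices and that `bbox_2d` is [x1, y1, x2, y2] in the 0-1000 range, neither of which is confirmed here.

```python
import json


def extract_first_json(text):
    """Return the first JSON object or array embedded in text, or None."""
    decoder = json.JSONDecoder()
    for i, ch in enumerate(text):
        if ch in "{[":
            try:
                # raw_decode parses one JSON value starting at index i
                # and ignores whatever follows it.
                value, _end = decoder.raw_decode(text, i)
                return value
            except json.JSONDecodeError:
                continue  # not valid JSON here; try the next bracket
    return None


def tracking_to_pixels(tracking, width, height):
    """Convert tracking output to pixel boxes, keyed by integer frame index.

    Assumptions: keys are frame indices stored as strings, and each
    bbox_2d is [x1, y1, x2, y2] in 0-1000 relative coordinates.
    """
    frames = {}
    for frame_key, objects in tracking.items():
        converted = []
        for obj in objects:
            x1, y1, x2, y2 = obj["bbox_2d"]
            converted.append({
                "label": obj["label"],
                "bbox_2d": [
                    round(x1 / 1000 * width),
                    round(y1 / 1000 * height),
                    round(x2 / 1000 * width),
                    round(y2 / 1000 * height),
                ],
            })
        frames[int(frame_key)] = converted
    return frames


if __name__ == "__main__":
    reply = (
        'Tracking results: {"0": [{"label": "car", "bbox_2d": [100, 200, 500, 800]}], '
        '"1": [{"label": "car", "bbox_2d": [150, 200, 550, 800]}]} End of output.'
    )
    tracking = extract_first_json(reply)
    print(tracking_to_pixels(tracking, 1920, 1080))
```

Scanning for the first parseable bracket keeps the sketch tolerant of explanatory text around the JSON, at the cost of returning an unrelated JSON fragment if one happens to appear earlier in the reply.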
Set up your API key
Configure ZHIPU_API_KEY to call the GLM-V API.
- Get your API key: https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys
- Configure it with: