VLM
VLM(Vision Chat) Skill
This skill guides the implementation of vision chat functionality using the z-ai-web-dev-sdk package, enabling AI models to understand and respond to images combined with text prompts.
Skills Path
Skill Location: {project_path}/skills/VLM
this skill is located at above path in your project.
Reference Scripts: Example test scripts are available in the {Skill Location}/scripts/ directory for quick testing and reference. See {Skill Location}/scripts/vlm.ts for a working example.
Overview
Vision Chat allows you to build applications that can analyze images, extract information from visual content, and answer questions about images through natural language conversation.
IMPORTANT: z-ai-web-dev-sdk MUST be used in backend code only. Never use it in client-side code.
Prerequisites
More from answerzhao/agent-skills
web-reader
Implement web page content extraction capabilities using the z-ai-web-dev-sdk. Use this skill when the user needs to scrape web pages, extract article content, retrieve page metadata, or build applications that process web content. Supports automatic content extraction with title, HTML, and publication time retrieval.
1.6Kasr
Implement speech-to-text (ASR/automatic speech recognition) capabilities using the z-ai-web-dev-sdk. Use this skill when the user needs to transcribe audio files, convert speech to text, build voice input features, or process audio recordings. Supports base64 encoded audio files and returns accurate text transcriptions.
156web-search
Implement web search capabilities using the z-ai-web-dev-sdk. Use this skill when the user needs to search the web, retrieve current information, find relevant content, or build applications with real-time web search functionality. Returns structured search results with URLs, snippets, and metadata.
88tts
Implement text-to-speech (TTS) capabilities using the z-ai-web-dev-sdk. Use this skill when the user needs to convert text into natural-sounding speech, create audio content, build voice-enabled applications, or generate spoken audio files. Supports multiple voices, adjustable speed, and various audio formats.
61frontend-design
Transform UI style requirements into production-ready frontend code with systematic design tokens, accessibility compliance, and creative execution. Use when building websites, web applications, React/Vue components, dashboards, landing pages, or any web UI requiring both design consistency and aesthetic quality.
32llm
Implement large language model (LLM) chat completions using the z-ai-web-dev-sdk. Use this skill when the user needs to build conversational AI applications, chatbots, AI assistants, or any text generation features. Supports multi-turn conversations, system prompts, and context management.
30