Vision Skill
Overview
This skill provides image recognition and image generation using Doubao AI models. It stores images in Tencent Cloud COS and runs tasks asynchronously.
Capabilities
1. Vision Recognition
Analyze images to describe content, extract text (OCR), or answer questions about the image.
- Input: Local image path or URL, optional prompt.
- Process: Uploads local images to COS, then calls the Doubao Vision API.
- Output: Text description or answer.
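The input, process, and output steps above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: `recognize_image`, `is_remote_url`, and `DEFAULT_PROMPT` are hypothetical names, and the COS upload and Doubao Vision call are passed in as plain callables because their real signatures are not given here.

```python
from typing import Callable, Optional
from urllib.parse import urlparse

# Assumed fallback prompt used when the caller does not supply one.
DEFAULT_PROMPT = "Describe this image."


def is_remote_url(source: str) -> bool:
    """Return True when source looks like an http(s) URL rather than a local path."""
    parsed = urlparse(source)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def recognize_image(
    source: str,
    upload_to_cos: Callable[[str], str],
    call_vision_api: Callable[[str, str], str],
    prompt: Optional[str] = None,
) -> str:
    """Resolve the image to a URL, then ask the vision model about it.

    upload_to_cos(local_path) -> public URL      (hypothetical helper)
    call_vision_api(image_url, prompt) -> text   (hypothetical helper)
    """
    # URLs are passed through unchanged; local files are uploaded first
    # so that the model can fetch them.
    image_url = source if is_remote_url(source) else upload_to_cos(source)
    return call_vision_api(image_url, prompt or DEFAULT_PROMPT)
```

Injecting the two helpers keeps the sketch independent of any particular SDK, so the same flow can be exercised with stand-in functions before real credentials are configured.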
2. Image Generation
Generate images from text prompts, optionally using reference images.
- Text-to-Image: Generate images from a text description.
- Image-to-Image: Generate images based on a reference image and text prompt.
- Sequential Generation: Generate a series of consistent images (e.g., storyboards).
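One way to picture how a request might map onto the three modes above is the sketch below. The selection rule is an assumption made for illustration (more than one requested image means sequential, otherwise a reference image means image-to-image); `GenerationRequest` and `select_mode` are hypothetical names, not part of the skill or of any Doubao API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GenerationRequest:
    """Hypothetical container for an image generation request."""
    prompt: str
    reference_images: List[str] = field(default_factory=list)
    image_count: int = 1


def select_mode(request: GenerationRequest) -> str:
    """Pick a generation mode from the request contents (assumed rule)."""
    if not request.prompt.strip():
        raise ValueError("a text prompt is required for every mode")
    if request.image_count < 1:
        raise ValueError("image_count must be at least 1")
    # A series of images is treated as sequential generation,
    # with or without reference images.
    if request.image_count > 1:
        return "sequential"
    # A single image guided by a reference is image-to-image.
    if request.reference_images:
        return "image-to-image"
    return "text-to-image"
```

For example, `select_mode(GenerationRequest("a four-panel storyboard", image_count=4))` selects the sequential mode under this rule.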