Bailian Multimodal Skills
Generate images, audio, and video, and transcribe speech, using Aliyun Bailian models (Qwen, Wan, PixVerse, Kling, CosyVoice).
Features
- Image Generation: z-image-turbo, wan2.6-t2i, wan2.7-image-pro
- Image Editing: wan2.7-image-pro
- Video Editing: wan2.7-videoedit
- ASR (Speech-to-Text): qwen3-asr-flash
- TTS (Text-to-Speech): qwen3-tts-flash
- Text-to-Video: wan2.7-t2v, wan2.6-t2v, pixverse/pixverse-v5.6-t2v, kling/kling-v3-video-generation
- Image-to-Video: wan2.7-i2v, wan2.6-i2v-flash, wan2.6-i2v, pixverse/pixverse-v5.6-it2v, kling/kling-v3-video-generation
- Reference-to-Video: wan2.6-r2v-flash, wan2.6-r2v, pixverse/pixverse-v5.6-r2v
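
The feature list above is effectively a task-to-model mapping, so it may help to see it as a lookup table. The following is a minimal sketch, not the skill's actual implementation: the names `SUPPORTED_MODELS` and `resolve_model` and the task keys are hypothetical, and treating the first listed model as the default is an assumption. Only the model identifiers come from the list.

```python
# Hypothetical lookup table built from the feature list; not part of the skill itself.
SUPPORTED_MODELS = {
    "image-generation": ["z-image-turbo", "wan2.6-t2i", "wan2.7-image-pro"],
    "image-editing": ["wan2.7-image-pro"],
    "video-editing": ["wan2.7-videoedit"],
    "asr": ["qwen3-asr-flash"],
    "tts": ["qwen3-tts-flash"],
    "text-to-video": [
        "wan2.7-t2v",
        "wan2.6-t2v",
        "pixverse/pixverse-v5.6-t2v",
        "kling/kling-v3-video-generation",
    ],
    "image-to-video": [
        "wan2.7-i2v",
        "wan2.6-i2v-flash",
        "wan2.6-i2v",
        "pixverse/pixverse-v5.6-it2v",
        "kling/kling-v3-video-generation",
    ],
    "reference-to-video": [
        "wan2.6-r2v-flash",
        "wan2.6-r2v",
        "pixverse/pixverse-v5.6-r2v",
    ],
}


def resolve_model(task, model=None):
    """Return a model name for `task`, validating an explicit choice if given."""
    if task not in SUPPORTED_MODELS:
        raise ValueError(f"unknown task: {task!r}")
    candidates = SUPPORTED_MODELS[task]
    if model is None:
        # Assumption: the first model listed for a task acts as its default.
        return candidates[0]
    if model not in candidates:
        raise ValueError(f"{model!r} is not listed for task {task!r}")
    return model
```

For example, `resolve_model("text-to-video", "wan2.6-t2v")` returns the requested model unchanged, while asking for `wan2.7-videoedit` under `"tts"` raises a `ValueError` before any request would be sent.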