bailian-multimodal-skills

Bailian Multimodal Skills

Generate images, audio, and video, and transcribe speech, using Aliyun Bailian models (Qwen/Wan/PixVerse/Kling/CosyVoice).

Features

  • Image Generation: z-image-turbo, wan2.6-t2i, wan2.7-image-pro
  • Image Editing: wan2.7-image-pro
  • Video Editing: wan2.7-videoedit
  • ASR (Speech-to-Text): qwen3-asr-flash
  • TTS (Text-to-Speech): qwen3-tts-flash
  • Text-to-Video: wan2.7-t2v, wan2.6-t2v, pixverse/pixverse-v5.6-t2v, kling/kling-v3-video-generation
  • Image-to-Video: wan2.7-i2v, wan2.6-i2v-flash, wan2.6-i2v, pixverse/pixverse-v5.6-it2v, kling/kling-v3-video-generation
  • Reference-to-Video: wan2.6-r2v-flash, wan2.6-r2v, pixverse/pixverse-v5.6-r2v
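The feature list above can be read as a task-to-model lookup table. The sketch below encodes it that way in Python. The model identifiers are copied from the list; the dictionary name, the task keys, and both helper functions are hypothetical names chosen for illustration and are not part of the skill itself.

```python
# Task-to-model table transcribed from the feature list.
# SUPPORTED_MODELS, the task keys, and the helpers below are illustrative
# assumptions; only the model identifiers come from the skill description.
SUPPORTED_MODELS: dict[str, list[str]] = {
    "image_generation": ["z-image-turbo", "wan2.6-t2i", "wan2.7-image-pro"],
    "image_editing": ["wan2.7-image-pro"],
    "video_editing": ["wan2.7-videoedit"],
    "asr": ["qwen3-asr-flash"],
    "tts": ["qwen3-tts-flash"],
    "text_to_video": [
        "wan2.7-t2v",
        "wan2.6-t2v",
        "pixverse/pixverse-v5.6-t2v",
        "kling/kling-v3-video-generation",
    ],
    "image_to_video": [
        "wan2.7-i2v",
        "wan2.6-i2v-flash",
        "wan2.6-i2v",
        "pixverse/pixverse-v5.6-it2v",
        "kling/kling-v3-video-generation",
    ],
    "reference_to_video": [
        "wan2.6-r2v-flash",
        "wan2.6-r2v",
        "pixverse/pixverse-v5.6-r2v",
    ],
}


def models_for(task: str) -> list[str]:
    """Return the model identifiers listed for a task (hypothetical helper)."""
    try:
        # Return a copy so callers cannot mutate the table by accident.
        return list(SUPPORTED_MODELS[task])
    except KeyError:
        known = ", ".join(SUPPORTED_MODELS)
        raise ValueError(f"unknown task {task!r}; expected one of: {known}") from None


def tasks_for(model: str) -> list[str]:
    """Return every task under which a model identifier is listed."""
    return [task for task, models in SUPPORTED_MODELS.items() if model in models]
```

A reverse lookup such as `tasks_for("wan2.7-image-pro")` makes it easy to see that some identifiers are listed under more than one feature, here both image generation and image editing.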

Usage

1. Image Generation

Installs: 24
GitHub Stars: 7
First Seen: Mar 25, 2026