# WAN 2.2 Text-to-Video (T2V) Workflows

## Overview
WAN 2.2 T2V generates videos from text prompts using a 14B-parameter Mixture-of-Experts (MoE) architecture split across two specialized models:
- HighNoise model: Handles early denoising — establishes structure, motion, composition
- LowNoise model: Handles late denoising — refines details, sharpens output
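The handoff between the two experts can be pictured as a split of the denoising schedule at a noise-level boundary. The sketch below is illustrative only: the class names, `step` method, and the boundary value are placeholders, not the actual WAN 2.2 or ComfyUI API.

```python
# Illustrative sketch (NOT the real WAN 2.2 API): two experts split the
# denoising schedule by noise level (sigma).

class StubExpert:
    """Stand-in for a diffusion expert; records which sigmas it handled."""
    def __init__(self, name):
        self.name = name
        self.steps = []

    def step(self, latent, sigma):
        self.steps.append(sigma)
        return latent  # a real model would predict and remove noise here

def denoise(latent, sigmas, high_noise, low_noise, boundary=0.875):
    """Early (high-sigma) steps go to the HighNoise expert, which sets
    structure and motion; late (low-sigma) steps go to the LowNoise
    expert, which refines detail. The boundary value is a placeholder."""
    for sigma in sigmas:
        expert = high_noise if sigma >= boundary else low_noise
        latent = expert.step(latent, sigma)
    return latent

high = StubExpert("HighNoise")
low = StubExpert("LowNoise")
denoise(None, [1.0, 0.9, 0.8, 0.5, 0.1], high, low)
print(high.steps)  # [1.0, 0.9]
print(low.steps)   # [0.8, 0.5, 0.1]
```

In a ComfyUI graph this split is typically realized by running two sampler nodes back to back, each loading one of the two model checkpoints.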
The same dual-model technique is used in the FLF/I2V workflows (see the wan-flf-video skill), minus the image-conditioning nodes.
Key difference from I2V/FLF: T2V does NOT use CLIPVisionEncode, WanFirstLastFrameToVideo, or any image input. It uses EmptyHunyuanLatentVideo for latent initialization and text-only conditioning.
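Because there is no conditioning image, the workflow starts from an all-zero latent whose shape is derived from the requested resolution and frame count. The helper below sketches that shape computation; the compression factors (16 latent channels, 4x temporal, 8x spatial) follow ComfyUI's EmptyHunyuanLatentVideo node but should be treated as assumptions, and the function name is hypothetical.

```python
# Sketch of the empty-latent shape a T2V workflow starts from.
# Factors assume ComfyUI's EmptyHunyuanLatentVideo: 16 latent channels,
# 4x temporal compression, 8x spatial compression.

def empty_video_latent_shape(width, height, length, batch_size=1):
    """Return the zero-latent shape for `length` video frames."""
    return (batch_size, 16, (length - 1) // 4 + 1, height // 8, width // 8)

# e.g. an 832x480, 81-frame generation:
print(empty_video_latent_shape(832, 480, 81))  # (1, 16, 21, 60, 104)
```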