
WAN 2.2 Text-to-Video (T2V) Workflows

Overview

WAN 2.2 T2V generates videos from text prompts using a 14B parameter MoE (Mixture of Experts) architecture split across two specialized models:

  • HighNoise model: Handles early denoising — establishes structure, motion, composition
  • LowNoise model: Handles late denoising — refines details, sharpens output
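The handoff between the two experts can be pictured as a simple step schedule: early (noisy) steps go to one model, late steps to the other. The sketch below is illustrative only, not ComfyUI code; the total step count and switch point (here 10 of 20) are assumptions for demonstration, not WAN 2.2 defaults.

```python
# Illustrative sketch of the MoE denoising split (not ComfyUI code).
# switch_step is a hypothetical boundary chosen for illustration.

def assign_expert(step: int, total_steps: int = 20, switch_step: int = 10) -> str:
    """Return which expert handles a given denoising step.

    Early steps -> HighNoise model (structure, motion, composition).
    Late steps  -> LowNoise model (detail refinement, sharpening).
    """
    if not 0 <= step < total_steps:
        raise ValueError("step out of range")
    return "HighNoise" if step < switch_step else "LowNoise"

schedule = [assign_expert(s) for s in range(20)]
print(schedule[0], schedule[-1])  # HighNoise LowNoise
```

In ComfyUI terms, this split is typically realized by running two sampler passes back to back, each loading one of the two UNETs and covering its half of the step range.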

This dual-model technique is the same one used by the FLF/I2V workflows (see the wan-flf-video skill), but without the image-conditioning nodes.

Key difference from I2V/FLF: T2V does NOT use CLIPVisionEncode, WanFirstLastFrameToVideo, or any image input. It uses EmptyHunyuanLatentVideo for latent initialization and text-only conditioning.
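The node-set difference above can be stated as a quick sanity check. This is a sketch of the graph's node inventory only, not an executable workflow; the supporting nodes listed for T2V (loader, sampler, decoder) are assumptions about a typical graph, while the image-conditioning node names come from the text.

```python
# Sketch: nodes that appear only in I2V/FLF graphs vs. a typical T2V graph.
# T2V supporting-node names are illustrative assumptions, not a full spec.
I2V_ONLY_NODES = {"CLIPVisionEncode", "WanFirstLastFrameToVideo"}
T2V_NODES = {
    "EmptyHunyuanLatentVideo",  # text-only latent initialization
    "CLIPTextEncode",           # positive/negative prompt conditioning
    "UNETLoader",               # loads the HighNoise / LowNoise UNETs
    "KSamplerAdvanced",         # two passes, one per expert
    "VAEDecode",                # latent -> frames
}

# A valid T2V graph contains none of the image-conditioning nodes.
assert T2V_NODES.isdisjoint(I2V_ONLY_NODES)
print("T2V graph has no image inputs")
```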

Models

UNET (Installed)
