video-voiceover
What this does
Reads a timestamped narration script and synthesizes one audio clip per segment, adjusting the
speech rate dynamically so that each clip fits its segment's time slot, then records placement
metadata. The only supported engine is MiMo TTS (mimo-v2.5-tts).
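To make the "dynamic rate" idea concrete, here is a minimal sketch of how a rate multiplier could be derived from a clip's natural duration and its slot length. The function name `fit_rate` and the clamp bounds `min_rate` / `max_rate` are hypothetical, chosen for illustration; the skill's actual limits and its behavior for clips shorter than their slot are not documented here.

```python
def fit_rate(natural_s: float, slot_s: float,
             min_rate: float = 0.8, max_rate: float = 1.5) -> float:
    """Return a speech-rate multiplier that makes a clip fit its slot.

    natural_s: duration of the clip at normal speed, in seconds.
    slot_s:    segment length (end - start), in seconds.
    The clamp bounds are illustrative assumptions, not documented values.
    """
    if natural_s <= 0 or slot_s <= 0:
        raise ValueError("durations must be positive")
    # A rate above 1 speeds speech up; a rate below 1 slows it down.
    rate = natural_s / slot_s
    # Clamp so that the result stays within a listenable range.
    return max(min_rate, min(max_rate, rate))
```

For example, a clip that would naturally run 6 seconds in a 4-second slot yields `fit_rate(6.0, 4.0) == 1.5`. When the clamp engages, the clip no longer fits exactly, so a real implementation would also need a policy for the leftover overrun or gap.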
Requirements
export MIMO_API_KEY=*** # MiMo TTS (or a TTS-specific MIMO_TTS_API_KEY)
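A caller might resolve the key along these lines. This is a sketch only: the helper name `resolve_api_key` is hypothetical, and the precedence shown (the TTS-specific variable wins when both are set) is an assumption rather than documented behavior.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the MiMo key from the environment, or raise if none is set."""
    env = os.environ if env is None else env
    # Assumed precedence: the TTS-specific variable first, then the general one.
    for name in ("MIMO_TTS_API_KEY", "MIMO_API_KEY"):
        value = env.get(name, "").strip()
        if value:
            return value
    raise RuntimeError("Set MIMO_API_KEY (or MIMO_TTS_API_KEY) before running")
```

Passing the environment as a mapping keeps the function easy to exercise without touching the real process environment.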
Input contract
work_dir/narration.json: segments with start / end / narration (plus optional pause_after_ms and
overlaps_speech). Times are in seconds on the output timeline, where the audio will be placed.
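The contract above can be sketched as a small loader. The field names come from the contract, but several details are assumptions made for illustration: the top-level shape (a bare list, or an object with a `segments` key), the field types, the defaults for the optional fields, and the names `Segment` and `parse_segments`.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start: float                   # seconds on the output timeline
    end: float                     # seconds on the output timeline
    narration: str                 # text to synthesize
    pause_after_ms: int = 0        # optional; default is an assumption
    overlaps_speech: bool = False  # optional; default is an assumption


def parse_segments(text: str) -> List[Segment]:
    """Parse narration JSON text into validated segments."""
    data = json.loads(text)
    # Assumed shape: a bare list, or an object holding a "segments" list.
    items = data["segments"] if isinstance(data, dict) else data
    segments = []
    for i, item in enumerate(items):
        seg = Segment(
            start=float(item["start"]),
            end=float(item["end"]),
            narration=str(item["narration"]),
            pause_after_ms=int(item.get("pause_after_ms", 0)),
            overlaps_speech=bool(item.get("overlaps_speech", False)),
        )
        # A slot needs positive length for rate fitting to make sense.
        if seg.end <= seg.start:
            raise ValueError(f"segment {i}: end must be after start")
        segments.append(seg)
    return segments
```

Validating up front means a malformed segment fails before any TTS request is made.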
In the orchestrated cut-mode flow, the agent writes narration.json directly against the output
timeline, and the orchestrator passes it here. In the legacy direct-cut path,
narration_mapped.json may be passed explicitly instead.
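The choice between the two input files could be expressed as follows. This is a hedged sketch: `resolve_narration_path` and its `explicit` parameter are hypothetical names, and the real entry point may take its input path differently.

```python
from pathlib import Path
from typing import Optional


def resolve_narration_path(work_dir: str, explicit: Optional[str] = None) -> Path:
    """Pick the narration file: an explicit path wins, else the default."""
    if explicit:
        # Legacy direct-cut path: the caller names the file, for example
        # a narration_mapped.json.
        return Path(explicit)
    # Orchestrated cut-mode flow: the file written against the output timeline.
    return Path(work_dir) / "narration.json"
```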