What this does

Reads a timestamped narration script and synthesizes one audio clip per segment, adjusting the speech rate dynamically so each clip fits its segment's time slot, then records placement metadata for each clip. The only supported engine is MiMo TTS (mimo-v2.5-tts).
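
As a rough illustration of what "dynamic rate" fitting can mean, the sketch below picks a playback-rate multiplier so that a clip of known natural duration fits its slot. The function name fit_rate and the 0.8 to 1.5 clamp are assumptions made for this example, not values taken from the tool; the real implementation may fit speech differently (for example by re-synthesizing at another speed).

```python
def fit_rate(natural_s: float, slot_s: float,
             min_rate: float = 0.8, max_rate: float = 1.5) -> float:
    """Return a speech-rate multiplier that makes a clip fit its slot.

    natural_s: clip duration at rate 1.0, in seconds.
    slot_s:    segment slot length (end - start), in seconds.
    The clamp bounds are hypothetical; they keep speech intelligible.
    """
    if natural_s <= 0 or slot_s <= 0:
        raise ValueError("durations must be positive")
    # A rate above 1.0 speeds speech up; below 1.0 slows it down.
    rate = natural_s / slot_s
    return max(min_rate, min(max_rate, rate))


def fitted_duration(natural_s: float, rate: float) -> float:
    """Duration of the clip once played at the given rate."""
    return natural_s / rate
```

With these assumed bounds, a 6-second clip in a 4-second slot gets rate 1.5 and fits exactly, while a 10-second clip in the same slot is clamped to 1.5 and still overruns, which a caller would need to handle (trim, shorten the text, or allow overlap).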

Requirements

export MIMO_API_KEY=***         # MiMo API key (or set a TTS-specific MIMO_TTS_API_KEY instead)
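
A minimal sketch of how a script might pick up the key from the environment. The helper name resolve_api_key is hypothetical, and the precedence shown (MIMO_TTS_API_KEY first, then MIMO_API_KEY) is an assumption; the source only says that either variable can be used.

```python
import os


def resolve_api_key(env=None) -> str:
    """Return the MiMo key from the environment.

    Assumption: the TTS-specific variable wins when both are set.
    """
    env = os.environ if env is None else env
    key = env.get("MIMO_TTS_API_KEY") or env.get("MIMO_API_KEY")
    if not key:
        raise RuntimeError("set MIMO_API_KEY or MIMO_TTS_API_KEY")
    return key
```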

Input contract

work_dir/narration.json: segments with start / end / narration (plus optional pause_after_ms and overlaps_speech). Times are in seconds on the output timeline, that is, where the audio will be placed in the final video. In the orchestrated cut-mode flow, the agent writes narration.json directly against the output timeline and the orchestrator passes it here. In the legacy direct-cut path, narration_mapped.json may be passed explicitly instead.
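
The contract above could be checked with something like the following sketch. The field names come from the description; the top-level shape (an object with a "segments" key, or a bare list), the defaults for the optional fields, and the helper name load_segments are assumptions made for illustration.

```python
import json

REQUIRED = ("start", "end", "narration")


def load_segments(text: str) -> list:
    """Parse and validate narration segments from JSON text."""
    data = json.loads(text)
    # Assumption: either {"segments": [...]} or a bare list of segments.
    raw = data["segments"] if isinstance(data, dict) else data
    segments = []
    for i, seg in enumerate(raw):
        missing = [k for k in REQUIRED if k not in seg]
        if missing:
            raise ValueError(f"segment {i}: missing {missing}")
        start, end = float(seg["start"]), float(seg["end"])
        if end <= start:
            raise ValueError(f"segment {i}: end must be after start")
        segments.append({
            "start": start,  # seconds on the output timeline
            "end": end,
            "narration": str(seg["narration"]),
            # Optional fields; the defaults here are assumed, not documented.
            "pause_after_ms": int(seg.get("pause_after_ms", 0)),
            "overlaps_speech": bool(seg.get("overlaps_speech", False)),
        })
    return segments


example = json.dumps({"segments": [
    {"start": 0.0, "end": 3.5, "narration": "Welcome.", "pause_after_ms": 200},
    {"start": 3.5, "end": 8.0, "narration": "Next step.", "overlaps_speech": True},
]})
segments = load_segments(example)
```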
