video-understanding
What this does
Turns a source video into an understanding index an agent (or a downstream stage) can read:
- Scene detection: `scenes.json` (cut points, durations) + junk-scene filtering.
- Frame extraction: sampled frames for the visual analysis.
- ASR: `asr_result.json` (timestamped dialogue) via MiMo `mimo-v2.5-asr`.
- Silence detection: `silence_periods.json` (quiet windows, `has_speech` flag).
- VLM analysis: `vlm_analysis.json` (per-scene description, depth analysis, `frame_facts`).
- Timeline fusion + brief: `timeline_fusion.json`, `asr_writing_chunks.json`, `agent_narration_brief.md`.
Stateless: a reusable stage is skipped only when its output and its provenance sidecar both match
the current source video and the output-affecting settings. --force recomputes regardless.
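The skip rule above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the sidecar name (`<output>.provenance.json`), the fingerprint fields, and the helper names `fingerprint`, `can_skip`, and `record` are all assumptions made for the example.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(video: Path, settings: dict) -> dict:
    """Summarize the inputs that affect a stage's output (hypothetical fields)."""
    stat = video.stat()
    settings_blob = json.dumps(settings, sort_keys=True).encode("utf-8")
    return {
        "source_size": stat.st_size,
        "source_mtime_ns": stat.st_mtime_ns,
        "settings_sha256": hashlib.sha256(settings_blob).hexdigest(),
    }


def sidecar_path(output: Path) -> Path:
    # Assumed naming scheme: scenes.json -> scenes.json.provenance.json
    return output.with_name(output.name + ".provenance.json")


def can_skip(output: Path, video: Path, settings: dict, force: bool = False) -> bool:
    """Return True only if the output exists and its sidecar matches the current inputs."""
    if force:
        return False
    sidecar = sidecar_path(output)
    if not output.exists() or not sidecar.exists():
        return False
    try:
        recorded = json.loads(sidecar.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return False  # an unreadable sidecar is treated as a mismatch
    return recorded == fingerprint(video, settings)


def record(output: Path, video: Path, settings: dict) -> None:
    """Write the sidecar after a stage has produced its output."""
    payload = json.dumps(fingerprint(video, settings), indent=2)
    sidecar_path(output).write_text(payload, encoding="utf-8")
```

Hashing only the settings and comparing file size and modification time keeps the check cheap for large videos. A content hash of the video would be stricter but slower.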
Requirements
# ffmpeg: brew install ffmpeg | apt install ffmpeg | choco install ffmpeg
export MIMO_API_KEY=*** # one key drives ASR (mimo-v2.5-asr) + VLM (mimo-v2.5)
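
A quick preflight check for the two requirements above might look like the sketch below. The function name `missing_requirements` and its injectable `env` and `which` parameters are hypothetical, chosen so the check can be exercised without touching the real environment. It only verifies that `ffmpeg` is on `PATH` and that `MIMO_API_KEY` is non-empty; it does not validate the key.

```python
import os
import shutil


def missing_requirements(env=None, which=shutil.which) -> list:
    """Return human-readable descriptions of any unmet requirements."""
    env = os.environ if env is None else env
    missing = []
    if which("ffmpeg") is None:
        missing.append("ffmpeg (not found on PATH)")
    if not env.get("MIMO_API_KEY"):
        missing.append("MIMO_API_KEY (environment variable is unset or empty)")
    return missing


if __name__ == "__main__":
    problems = missing_requirements()
    if problems:
        raise SystemExit("Missing requirements: " + "; ".join(problems))
    print("All requirements satisfied.")
```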