Video Understanding (Gemini)
Analyze videos using Google Gemini's multimodal video understanding. Supports 1000+ video sources via yt-dlp.
Requirements
- yt-dlp — brew install yt-dlp or pip install yt-dlp
- ffmpeg — brew install ffmpeg (for merging video+audio streams)
- GEMINI_API_KEY environment variable
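A quick preflight check can catch a missing dependency before any video is downloaded. The sketch below is illustrative only and is not part of the skill itself: the helper name `missing_requirements` is hypothetical, and it assumes both tools are expected on `PATH` under the names `yt-dlp` and `ffmpeg`.

```python
import os
import shutil

# Names taken from the requirements list above.
REQUIRED_TOOLS = ("yt-dlp", "ffmpeg")
REQUIRED_ENV = ("GEMINI_API_KEY",)


def missing_requirements(environ=None, which=shutil.which):
    """Return human-readable names of every requirement that is missing.

    `environ` and `which` are injectable so the check can be tested
    without touching the real environment (hypothetical helper).
    """
    environ = os.environ if environ is None else environ
    # A tool counts as missing when it cannot be located on PATH.
    missing = [tool for tool in REQUIRED_TOOLS if which(tool) is None]
    # An unset or empty variable counts as missing.
    missing += [
        f"{name} (environment variable)"
        for name in REQUIRED_ENV
        if not environ.get(name)
    ]
    return missing


if __name__ == "__main__":
    problems = missing_requirements()
    if problems:
        print("Missing: " + ", ".join(problems))
    else:
        print("All requirements found.")
```

Running the file directly prints either the missing items or a confirmation line.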
Default Output
Returns structured JSON:
- transcript — Verbatim transcript with [MM:SS] timestamps
- description — Visual description (people, setting, UI, text on screen, flow)
- summary — 2-3 sentence summary
- duration_seconds — Estimated duration
- speakers — Identified speakers
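As a rough illustration of consuming this output, the sketch below parses a result and checks that the five fields listed above are present. The field names come from the list; everything else is an assumption: the value types (a string transcript, a list of speakers), the sample values, and the helper name `parse_result` are made up for the example and may not match what the skill returns.

```python
import json

# Field names from the "Default Output" list above.
EXPECTED_KEYS = {"transcript", "description", "summary", "duration_seconds", "speakers"}


def parse_result(raw):
    """Parse a JSON string and verify the documented fields are present.

    Hypothetical helper; raises ValueError if any field is missing.
    """
    data = json.loads(raw)
    missing = EXPECTED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return data


# Made-up sample for illustration only; the value types are assumptions.
sample = json.dumps({
    "transcript": "[00:00] Hello and welcome.\n[00:05] Let's get started.",
    "description": "One person at a desk, screen share of a code editor.",
    "summary": "A short introduction to the topic.",
    "duration_seconds": 12,
    "speakers": ["Speaker 1"],
})

result = parse_result(sample)
print(result["summary"])
```

Validating the keys up front keeps a malformed or truncated model response from surfacing later as a `KeyError` deep in downstream code.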