watch
Watch — read a video like a PDF
You don't have a video input. This skill bundles a Python script that fetches the timestamped transcript (and optionally frames) so you can answer questions about video content.
Default mode is transcript-only: no video download and no frame extraction, just captions pulled via yt-dlp in a few seconds. That covers nearly every YouTube video and is the right default for research, summarization, and content analysis.
Opt into --with-frames only when the visual layer matters: debugging a screen recording, breaking down a thumbnail/hook visually, reading on-screen text, analyzing UI behavior.
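To make the transcript-only idea concrete, here is a minimal sketch of a caption-only yt-dlp call. It is an illustration, not the bundled script's actual code: the function name caption_command, the default language "en", the VTT format, and the output template are all assumptions. The yt-dlp flags themselves are standard ones.

```python
def caption_command(url, lang="en", out_template="%(id)s.%(ext)s"):
    """Build a yt-dlp argv that fetches captions only (hypothetical helper).

    Assumptions: English captions, WebVTT output, files named by video id.
    """
    return [
        "yt-dlp",
        "--skip-download",       # do not download the video itself
        "--write-subs",          # uploader-provided captions, if any
        "--write-auto-subs",     # auto-generated captions as a fallback
        "--sub-langs", lang,     # which caption language(s) to fetch
        "--sub-format", "vtt",   # timestamped WebVTT cues
        "-o", out_template,      # output filename template
        url,
    ]
```

A caller could pass the result to subprocess.run(caption_command(url), check=True) and then read the resulting .vtt file for timestamped cues.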
Requirements
Locally installed:
yt-dlp: brew install yt-dlp (macOS) / pipx install yt-dlp (Linux) / winget install yt-dlp.yt-dlp (Windows)
ffmpeg + ffprobe: brew install ffmpeg / sudo apt install ffmpeg / winget install Gyan.FFmpeg. Only required for --with-frames and the Whisper audio fallback. Transcript-only on a captioned YouTube video does not need ffmpeg.
If the user's missing one, tell them the install command and stop. Do not auto-install.
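The check-and-stop rule above can be sketched roughly as follows. This is a hedged illustration rather than the bundled script's logic: the helper names (required_tools, missing_tools, install_message) and the whisper_fallback parameter are hypothetical, and the install hints are copied from the requirements list.

```python
import shutil

# Install hints taken from the requirements list; nothing is installed automatically.
INSTALL_HINTS = {
    "yt-dlp": "brew install yt-dlp (macOS) / pipx install yt-dlp (Linux) / "
              "winget install yt-dlp.yt-dlp (Windows)",
    "ffmpeg": "brew install ffmpeg / sudo apt install ffmpeg / "
              "winget install Gyan.FFmpeg",
    "ffprobe": "installed together with ffmpeg (see the ffmpeg commands)",
}


def required_tools(with_frames=False, whisper_fallback=False):
    """Return the executables a run needs (hypothetical helper)."""
    tools = ["yt-dlp"]
    # ffmpeg/ffprobe matter only for frames or the Whisper audio fallback.
    if with_frames or whisper_fallback:
        tools += ["ffmpeg", "ffprobe"]
    return tools


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def install_message(missing):
    """Format one install hint per missing tool, for showing to the user."""
    return "\n".join(f"Missing {tool}. Install: {INSTALL_HINTS[tool]}" for tool in missing)
```

A caller would compute missing_tools(required_tools(with_frames=True)), and if the list is non-empty, show install_message(...) to the user and stop instead of attempting an install.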
How to invoke
The script lives next to this SKILL.md. Invoke it with the absolute path or ${CLAUDE_SKILL_DIR}/watch.py: