video-cut — local-first raw-footage video editor

Drop raw clips in a folder → describe the cut → get edit/final.mp4. Fully local. The editorial counterpart to our generative video skills (launch-video, Remotion, cloned-voice-pitch-pipeline). Builds on browser-use/video-use: same two-layer reading architecture, but with local ASR instead of the cloud-based ElevenLabs Scribe.

Core principle — two-layer reading (never dump frames)

The agent reads video through two cheap layers, not by watching frames:

Transcript layer: transcribe_local.py runs faster_whisper with word-level timestamps (local, MPS/CPU). The output is packed into takes_packed.md (tens of KB), which is the primary reading artifact. This is the routing projection.
Visual layer (on-demand): timeline_view.py <video> <start> <end> renders a PNG with a filmstrip, waveform, and word labels, and only at decision points (ambiguous pauses, retake comparisons, cut-point checks). It is never used to scan the whole video; this is the body-grep expansion.
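
As a rough illustration of how the two layers relate, here is a minimal sketch, not the actual contents of transcribe_local.py. It assumes the faster_whisper package is installed for the transcription step. The names Word, transcribe_words, pack_words, and long_pauses, the 0.5 s and 1.0 s thresholds, and the "[start-end] text" line format are all hypothetical; the real takes_packed.md layout may differ. Word timestamps are packed into compact phrase lines (transcript layer), and long silences are listed as candidate windows that could be handed to timeline_view.py (visual layer).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def transcribe_words(path: str, model_size: str = "base") -> List[Word]:
    """Return word-level timestamps using faster_whisper (imported lazily)."""
    from faster_whisper import WhisperModel

    # CPU with int8 is a conservative assumption; adjust for your hardware.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path, word_timestamps=True)
    return [
        Word(w.word.strip(), w.start, w.end)
        for seg in segments
        for w in (seg.words or [])
    ]


def _format(phrase: List[Word]) -> str:
    # Hypothetical packed-line format: "[start-end] text"
    text = " ".join(w.text for w in phrase)
    return f"[{phrase[0].start:.2f}-{phrase[-1].end:.2f}] {text}"


def pack_words(words: List[Word], max_gap: float = 0.5) -> List[str]:
    """Group words into phrase lines, breaking when silence exceeds max_gap."""
    lines: List[str] = []
    phrase: List[Word] = []
    for w in words:
        if phrase and w.start - phrase[-1].end > max_gap:
            lines.append(_format(phrase))
            phrase = []
        phrase.append(w)
    if phrase:
        lines.append(_format(phrase))
    return lines


def long_pauses(words: List[Word], min_gap: float = 1.0) -> List[Tuple[float, float]]:
    """List (start, end) silences of at least min_gap seconds between words."""
    return [
        (a.end, b.start)
        for a, b in zip(words, words[1:])
        if b.start - a.end >= min_gap
    ]
```

In use, pack_words(transcribe_words("clip.mp4")) would give the lines to write into a packed transcript, and each window from long_pauses would be a candidate for a single timeline_view.py render, so frames are only inspected where the transcript alone is ambiguous.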