Video Use

Principle

The LLM reasons from the raw transcript plus on-demand visuals. The only derived artifact that earns its keep is a packed, phrase-level transcript (takes_packed.md). Everything else (filler tagging, retake detection, shot classification, emphasis scoring) you derive at decision time.
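A minimal sketch of how a phrase-level pack could be built, assuming the transcriber provides word-level timestamps and that a silence threshold is used as the phrase delimiter. The `Word` type, the 0.4 s `max_gap` default, and the `[start-end] text` line layout are illustrative assumptions, not the actual format of takes_packed.md. Note that fillers like "um" are kept verbatim, since tagging them is left to decision time.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def pack_phrases(words, max_gap=0.4):
    """Group consecutive words into phrases, starting a new phrase
    whenever the silence between two words exceeds max_gap seconds."""
    phrases = []
    current = []
    for w in words:
        if current and w.start - current[-1].end > max_gap:
            phrases.append(current)
            current = []
        current.append(w)
    if current:
        phrases.append(current)
    return phrases


def format_packed(phrases):
    """Render one phrase per line as '[start-end] text' (hypothetical layout)."""
    lines = []
    for p in phrases:
        text = " ".join(w.text for w in p)
        lines.append(f"[{p[0].start:.2f}-{p[-1].end:.2f}] {text}")
    return "\n".join(lines)


words = [Word("So", 0.00, 0.20), Word("today", 0.25, 0.60),
         Word("um", 1.40, 1.55), Word("we", 1.60, 1.70), Word("ship", 1.75, 2.10)]
print(format_packed(pack_phrases(words)))
```

One line per phrase keeps the file small enough to read in full while still carrying the timestamps needed to cut.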
Audio is primary; visuals follow. Cut candidates come from speech boundaries and silence gaps. Drill into visuals only at decision points.
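One way to turn silence gaps into candidate edit points, sketched under the assumption that word-level timestamps arrive as `(text, start, end)` tuples. The function name and the `min_silence` and `pad` values are hypothetical placeholders to tune per recording. This covers only silence gaps; speech boundaries with no pause (for example a sentence end run straight into the next) would need additional logic.

```python
from typing import List, Tuple

# Each word is (text, start_s, end_s); a hypothetical shape for word timings.
WordTiming = Tuple[str, float, float]


def cut_candidates(words: List[WordTiming], min_silence: float = 0.3,
                   pad: float = 0.05) -> List[Tuple[float, float]]:
    """Return (out_point, in_point) pairs at each silence gap >= min_silence.

    out_point: where the preceding speech can end (last word end + pad).
    in_point:  where the following speech can resume (next word start - pad).
    """
    candidates = []
    for (_, _, prev_end), (_, next_start, _) in zip(words, words[1:]):
        gap = next_start - prev_end
        if gap >= min_silence:
            candidates.append((round(prev_end + pad, 3), round(next_start - pad, 3)))
    return candidates


words = [("Hello", 0.0, 0.4), ("there", 0.45, 0.8),
         ("okay", 2.0, 2.3), ("take", 2.35, 2.6), ("two", 3.5, 3.8)]
print(cut_candidates(words))
```

Each pair is a place where a cut cannot clip a word; whether to cut there is still a decision made by reading the transcript and, if needed, looking at the frames.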
Ask → confirm → execute → iterate → persist. Never touch the cut until the user has confirmed the strategy in plain English.
Generalize. Do not assume what kind of video this is. Look at the material, ask the user, then edit.
Artistic freedom is the default. Every specific value, preset, font, color, duration, pitch structure, and technique in this document is a worked example from one proven video — not a mandate. Read them to understand what's possible and why each worked. Then make your own taste calls based on what the material actually is and what the user actually wants. The only things you MUST do are in the Hard Rules section below. Everything else is yours.
Invent freely. If the material calls for a technique not described here (split-screen, picture-in-picture, lower-third identity cards, reaction cuts, freeze frames, crossfades, match cuts, L-cuts, J-cuts, speed ramps over breath, whatever), build it. The helpers are ffmpeg and PIL, and together they can do anything the output format supports. Do not wait for permission.
Verify your own output before showing it to the user. If you wouldn't ship it, don't present it.
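The mechanical part of that check can be scripted. A sketch, assuming the expected runtime is known from the edit plan: it asks ffprobe for the container duration and stream types as JSON and flags a missing video or audio stream or a duration outside tolerance. The function names and the 0.1 s tolerance are illustrative. It catches broken renders, not editorial problems.

```python
import json
import subprocess
from typing import List


def probe_cmd(path: str) -> List[str]:
    """ffprobe argv that reports container duration and stream types as JSON."""
    return ["ffprobe", "-v", "error",
            "-show_entries", "format=duration:stream=codec_type",
            "-of", "json", path]


def check_render(probe: dict, expected_s: float, tol_s: float = 0.1) -> List[str]:
    """Return a list of problems found in parsed ffprobe output (empty = pass)."""
    problems = []
    kinds = {s.get("codec_type") for s in probe.get("streams", [])}
    for kind in ("video", "audio"):
        if kind not in kinds:
            problems.append(f"missing {kind} stream")
    duration = float(probe.get("format", {}).get("duration", 0.0))
    if abs(duration - expected_s) > tol_s:
        problems.append(f"duration {duration:.2f}s, expected {expected_s:.2f}s")
    return problems


def verify(path: str, expected_s: float) -> List[str]:
    """Run ffprobe on a rendered file and check it (requires ffprobe on PATH)."""
    out = subprocess.run(probe_cmd(path), capture_output=True,
                         text=True, check=True).stdout
    return check_render(json.loads(out), expected_s)
```

Splitting parsing (`check_render`) from the subprocess call (`verify`) keeps the checks testable without a rendered file.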

Hard Rules (production correctness — non-negotiable)

These are the rules where deviation produces silent failures or broken output. They are not taste; they are correctness. Memorize them.
