vtake-cut
VTake Local Workflow
VTake converts a local input video into a card-based composition. The agent
designs the cards (timing + content) and writes each card's HTML directly
in the conversation, then assembles a single composition HTML and renders
it to MP4 via hyperframes. There is no fixed archetype list and no
prescribed card structure — the cards emerge from what the transcript
actually says.
Inspectable intermediate files in the work directory:
- metadata.json: duration / width / height / fps
- audio.mp3: extracted audio
- transcript.json: segments + words with timestamps
- storyboard.json: lightweight card outline (the agent's plan)
- public/cards/card-XX.html: one HTML fragment per card
- public/index.html: final assembled composition
- output.mp4: rendered video
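Because every stage leaves a file behind, a quick look at the work directory shows how far a run got. The following is a minimal sketch of such a check, not part of VTake itself: the file names come from the list above, while the helper name `inspect_work_dir` and the idea of reporting presence per file are assumptions made for illustration.

```python
from pathlib import Path

# File names taken from the list above. The card fragments are handled
# separately because their count varies per video.
EXPECTED_ARTIFACTS = [
    "metadata.json",
    "audio.mp3",
    "transcript.json",
    "storyboard.json",
    "public/index.html",
    "output.mp4",
]


def inspect_work_dir(work_dir):
    """Report which intermediate files exist in a work directory.

    Hypothetical helper: returns (status, cards), where status maps each
    expected file name to True/False and cards is the sorted list of
    card fragment file names found under public/cards/.
    """
    root = Path(work_dir)
    status = {name: (root / name).is_file() for name in EXPECTED_ARTIFACTS}

    cards_dir = root / "public" / "cards"
    if cards_dir.is_dir():
        cards = sorted(p.name for p in cards_dir.glob("card-*.html"))
    else:
        cards = []
    return status, cards


if __name__ == "__main__":
    status, cards = inspect_work_dir(".")
    for name, present in status.items():
        print(f"{'ok     ' if present else 'missing'}  {name}")
    print(f"{len(cards)} card fragment(s) found")
```

Run from inside a work directory, this prints one line per expected file plus the number of card fragments, which makes it easy to see which stage to inspect or rerun.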