vtake-cut


VTake Local Workflow

VTake converts a local input video into a card-based composition. The agent designs the cards (timing and content), writes each card's HTML directly in the conversation, assembles the cards into a single composition HTML file, and renders that file to MP4 via hyperframes. There is no fixed archetype list and no prescribed card structure: the cards emerge from what the transcript actually says.
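
Because the cards carry their own timing, it can help to sanity-check a plan before writing any HTML. The following is a minimal sketch, not VTake's own code: the `Card` fields and the `validate_cards` helper are hypothetical, since the storyboard schema is not specified here. It only checks that cards are well-formed, do not overlap, and fit inside the video duration.

```python
from dataclasses import dataclass


@dataclass
class Card:
    # Hypothetical fields; the real storyboard.json schema may differ.
    index: int
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    content: str


def validate_cards(cards: list[Card], duration: float) -> list[str]:
    """Return human-readable problems; an empty list means the plan looks sane."""
    problems = []
    previous_end = 0.0
    for card in sorted(cards, key=lambda c: c.start):
        if card.end <= card.start:
            problems.append(f"card {card.index}: end must be after start")
        if card.start < previous_end:
            problems.append(f"card {card.index}: overlaps the previous card")
        if card.end > duration:
            problems.append(f"card {card.index}: runs past the video duration")
        previous_end = max(previous_end, card.end)
    return problems


if __name__ == "__main__":
    plan = [
        Card(1, 0.0, 4.0, "Opening claim"),
        Card(2, 3.5, 8.0, "Supporting detail"),
        Card(3, 8.0, 12.5, "Closing takeaway"),
    ]
    for problem in validate_cards(plan, duration=12.0):
        print(problem)
```

In this sketch the duration would come from metadata.json and the card timings from the transcript timestamps; both mappings are assumptions about how the files are used.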

Inspectable intermediate files in the work directory:

  • metadata.json — duration / width / height / fps
  • audio.mp3 — extracted audio
  • transcript.json — segments + words with timestamps
  • storyboard.json — lightweight card outline (the agent's plan)
  • public/cards/card-XX.html — one HTML fragment per card
  • public/index.html — final assembled composition
  • output.mp4 — rendered video
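
Since every stage leaves a file behind, a quick way to see how far a run got is to check which of these files exist. The sketch below is an illustration only: `inspect_work_dir` is a hypothetical helper, and the only facts it relies on are the file names listed above.

```python
from pathlib import Path

# File names taken from the list above, relative to the work directory.
EXPECTED_FILES = [
    "metadata.json",
    "audio.mp3",
    "transcript.json",
    "storyboard.json",
    "public/index.html",
    "output.mp4",
]


def inspect_work_dir(work_dir: str) -> tuple[dict[str, bool], list[str]]:
    """Report which intermediate files exist and which card fragments were written."""
    root = Path(work_dir)
    status = {name: (root / name).is_file() for name in EXPECTED_FILES}
    cards_dir = root / "public" / "cards"
    cards = sorted(p.name for p in cards_dir.glob("card-*.html")) if cards_dir.is_dir() else []
    return status, cards


if __name__ == "__main__":
    status, cards = inspect_work_dir(".")
    for name, present in status.items():
        print(f"{'ok     ' if present else 'missing'}  {name}")
    print(f"{len(cards)} card fragment(s): {', '.join(cards) or 'none'}")
```

The first missing file in the report is usually the stage to look at, assuming the files are produced in the order listed.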
First Seen
May 31, 2026
vtake-cut — notedit/vtake-skills