captions

Captions

Choose the caption style from the spoken content. If the user specifies a style, use it; otherwise, detect the tone from the transcript and pick a style to match.

Transcript Source

The project's transcript.json contains word-level timestamps from whisper.cpp (--output-json-full with --dtw):

{
  "transcription": [
    {
      "offsets": { "from": 0, "to": 5000 },
      "text": " Hello world.",
      "tokens": [
        { "text": " Hello", "offsets": { "from": 0, "to": 1000 }, "p": 0.98 },
        { "text": " world", "offsets": { "from": 1000, "to": 2000 }, "p": 0.95 }
      ]
    }
  ]
}
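
The structure above can be flattened into a list of timed words before any caption styling is applied. The following is a minimal Python sketch, not the project's own loader. The `Word` class and `extract_words` function are hypothetical names introduced here. It assumes that offsets are in milliseconds, that a token beginning with a space starts a new word, that tokens without a leading space (sub-word pieces, punctuation) continue the previous word, and that special tokens such as `[_BEG_]` can be skipped.

```python
import json
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start_ms: int
    end_ms: int
    p: float  # lowest token probability seen within the word


def extract_words(transcript: dict) -> list[Word]:
    """Flatten whisper.cpp-style segments into a list of timed words."""
    words: list[Word] = []
    for segment in transcript.get("transcription", []):
        for token in segment.get("tokens", []):
            text = token["text"]
            # Assumption: special tokens look like "[_BEG_]" and carry no speech.
            if text.startswith("[_") or not text.strip():
                continue
            offsets = token["offsets"]
            if text.startswith(" ") or not words:
                # A leading space marks the start of a new word.
                words.append(
                    Word(text.strip(), offsets["from"], offsets["to"], token.get("p", 0.0))
                )
            else:
                # Continuation piece: extend the previous word.
                prev = words[-1]
                prev.text += text
                prev.end_ms = offsets["to"]
                prev.p = min(prev.p, token.get("p", prev.p))
    return words


# Sample data copied from the transcript excerpt above.
sample = json.loads("""
{
  "transcription": [
    {
      "offsets": { "from": 0, "to": 5000 },
      "text": " Hello world.",
      "tokens": [
        { "text": " Hello", "offsets": { "from": 0, "to": 1000 }, "p": 0.98 },
        { "text": " world", "offsets": { "from": 1000, "to": 2000 }, "p": 0.95 }
      ]
    }
  ]
}
""")

words = extract_words(sample)
for w in words:
    print(f"{w.start_ms:>6} {w.end_ms:>6} {w.text}")
```

To run this against a real project, replace the inline sample with `json.load(open("transcript.json"))`. Keeping the lowest token probability per word is one possible choice; it lets a later step flag low-confidence words, for example to avoid emphasizing them.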

First seen: Mar 27, 2026