elevenlabs-stt


98%+ accurate transcription with speaker diarization, audio event tagging, and word-level forced alignment.

  • Supports Scribe v1 and v2 models with auto-detection across 90+ languages
  • Capabilities include speaker identification, audio event tagging (laughter, applause, music), and precise word-level timestamps via forced alignment
  • Forced alignment enables subtitle generation, lip-sync timing, and karaoke applications by aligning known text to audio
  • Requires inference.sh CLI (infsh) for execution; integrates with video captioning and other audio workflows
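Forced alignment gives each word a start and end time, which is most of what a subtitle file needs. The sketch below turns tab-separated rows of word, start (seconds), and end (seconds) into SubRip (SRT) cues. It assumes a hypothetical intermediate file; this `word<TAB>start<TAB>end` layout is not the documented `elevenlabs/stt` output schema, so you would first extract these fields from the real response. The function name `words_to_srt` is also illustrative.

```shell
#!/bin/sh
# Sketch: convert word-level timestamps into SRT subtitle cues.
# ASSUMPTION: stdin lines look like "word<TAB>start_sec<TAB>end_sec".
# This is a hypothetical format, not the documented elevenlabs/stt output.

words_to_srt() {
  awk '
    BEGIN { FS = "\t" }
    # Format seconds (e.g. 3.5) as an SRT timestamp (00:00:03,500).
    function ts(t,   ms) {
      ms = int(t * 1000 + 0.5)   # round to whole milliseconds
      return sprintf("%02d:%02d:%02d,%03d", int(ms / 3600000), int((ms % 3600000) / 60000), int((ms % 60000) / 1000), ms % 1000)
    }
    # Emit one numbered cue per row that has all three fields.
    NF >= 3 { n++; printf "%d\n%s --> %s\n%s\n\n", n, ts($2), ts($3), $1 }
  '
}
```

Usage: `words_to_srt < words.tsv > captions.srt`. One cue per word suits karaoke-style highlighting; for ordinary captions you would usually group consecutive words into phrase-length cues instead.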
SKILL.md

Install the belt CLI skill: npx skills add belt-sh/cli

ElevenLabs Speech-to-Text

High-accuracy transcription with Scribe models via inference.sh CLI.


Quick Start

Requires the inference.sh CLI (belt); see its install instructions.

belt login

# Transcribe audio
belt app run elevenlabs/stt --input '{"audio": "https://audio.mp3"}'
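The Quick Start hard-codes the URL inside single quotes, so a shell variable such as `$URL` would not expand there. The sketch below builds the `--input` JSON from an argument instead, escaping backslashes and double quotes. It uses only the `audio` field shown above; any other input options the app may accept are not covered, and the helper names (`json_escape`, `stt_input`, `transcribe`) are hypothetical.

```shell
#!/bin/sh
# Sketch: build the elevenlabs/stt --input JSON from a URL argument.
# Only the "audio" field from the Quick Start is used.

# Escape backslashes, then double quotes, for use inside a JSON string.
# (Control characters are not handled; plain URLs do not need it.)
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Print the JSON payload for the given audio URL.
stt_input() {
  printf '{"audio": "%s"}\n' "$(json_escape "$1")"
}

# Run the transcription (assumes `belt login` has already been done).
transcribe() {
  belt app run elevenlabs/stt --input "$(stt_input "$1")"
}
```

Usage: `transcribe "$URL"` after `belt login`. Building the payload in one function keeps the quoting rules in a single place rather than repeating them at every call site.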
elevenlabs-stt — inferen-sh/skills