elevenlabs-stt

Summary

98%+ accurate transcription with speaker diarization, audio event tagging, and word-level forced alignment.

Supports Scribe v1 and v2 models with automatic language detection across 90+ languages
Capabilities include speaker identification, audio event tagging (laughter, applause, music), and precise word-level timestamps via forced alignment
Forced alignment enables subtitle generation, lip-sync timing, and karaoke applications by aligning known text to audio
Requires the inference.sh CLI (belt) for execution; integrates with video captioning and other audio workflows
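
For the subtitle use case mentioned above, word-level timestamps eventually have to be written out as SRT timecodes. The following is a minimal sketch of that one step, assuming timestamps arrive as seconds with a fractional part; the actual output schema of elevenlabs/stt is not shown on this page, and srt_time is a hypothetical helper, not part of the CLI.

```shell
# Hypothetical helper: format a time in seconds as an SRT timecode (HH:MM:SS,mmm).
# Assumption: the transcription output reports word times in seconds.
srt_time() {
  awk -v s="$1" 'BEGIN {
    ms = int(s * 1000 + 0.5)              # round to whole milliseconds
    h = int(ms / 3600000); ms -= h * 3600000
    m = int(ms / 60000);   ms -= m * 60000
    sec = int(ms / 1000);  ms -= sec * 1000
    printf "%02d:%02d:%02d,%03d\n", h, m, sec, ms
  }'
}

# Example: print one SRT timing line for a cue from 1.25 s to 3.5 s.
printf '%s --> %s\n' "$(srt_time 1.25)" "$(srt_time 3.5)"
```

A full subtitle file would also need cue numbers and the aligned text for each cue, which depend on the real response format.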

SKILL.md

Install the belt CLI skill: npx skills add belt-sh/cli

ElevenLabs Speech-to-Text

High-accuracy transcription with Scribe models via inference.sh CLI.

ElevenLabs STT

Requires the inference.sh CLI (belt); see its install instructions.

belt login

# Transcribe audio
belt app run elevenlabs/stt --input '{"audio": "https://audio.mp3"}'
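
When the audio URL comes from a variable, the JSON passed to --input has to be quoted carefully. The sketch below shows one way to build it in plain shell; build_stt_input is a hypothetical helper (not part of belt), the example.com URL is a placeholder, and only the "audio" field shown above is assumed.

```shell
# Hypothetical helper: emit the --input JSON for a given audio URL.
build_stt_input() {
  # Escape backslashes and double quotes so the URL is a valid JSON string.
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"audio": "%s"}' "$escaped"
}

# Usage (requires belt to be installed and logged in):
# belt app run elevenlabs/stt --input "$(build_stt_input "https://example.com/audio.mp3")"
```

Wrapping the command substitution in double quotes keeps the JSON as a single argument even if the URL contains spaces or query parameters.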

elevenlabs-stt — inferen-sh/skills