speech-to-text

Summary

Transcribe audio to text using ElevenLabs Scribe or Whisper models via inference.sh CLI.

Three model options: ElevenLabs Scribe v2 (98%+ accuracy with diarization), Fast Whisper V3, and Whisper V3 Large, each with a different speed/accuracy tradeoff
Supports 99+ languages, optional timestamps, speaker diarization, and translation to English
Common workflows include meeting transcription, podcast transcripts, video subtitles, and voice note conversion
Requires the inference.sh CLI (infsh) to be installed and authenticated; accepts audio URLs, or audio extracted from video, as input
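For the video-subtitle workflow, timestamped transcription output has to be turned into a subtitle file. The following is a minimal sketch that renders segments in the SRT format. It assumes, hypothetically, that each segment carries a start time and end time in seconds plus its text. The models' actual output fields are not documented here, so the Segment class is a hypothetical stand-in that you would map the real response onto.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical segment shape (assumption): times in seconds plus text.
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        cues.append(
            f"{i}\n{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(cues)
```

Each cue is an index, a time range, and the text. The blank line between cues comes from joining them with a newline.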

SKILL.md

Install the belt CLI skill: npx skills add belt-sh/cli

Speech-to-Text

Transcribe audio to text via inference.sh CLI.


Requires the inference.sh CLI (belt). After installing it, log in and run a transcription:

belt login

belt app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://audio.mp3"}'
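If you call the CLI from a script instead of typing the command, building the argument list programmatically avoids shell-quoting mistakes in the JSON payload. The following is a sketch, not part of the skill itself. The helper names build_transcribe_command and transcribe are hypothetical. The sketch uses only the belt app run ... --input form and the infsh/fast-whisper-large-v3 app ID from the command above. It returns the CLI's raw stdout because the output format is not documented here.

```python
import json
import shlex
import subprocess

# App ID taken from the example command; other model IDs are not listed here.
DEFAULT_APP = "infsh/fast-whisper-large-v3"


def build_transcribe_command(audio_url: str, app: str = DEFAULT_APP) -> list[str]:
    """Return the argv for `belt app run <app> --input <json>` (hypothetical helper)."""
    # json.dumps escapes the URL correctly; passing a list avoids shell quoting.
    payload = json.dumps({"audio_url": audio_url})
    return ["belt", "app", "run", app, "--input", payload]


def transcribe(audio_url: str, app: str = DEFAULT_APP) -> str:
    """Run the CLI and return its raw stdout (run `belt login` beforehand)."""
    result = subprocess.run(
        build_transcribe_command(audio_url, app),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


if __name__ == "__main__":
    cmd = build_transcribe_command("https://audio.mp3")
    # Show the equivalent shell command for inspection.
    print(shlex.join(cmd))
```

For https://audio.mp3, shlex.join(cmd) reproduces the example command above character for character.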
