speech-to-text

Summary

Transcribe audio to text using ElevenLabs Scribe or Whisper models via the inference.sh CLI.

  • Three model options: ElevenLabs Scribe v2 (98%+ accuracy with diarization), Fast Whisper V3, and Whisper V3 Large for varying speed/accuracy tradeoffs
  • Supports 99+ languages, optional timestamps, speaker diarization, and translation to English
  • Common workflows include meeting transcription, podcast transcripts, video subtitles, and voice note conversion
  • Requires inference.sh CLI (infsh) installation and authentication; accepts audio URLs or extracted video audio as input
SKILL.md

Install the belt CLI skill: npx skills add belt-sh/cli

Speech-to-Text

Transcribe audio to text via the inference.sh CLI.

Quick Start

Requires the inference.sh CLI (belt); see the install instructions. Log in first, then run a model against an audio URL:

belt login

belt app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://audio.mp3"}'
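
When the audio URL comes from a variable or another script, pasting it straight into the JSON string can break the quoting. The sketch below is one way to wrap the Quick Start command. The function names `build_input` and `transcribe` are hypothetical helpers, not part of the belt CLI, and the example.com URL is a placeholder. Only the `belt app run ... --input` form and the `audio_url` field shown above are taken from this page.

```shell
#!/bin/sh
# Hypothetical helper: wrap an audio URL in the JSON payload shape
# used by the Quick Start command ({"audio_url": "..."}).
build_input() {
  # Escape backslashes and double quotes so the URL cannot break the JSON string.
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"audio_url": "%s"}' "$escaped"
}

# Hypothetical wrapper: run the Quick Start model on a single audio URL.
# Assumes `belt login` has already been completed.
transcribe() {
  belt app run infsh/fast-whisper-large-v3 --input "$(build_input "$1")"
}

# Usage (placeholder URL):
#   transcribe "https://example.com/meeting.mp3"
```

Keeping the payload construction in its own function makes it possible to check the JSON without calling the service.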