speech-to-text
Installation
Summary
Transcribe audio to text using ElevenLabs Scribe or Whisper models via inference.sh CLI.
- Three model options: ElevenLabs Scribe v2 (98%+ accuracy with diarization), Fast Whisper V3, and Whisper V3 Large for varying speed/accuracy tradeoffs
- Supports 99+ languages, optional timestamps, speaker diarization, and translation to English
- Common workflows include meeting transcription, podcast transcripts, video subtitles, and voice note conversion
- Requires inference.sh CLI (infsh) installation and authentication; accepts audio URLs or extracted video audio as input
SKILL.md
Install the belt CLI skill:
npx skills add belt-sh/cli
Speech-to-Text
Transcribe audio to text via inference.sh CLI.

Quick Start
Requires inference.sh CLI (belt). Install instructions
belt login
belt app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://audio.mp3"}'
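When the audio URL comes from a variable rather than being typed inline, it can help to build the `--input` JSON in one place. The following is a minimal sketch, not part of the skill itself: `build_input` and `transcribe` are hypothetical helper names, the example URL is a placeholder, and the only CLI invocation used is the `belt app run ... --input` form shown in the Quick Start. It assumes the URL contains no double quotes or backslashes, since no JSON escaping is performed.

```shell
#!/bin/sh
# Hypothetical helper: wrap an audio URL in the JSON shape used in the Quick Start.
# Assumption: the URL contains no double quotes or backslashes (no escaping is done).
build_input() {
  printf '{"audio_url": "%s"}' "$1"
}

# Hypothetical wrapper around the Quick Start command.
# Requires the belt CLI to be installed and `belt login` to have been run.
transcribe() {
  belt app run infsh/fast-whisper-large-v3 --input "$(build_input "$1")"
}

# Usage (placeholder URL):
#   transcribe "https://example.com/meeting.mp3"
```

Keeping the JSON construction in its own function makes it easy to inspect the payload (for example with `build_input "$URL"`) before sending it to the model.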