elevenlabs-stt
98%+ accurate transcription with speaker diarization, audio event tagging, and word-level forced alignment.
- Supports Scribe v1 and v2 models with auto-detection across 90+ languages
- Capabilities include speaker identification, audio event tagging (laughter, applause, music), and precise word-level timestamps via forced alignment
- Forced alignment enables subtitle generation, lip-sync timing, and karaoke applications by aligning known text to audio
- Requires inference.sh CLI (infsh) for execution; integrates with video captioning and other audio workflows
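To illustrate how word-level timestamps feed subtitle generation, here is a minimal sketch that groups timed words into SRT cues. The `Word` record (with `text`, `start`, and `end` in seconds) is a hypothetical shape chosen for illustration; the actual output schema of the elevenlabs/stt app is not described here and may differ.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical record; the real app output may use different field names.
    text: str
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[Word], max_words: int = 7) -> str:
    """Group consecutive words into numbered cues of at most max_words words."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        index = len(cues) + 1
        start = srt_timestamp(chunk[0].start)  # cue starts with its first word
        end = srt_timestamp(chunk[-1].end)     # cue ends with its last word
        text = " ".join(w.text for w in chunk)
        cues.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    sample = [Word("hello", 0.0, 0.4), Word("world", 0.5, 1.0)]
    print(words_to_srt(sample))
```

Grouping by a fixed word count keeps the sketch short; a production captioner would more likely break cues on pauses, punctuation, or a maximum line length.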
Install the belt CLI skill:
npx skills add belt-sh/cli
ElevenLabs Speech-to-Text
High-accuracy transcription with Scribe models via inference.sh CLI.

Quick Start
Requires inference.sh CLI (belt). Install instructions
belt login
# Transcribe audio
belt app run elevenlabs/stt --input '{"audio": "https://audio.mp3"}'
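When the audio URL comes from a script rather than being typed by hand, it can help to build the `--input` JSON programmatically so quoting stays correct. The sketch below assembles the same command shown above; it uses only the `audio` field that appears in the Quick Start, and the helper name `build_stt_command` is made up for illustration.

```python
import json
import shlex


def build_stt_command(audio_url: str) -> list[str]:
    """Return the argv list for the Quick Start command with a given audio URL."""
    # json.dumps handles escaping, so URLs containing quotes or spaces stay valid JSON.
    payload = json.dumps({"audio": audio_url})
    return ["belt", "app", "run", "elevenlabs/stt", "--input", payload]


if __name__ == "__main__":
    cmd = build_stt_command("https://audio.mp3")
    # Print a shell-safe rendering of the command instead of executing it.
    print(shlex.join(cmd))
```

To actually run the command, the list could be passed to `subprocess.run(cmd, check=True)`, assuming the `belt` CLI is installed and `belt login` has already been completed.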