stt-integration

This skill provides guidance for implementing ElevenLabs Speech-to-Text (STT) with the Scribe v1 model, which supports 99 languages, speaker diarization for up to 32 speakers, and integration with the Vercel AI SDK.

Core Capabilities

Scribe v1 Model Features

  • Multi-language support: 99 languages with varying accuracy levels
  • Speaker diarization: Up to 32 speakers with identification
  • Word-level timestamps: Precise synchronization for video/audio alignment
  • Audio event detection: Identifies sounds like laughter and applause
  • High accuracy: Optimized for accuracy over real-time processing
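
To make the diarization, timestamp, and audio-event features above more concrete, here is a minimal sketch of post-processing a transcript into speaker turns. The `ScribeWord` shape, its field names, and the `groupBySpeaker` helper are assumptions made for illustration and are not taken from this skill; check them against the actual API response before relying on them.

```typescript
// Hypothetical shape of one transcribed token. Field names are assumptions;
// verify them against the real response before use.
interface ScribeWord {
  text: string;
  start: number; // seconds from the start of the audio
  end: number;   // seconds from the start of the audio
  type: "word" | "spacing" | "audio_event";
  speakerId: string;
}

// One contiguous stretch of speech by a single speaker.
interface SpeakerTurn {
  speakerId: string;
  start: number;
  end: number;
  text: string;
  events: string[]; // audio events such as laughter or applause
}

// Collapse consecutive tokens from the same speaker into turns, keeping
// audio events separate from the spoken text.
function groupBySpeaker(words: ScribeWord[]): SpeakerTurn[] {
  const turns: SpeakerTurn[] = [];
  for (const w of words) {
    let current = turns[turns.length - 1];
    if (!current || current.speakerId !== w.speakerId) {
      current = { speakerId: w.speakerId, start: w.start, end: w.end, text: "", events: [] };
      turns.push(current);
    }
    current.end = w.end;
    if (w.type === "audio_event") {
      current.events.push(w.text);
    } else {
      current.text += w.text;
    }
  }
  return turns.map((t) => ({ ...t, text: t.text.trim() }));
}
```

Each turn keeps the start time of its first token and the end time of its last, which is usually enough to align captions with video.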

Supported Formats

  • Audio: AAC, AIFF, OGG, MP3, Opus, WAV, WebM, FLAC, M4A
  • Video: MP4, AVI, Matroska, QuickTime, WMV, FLV, WebM, MPEG, 3GPP
  • Limits: Max 3 GB file size, 10 hours duration
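
The format list and limits above can be checked before uploading. The sketch below is one way to do that. The `checkUpload` helper, the mapping from format names to file extensions (for example Matroska to `mkv`, QuickTime to `mov`, 3GPP to `3gp`), and the reading of 3 GB as 3 × 1024³ bytes are all assumptions made for illustration.

```typescript
// Extensions assumed to correspond to the formats listed above.
const SUPPORTED_EXTENSIONS = new Set([
  // audio
  "aac", "aiff", "ogg", "mp3", "opus", "wav", "webm", "flac", "m4a",
  // video
  "mp4", "avi", "mkv", "mov", "wmv", "flv", "mpeg", "3gp",
]);

const MAX_BYTES = 3 * 1024 * 1024 * 1024; // 3 GB, assuming binary gigabytes
const MAX_SECONDS = 10 * 60 * 60;         // 10 hours

// Return a list of human-readable problems; an empty list means the file
// passes these local checks. The service may still reject it.
function checkUpload(fileName: string, sizeBytes: number, durationSeconds?: number): string[] {
  const problems: string[] = [];
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  if (!SUPPORTED_EXTENSIONS.has(ext)) {
    problems.push(`unsupported extension: "${ext}"`);
  }
  if (sizeBytes > MAX_BYTES) {
    problems.push("file is larger than 3 GB");
  }
  if (durationSeconds !== undefined && durationSeconds > MAX_SECONDS) {
    problems.push("audio is longer than 10 hours");
  }
  return problems;
}
```

Returning a list of problems rather than throwing on the first one lets a caller report every issue with a file at once.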

Skill Structure

More from vanman2024/ai-dev-marketplace

First Seen
Jan 28, 2026