openai-whisper
Purpose
This skill enables local audio transcription using the OpenAI Whisper model, processing files directly on the device for privacy and speed. It's ideal for converting speech to text with features like multi-language support, word-level timestamps, and speaker diarization.
When to Use
Use this skill for tasks involving local audio files, such as transcribing interviews, podcasts, or meetings, when you need offline processing to avoid network dependencies or data privacy concerns. Apply it in workflows requiring accurate timestamps or speaker identification, like content creation or analysis.
Key Capabilities
- Transcription: Converts audio to text in over 90 languages; specify language via
--languageflag (e.g.,--language enfor English). - Timestamps: Outputs word-level timings; enable with
--word-levelfor detailed segments (e.g., each word's start/end in seconds). - Speaker Diarization: Identifies speakers in audio; requires additional setup like Pyannote library; use
--diarizeflag if configured. - Multi-Format Support: Handles input formats like MP3, WAV, or FLAC; outputs JSON or SRT for easy parsing.
- Model Selection: Choose from models like tiny, base, small, medium, or large; larger models improve accuracy but increase compute needs (e.g.,
--model medium).
Usage Patterns
Always run Whisper in a Python environment with the library installed. For basic transcription, load an audio file and specify options via CLI. Use it in pipelines by piping output to other tools, like text analysis skills. For speaker diarization, ensure dependencies are installed first. Example 1: Transcribe a short audio clip for note-taking. Example 2: Process a multi-speaker recording for meeting summaries.
More from alphaonedev/openclaw-graph
playwright-scraper
Playwright web scraping: dynamic content, auth flows, pagination, data extraction, screenshots
1.4Kgcp-iam
Manages identity and access control for Google Cloud resources using IAM policies and roles.
370humanize-ai-text
AI text humanization: reduce AI-detection patterns, natural phrasing, tone adjustment
260macos-automation
AppleScript, JXA, Shortcuts, Automator, osascript, System Events, accessibility API
173tavily-web-search
Tavily: web search optimized for AI agents, answer synthesis, domain filtering, depth control
154clawflows
OpenClaw workflow automation: multi-step task chains, conditional logic, triggers, schedule
102