transcribe
Audio Transcribe
Transcribe audio using OpenAI, with optional speaker diarization when requested. Prefer the bundled CLI for deterministic, repeatable runs.
Workflow
- Collect inputs: audio file path(s), desired response format (text/json/diarized_json), optional language hint, and any known speaker references.
- Verify
OPENAI_API_KEYis set. If missing, ask the user to set it locally (do not ask them to paste the key). - Run the bundled
transcribe_diarize.pyCLI with sensible defaults (fast text transcription). - Validate the output: transcription quality, speaker labels, and segment boundaries; iterate with a single targeted change if needed.
- Save outputs under
output/transcribe/when working in this repo.
Decision rules
- Default to
gpt-4o-mini-transcribewith--response-format textfor fast transcription. - If the user wants speaker labels or diarization, use
--model gpt-4o-transcribe-diarize --response-format diarized_json. - If audio is longer than ~30 seconds, keep
--chunking-strategy auto. - Prompting is not supported for
gpt-4o-transcribe-diarize.
Output conventions
More from letta-ai/skills
extracting-pdf-text
Extract text from PDFs for LLM consumption. Use when processing PDFs for RAG, document analysis, or text extraction. Supports API services (Mistral OCR) and local tools (PyMuPDF, pdfplumber). Handles text-based PDFs, tables, and scanned documents with OCR.
257imessage
Send and read iMessages/SMS from macOS. Use for texting contacts, scheduling services, or automating message-based workflows. Triggers on queries about texting, messaging, SMS, iMessage, or contacting someone via text.
206video-processing
Guide for video analysis and frame-level event detection tasks using OpenCV and similar libraries. This skill should be used when detecting events in videos (jumps, movements, gestures), extracting frames, analyzing motion patterns, or implementing computer vision algorithms on video data. It provides verification strategies and helps avoid common pitfalls in video processing workflows.
189letta-api-client
Build applications with the Letta API — a model-agnostic, stateful API for building persistent agents with memory and long-term learning. Covers SDK patterns for Python and TypeScript. Includes 24 working code examples.
153google-workspace
Connect to Gmail and Google Calendar via OAuth 2.0. Use when users want to search/read emails, create drafts, search calendar events, check availability, or schedule meetings. Triggers on queries about email, inbox, calendar, schedule, or meetings.
127portfolio-optimization
Guidance for implementing high-performance portfolio optimization using Python C extensions. This skill applies when tasks require optimizing financial computations (matrix operations, covariance calculations, portfolio risk metrics) by implementing C extensions for Python. Use when performance speedup requirements exist (e.g., 1.2x or greater) and the task involves numerical computations on large datasets (thousands of assets).
101