transcribe
Transcribe audio files to text with optional speaker diarization and known-speaker hints.
- Supports fast text transcription via
gpt-4o-mini-transcribeand speaker-labeled diarization viagpt-4o-transcribe-diarize - Accepts multiple audio formats and optional known-speaker references (up to 4 speakers) to improve diarization accuracy
- Outputs as plain text, JSON, or diarized JSON with configurable output directories to prevent overwrites
- Requires
OPENAI_API_KEYenvironment variable; uses bundled Python CLI for deterministic, repeatable transcription runs
Audio Transcribe
Transcribe audio using OpenAI, with optional speaker diarization when requested. Prefer the bundled CLI for deterministic, repeatable runs.
Workflow
- Collect inputs: audio file path(s), desired response format (text/json/diarized_json), optional language hint, and any known speaker references.
- Verify
OPENAI_API_KEYis set. If missing, ask the user to set it locally (do not ask them to paste the key). - Run the bundled
transcribe_diarize.pyCLI with sensible defaults (fast text transcription). - Validate the output: transcription quality, speaker labels, and segment boundaries; iterate with a single targeted change if needed.
- Save outputs under
output/transcribe/when working in this repo.
Decision rules
- Default to
gpt-4o-mini-transcribewith--response-format textfor fast transcription. - If the user wants speaker labels or diarization, use
--model gpt-4o-transcribe-diarize --response-format diarized_json. - If audio is longer than ~30 seconds, keep
--chunking-strategy auto. - Prompting is not supported for
gpt-4o-transcribe-diarize.
Output conventions
More from openai/skills
screenshot
Use when the user explicitly asks for a desktop or system screenshot (full screen, specific app or window, or a pixel region), or when tool-specific capture capabilities are unavailable and an OS-level capture is needed.
2.7Ksecurity-best-practices
Perform language and framework specific security best-practice reviews and suggest improvements. Trigger only when the user explicitly requests security best practices guidance, a security review/report, or secure-by-default coding help. Trigger only for supported languages (python, javascript/typescript, go). Do not trigger for general code review, debugging, or non-security tasks.
2.5Kfigma
Use the Figma MCP server to fetch design context, screenshots, variables, and assets from Figma, and to translate Figma nodes into production code. Trigger when a task involves Figma URLs, node IDs, design-to-code implementation, or Figma MCP setup and troubleshooting.
2.4Kplaywright
Use when the task requires automating a real browser from the terminal (navigation, form filling, snapshots, screenshots, data extraction, UI-flow debugging) via `playwright-cli` or the bundled wrapper script.
2.4Kpdf
Use when tasks involve reading, creating, or reviewing PDF files where rendering and layout matter; prefer visual checks by rendering pages (Poppler) and use Python tools such as `reportlab`, `pdfplumber`, and `pypdf` for generation and extraction.
2.3Kfigma-implement-design
Translates Figma designs into production-ready application code with 1:1 visual fidelity. Use when implementing UI code from Figma files, when user mentions "implement design", "generate code", "implement component", provides Figma URLs, or asks to build components matching Figma specs. For Figma canvas writes via `use_figma`, use `figma-use`.
2.2K