transcribe

Summary

Transcribe audio files to text with optional speaker diarization and known-speaker hints.

  • Supports fast text transcription via gpt-4o-mini-transcribe and speaker-labeled diarization via gpt-4o-transcribe-diarize
  • Accepts multiple audio formats and optional known-speaker references (up to 4 speakers) to improve diarization accuracy
  • Writes output as plain text, JSON, or diarized JSON; the output directory is configurable so that runs do not overwrite each other
  • Requires the OPENAI_API_KEY environment variable; uses the bundled Python CLI for deterministic, repeatable transcription runs
SKILL.md

Audio Transcribe

Transcribe audio using OpenAI, with optional speaker diarization when requested. Prefer the bundled CLI for deterministic, repeatable runs.

Workflow

  1. Collect inputs: audio file path(s), desired response format (text/json/diarized_json), optional language hint, and any known speaker references.
  2. Verify OPENAI_API_KEY is set. If missing, ask the user to set it locally (do not ask them to paste the key).
  3. Run the bundled transcribe_diarize.py CLI with sensible defaults (fast text transcription).
  4. Validate the output: transcription quality, speaker labels, and segment boundaries; iterate with a single targeted change if needed.
  5. Save outputs under output/transcribe/ when working in this repo.
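
Steps 2 and 5 above can be sketched in Python. This is a minimal illustration, not the bundled CLI's own logic: the helper names `api_key_is_set` and `output_path_for`, the file suffixes, and the `-1`, `-2` numbering scheme are all assumptions made for the example. Only the `OPENAI_API_KEY` variable, the three response formats, and the `output/transcribe/` directory come from the text.

```python
import os
from pathlib import Path


def api_key_is_set(env=None) -> bool:
    """Return True if OPENAI_API_KEY is present and non-empty.

    Only checks presence; the key itself is never printed or returned.
    """
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


def output_path_for(audio_path, response_format, out_dir="output/transcribe"):
    """Pick an output file under out_dir that does not clobber an existing one.

    Hypothetical naming scheme: <audio stem>.txt for text, <audio stem>.json
    for json and diarized_json, with -1, -2, ... appended on collision.
    """
    suffix = ".txt" if response_format == "text" else ".json"
    stem = Path(audio_path).stem
    base = Path(out_dir)
    candidate = base / f"{stem}{suffix}"
    counter = 1
    while candidate.exists():
        candidate = base / f"{stem}-{counter}{suffix}"
        counter += 1
    return candidate
```

If `api_key_is_set()` returns False, the workflow asks the user to export the variable in their own shell rather than paste the key into the conversation.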

Decision rules

  • Default to gpt-4o-mini-transcribe with --response-format text for fast transcription.
  • If the user wants speaker labels or diarization, use --model gpt-4o-transcribe-diarize --response-format diarized_json.
  • If audio is longer than ~30 seconds, keep --chunking-strategy auto.
  • Prompting is not supported for gpt-4o-transcribe-diarize.
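
The decision rules above can be expressed as a small selection function. This is a sketch under stated assumptions: `choose_options` and `to_cli_flags` are hypothetical helpers, not part of the bundled script. The model names and the `--model`, `--response-format`, and `--chunking-strategy` flags are taken from the rules; how the CLI accepts the audio path or a prompt is not stated here, so neither is mapped to a flag.

```python
def choose_options(want_speakers=False, prompt=None):
    """Map the decision rules to a dict of transcription options."""
    if want_speakers:
        if prompt:
            # Rule: prompting is not supported for the diarization model.
            raise ValueError(
                "prompting is not supported for gpt-4o-transcribe-diarize"
            )
        model, response_format = "gpt-4o-transcribe-diarize", "diarized_json"
    else:
        model, response_format = "gpt-4o-mini-transcribe", "text"

    options = {
        "model": model,
        "response_format": response_format,
        # Kept at "auto" in all cases, which covers audio longer than ~30 s.
        "chunking_strategy": "auto",
    }
    if prompt:
        options["prompt"] = prompt
    return options


def to_cli_flags(options):
    """Render only the flags named in the decision rules.

    A prompt, if present, is deliberately left out: the flag used to pass
    it is not documented in this section.
    """
    return [
        "--model", options["model"],
        "--response-format", options["response_format"],
        "--chunking-strategy", options["chunking_strategy"],
    ]
```

The resulting list can be appended to an invocation of transcribe_diarize.py; check the script's own usage text for how it expects the audio file path to be supplied.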

Output conventions

Installs: 1.2K
Repository: openai/skills
GitHub Stars: 18.9K
First Seen: Feb 1, 2026