ai-provider-openai-whisper


OpenAI Whisper Patterns

Quick Guide: Use client.audio.transcriptions.create() for speech-to-text and client.audio.translations.create() to translate non-English audio into English text. Choose gpt-4o-transcribe for highest accuracy, gpt-4o-mini-transcribe for cost-efficiency, whisper-1 for timestamps/SRT/VTT output, or gpt-4o-transcribe-diarize for speaker identification. Files must be under 25 MB -- chunk larger files. Use the prompt parameter to guide vocabulary and style. Streaming is available via stream: true for progressive output on the gpt-4o transcribe models.


<critical_requirements>

CRITICAL: Before Using This Skill

All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering, import type, named constants)

You MUST choose the correct model for the use case -- gpt-4o-transcribe for accuracy, whisper-1 for timestamps/SRT/VTT output, gpt-4o-transcribe-diarize for speaker labels
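The selection rule above can be encoded as a small pure function (a sketch; the function name is hypothetical, the model names are from this guide):

```python
# Hypothetical helper encoding the model-selection rule above.
def pick_model(*, need_timestamps: bool = False, need_diarization: bool = False,
               cost_sensitive: bool = False) -> str:
    if need_diarization:
        return "gpt-4o-transcribe-diarize"  # speaker labels
    if need_timestamps:
        return "whisper-1"  # only model supporting timestamp_granularities / SRT / VTT
    if cost_sensitive:
        return "gpt-4o-mini-transcribe"
    return "gpt-4o-transcribe"  # highest accuracy
```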

You MUST chunk audio files larger than 25 MB before sending them to the API -- the API rejects files exceeding this limit
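Note that naive byte slicing corrupts encoded audio; the usual approach is to split by time (e.g. with ffmpeg or pydub, ideally cutting on silence). The sketch below, under the assumption of a roughly constant bitrate, only computes time boundaries that keep each chunk's estimated size under the 25 MB limit; the actual cutting tool is up to you:

```python
import math

MAX_BYTES = 25 * 1024 * 1024  # API limit stated in this guide

def plan_chunks(total_bytes: int, total_seconds: float,
                margin: float = 0.9) -> list[tuple[float, float]]:
    """Return (start, end) second ranges whose estimated size stays under
    25 MB, assuming a roughly constant bitrate. Cut the real audio at
    these boundaries with ffmpeg/pydub (preferably on silence)."""
    if total_bytes <= MAX_BYTES:
        return [(0.0, total_seconds)]
    bytes_per_second = total_bytes / total_seconds
    chunk_seconds = (MAX_BYTES * margin) / bytes_per_second
    n = math.ceil(total_seconds / chunk_seconds)
    step = total_seconds / n
    return [(i * step, min((i + 1) * step, total_seconds)) for i in range(n)]
```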

You MUST pass response_format: "verbose_json" when using timestamp_granularities -- timestamps only work with this format on whisper-1
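With whisper-1, response_format="verbose_json" plus timestamp_granularities=["segment"] yields segments with start/end seconds; turning those into SRT is mostly timestamp formatting. (whisper-1 can also return SRT directly via response_format: "srt"; the sketch below, with hypothetical helper names, is for when you need the JSON and subtitles from one call.)

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT HH:MM:SS,mmm timestamp."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def segments_to_srt(segments: list[dict]) -> str:
    """segments: items with 'start', 'end', 'text' as in verbose_json output."""
    blocks = [
        f"{i}\n{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
        f"{seg['text'].strip()}"
        for i, seg in enumerate(segments, 1)
    ]
    return "\n\n".join(blocks) + "\n"

# The call that produces the segments (not executed here):
# resp = client.audio.transcriptions.create(
#     file=f, model="whisper-1",
#     response_format="verbose_json",
#     timestamp_granularities=["segment"],
# )
# srt = segments_to_srt(resp.segments)
```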

You MUST set chunking_strategy: "auto" when using gpt-4o-transcribe-diarize with audio longer than 30 seconds -- the API requires it
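That rule is easy to enforce in a request builder. A minimal sketch (the helper name is hypothetical; the 30-second threshold and chunking_strategy value come from the requirement above):

```python
AUTO_CHUNK_THRESHOLD_S = 30.0  # per the requirement above

def diarize_params(duration_seconds: float) -> dict:
    """Build kwargs for a gpt-4o-transcribe-diarize request; the API
    requires chunking_strategy="auto" once audio exceeds 30 seconds."""
    params: dict = {"model": "gpt-4o-transcribe-diarize"}
    if duration_seconds > AUTO_CHUNK_THRESHOLD_S:
        params["chunking_strategy"] = "auto"
    return params
```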
