speech

Summary

Text-to-speech generation for narration, voiceovers, IVR prompts, and accessibility reads via OpenAI Audio API.

  • Supports single clips and batch processing; defaults to gpt-4o-mini-tts-2025-12-15 with built-in voices (cedar, marin, and others)
  • Includes instruction augmentation for voice affect, tone, pacing, emotion, and emphasis; instructions supported only on GPT-4o mini TTS models
  • Enforces 4096-character input limit per request and 50 requests/minute rate cap; splits longer text into chunks automatically
  • Requires OPENAI_API_KEY environment variable; uses bundled CLI (scripts/text_to_speech.py) for deterministic, reproducible runs
  • Provides use-case templates for narration, product demos, IVR prompts, and accessibility reads; custom voice creation is out of scope
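The automatic chunking mentioned above (4096 characters per request) can be approximated with a simple splitter. This is an illustrative sketch, not the bundled CLI's actual implementation; it prefers breaking at sentence boundaries so clips don't cut off mid-sentence:

```python
def chunk_text(text: str, limit: int = 4096) -> list[str]:
    """Split text into chunks of at most `limit` characters,
    preferring to break after sentence-ending punctuation."""
    chunks = []
    remaining = text.strip()
    while len(remaining) > limit:
        window = remaining[:limit]
        # Prefer the last sentence boundary inside the window.
        cut = max(window.rfind(". "), window.rfind("! "), window.rfind("? "))
        if cut == -1:
            # Fall back to the last space, then to a hard cut.
            cut = window.rfind(" ")
        if cut == -1:
            cut = limit - 1
        chunks.append(remaining[:cut + 1].strip())
        remaining = remaining[cut + 1:].strip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Each chunk is then submitted as its own request, subject to the 50 requests/minute cap.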
SKILL.md

Speech Generation Skill

Generate spoken audio for the current project (narration, product demo voiceover, IVR prompts, accessibility reads). Defaults to gpt-4o-mini-tts-2025-12-15 and built-in voices, and prefers the bundled CLI for deterministic, reproducible runs.

When to use

  • Generate a single spoken clip from text
  • Generate a batch of prompts (many lines, many files)

Decision tree (single vs batch)

  • If the user provides multiple lines/prompts or wants many outputs -> batch
  • Else -> single
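In code, the routing rule above reduces to a one-line check. A minimal sketch (the function name and signature are my own, not part of the skill):

```python
def choose_mode(prompts: list[str], wants_many_outputs: bool = False) -> str:
    """Route to batch when the user supplies multiple prompts or
    explicitly wants many output files; otherwise single."""
    return "batch" if len(prompts) > 1 or wants_many_outputs else "single"
```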

Workflow

  1. Decide intent: single vs batch (see decision tree above).
  2. Collect inputs up front: exact text (verbatim), desired voice, delivery style, format, and any constraints.
  3. If batch: write a temporary JSONL under tmp/ (one job per line), run once, then delete the JSONL.
  4. Augment instructions into a short labeled spec without rewriting the input text.
  5. Run the bundled CLI (scripts/text_to_speech.py) with sensible defaults (see references/cli.md).
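Steps 3 and 4 can be sketched as follows. The JSONL field names (`text`, `voice`, `instructions`, `output`) are assumptions for illustration; the real schema and CLI flags are documented in references/cli.md:

```python
import json
from pathlib import Path

# One job per line; field names here are hypothetical -- see references/cli.md.
# The "instructions" field carries the short labeled delivery spec (step 4)
# without rewriting the verbatim input text.
jobs = [
    {
        "text": "Welcome to support. Press 1 for billing.",
        "voice": "cedar",
        "instructions": "Tone: calm and professional; Pacing: steady",
        "output": "ivr_01.mp3",
    },
    {
        "text": "Press 2 to speak with an agent.",
        "voice": "cedar",
        "instructions": "Tone: calm and professional; Pacing: steady",
        "output": "ivr_02.mp3",
    },
]

tmp_dir = Path("tmp")
tmp_dir.mkdir(exist_ok=True)
jsonl_path = tmp_dir / "tts_jobs.jsonl"
jsonl_path.write_text("\n".join(json.dumps(job) for job in jobs) + "\n")

# Run the bundled CLI once against the JSONL here (exact invocation and
# flags are in references/cli.md), then delete the temporary file:
lines = jsonl_path.read_text().splitlines()
jsonl_path.unlink()
```

Writing the batch file under tmp/ and deleting it after the single run keeps the workspace clean and the run reproducible.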
Installs: 1.2K
Repository: openai/skills
GitHub Stars: 18.9K
First Seen: Jan 28, 2026