speech
Text-to-speech generation for narration, voiceovers, IVR prompts, and accessibility reads via OpenAI Audio API.
- Supports single clips and batch processing; defaults to gpt-4o-mini-tts-2025-12-15 with built-in voices (cedar, marin, and others)
- Includes instruction augmentation for voice affect, tone, pacing, emotion, and emphasis; instructions supported only on GPT-4o mini TTS models
- Enforces a 4096-character input limit per request and a 50 requests/minute rate cap; splits longer text into chunks automatically
- Requires the OPENAI_API_KEY environment variable; uses the bundled CLI (scripts/text_to_speech.py) for deterministic, reproducible runs
- Provides use-case templates for narration, product demos, IVR prompts, and accessibility reads; custom voice creation is out of scope
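The 4096-character limit above means longer text must be split before it is sent. A minimal chunking sketch, assuming sentence-boundary splitting (the actual splitter in scripts/text_to_speech.py may use a different heuristic):

```python
def chunk_text(text: str, limit: int = 4096) -> list[str]:
    """Split text into chunks of at most `limit` characters,
    preferring to break at sentence boundaries."""
    chunks = []
    remaining = text.strip()
    while len(remaining) > limit:
        window = remaining[:limit]
        # Prefer the last sentence end inside the window; fall back to
        # the last space, then to a hard cut at the limit.
        cut = max(window.rfind(". "), window.rfind("! "), window.rfind("? "))
        if cut == -1:
            cut = window.rfind(" ")
        if cut == -1:
            cut = limit - 1
        chunks.append(remaining[:cut + 1].strip())
        remaining = remaining[cut + 1:].strip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Each chunk then becomes its own request, which also matters for staying under the 50 requests/minute rate cap on long inputs.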
Speech Generation Skill
Generate spoken audio for the current project (narration, product demo voiceover, IVR prompts, accessibility reads). Defaults to gpt-4o-mini-tts-2025-12-15 and built-in voices, and prefers the bundled CLI for deterministic, reproducible runs.
When to use
- Generate a single spoken clip from text
- Generate a batch of prompts (many lines, many files)
Decision tree (single vs batch)
- If the user provides multiple lines/prompts or wants many outputs -> batch
- Else -> single
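The decision tree above reduces to a simple count check; as a sketch (the function name is illustrative, not part of the skill):

```python
def choose_mode(prompts: list[str]) -> str:
    """Return 'batch' when the user supplies multiple lines/prompts
    or wants many output files; otherwise 'single'."""
    return "batch" if len(prompts) > 1 else "single"
```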
Workflow
- Decide intent: single vs batch (see decision tree above).
- Collect inputs up front: exact text (verbatim), desired voice, delivery style, format, and any constraints.
- If batch: write a temporary JSONL under tmp/ (one job per line), run once, then delete the JSONL.
- Augment instructions into a short labeled spec without rewriting the input text.
- Run the bundled CLI (scripts/text_to_speech.py) with sensible defaults (see references/cli.md).
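The batch step of the workflow above can be sketched as follows. The JSONL field names (`text`, `voice`) and the `--batch` flag are assumptions for illustration; the real job schema and flags are documented in references/cli.md:

```python
import json
from pathlib import Path

def prepare_batch(jobs: list[dict], script: str = "scripts/text_to_speech.py") -> list[str]:
    """Write one job per line to a temporary JSONL under tmp/ and
    return the CLI command to run once; delete the JSONL afterward.

    Job field names here are illustrative; see references/cli.md
    for the real schema.
    """
    tmp = Path("tmp")
    tmp.mkdir(exist_ok=True)
    jsonl = tmp / "tts_batch.jsonl"
    jsonl.write_text("\n".join(json.dumps(job) for job in jobs) + "\n")
    # Hypothetical invocation; the actual flags live in references/cli.md.
    return ["python", script, "--batch", str(jsonl)]
```

The command is run once for the whole batch, then the JSONL is removed, keeping runs deterministic and leaving no stray files behind.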