audio-transcribe
Audio Transcribe
Transcribes audio files to text with timestamps. Supports automatic language detection and speaker identification (diarization), and outputs structured JSON with segment-level timing.
Command
npx agent-media@latest audio transcribe --in <path> [options]
Inputs
| Option | Required | Description |
|---|---|---|
| `--in` | Yes | Input audio file path or URL (supports mp3, wav, m4a, ogg) |
| `--diarize` | No | Enable speaker identification |
| `--language` | No | Language code (auto-detected if not provided) |
| `--speakers` | No | Number of speakers hint for diarization |
| `--out` | No | Output path, filename or directory (default: ./) |
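
To show how the options in the table combine, here is a minimal Python sketch that assembles the command line as an argument list. It is an illustration, not part of agent-media: the helper name `build_transcribe_command` and the example file names are hypothetical. It assumes `--diarize` is a bare switch and that `--language`, `--speakers`, and `--out` each take their value as the next argument, which the table suggests but does not state.

```python
import shlex
from typing import List, Optional


def build_transcribe_command(
    in_path: str,
    diarize: bool = False,
    language: Optional[str] = None,
    speakers: Optional[int] = None,
    out: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `agent-media audio transcribe` (hypothetical helper)."""
    if not in_path:
        # --in is the only required option in the table.
        raise ValueError("in_path (--in) is required")

    cmd = ["npx", "agent-media@latest", "audio", "transcribe", "--in", in_path]

    if diarize:
        # Assumption: --diarize is a flag with no value.
        cmd.append("--diarize")
    if language is not None:
        # Omitting --language leaves language detection to the tool.
        cmd += ["--language", language]
    if speakers is not None:
        if speakers < 1:
            raise ValueError("speakers must be a positive integer")
        # Assumption: the speaker-count hint is passed as a plain integer.
        cmd += ["--speakers", str(speakers)]
    if out is not None:
        cmd += ["--out", out]
    return cmd


if __name__ == "__main__":
    # Example with hypothetical paths: two-speaker interview, diarization on.
    argv = build_transcribe_command(
        "interview.mp3", diarize=True, speakers=2, out="./transcripts"
    )
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(argv))
```

The sketch only builds the argument list and does not run anything. Passing the list to `subprocess.run` would invoke the CLI, provided Node.js and `npx` are installed.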