byted-podcast-tts
Podcast TTS Skill
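Converts a topic into podcast audio using the Volcano Engine Doubao speech synthesis WebSocket protocol (PodcastTTS, /api/v3/sami/podcasttts) and saves the result as a local file. Features:
- Generate a podcast from a one-sentence topic or a web page URL (the URL can also be a file download link; pdf/word/txt formats are supported)
- Output a download link for the podcast audio
- Output the podcast's segmented text (JSON)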
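The skill accepts three kinds of input: a topic sentence, a web page URL, or a file download URL in pdf/word/txt format. The sketch below shows one way a caller might tell them apart before invoking the script. `classify_source` and `SUPPORTED_DOC_EXTENSIONS` are hypothetical names. Two further assumptions apply: "word" is treated as .doc/.docx, and the check relies on the URL's file extension, although a real download link may have no extension at all.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumption: "word" means .doc/.docx; the skill text only says pdf/word/txt.
SUPPORTED_DOC_EXTENSIONS = {".pdf", ".doc", ".docx", ".txt"}


def classify_source(user_input: str) -> str:
    """Return "document", "web_page", or "topic" for a piece of user input.

    Hypothetical heuristic: the real service may detect the format itself
    (for example from the HTTP Content-Type), independent of this check.
    """
    candidate = user_input.strip()
    parsed = urlparse(candidate)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        # Inspect only the URL path so query strings such as ?sig=... are ignored.
        suffix = PurePosixPath(parsed.path).suffix.lower()
        if suffix in SUPPORTED_DOC_EXTENSIONS:
            return "document"
        return "web_page"
    # Anything that is not an http(s) URL is treated as a topic or plain text.
    return "topic"


if __name__ == "__main__":
    for s in ["Impact of AI on education",
              "https://example.com/blog/post",
              "https://example.com/files/report.PDF?sig=abc"]:
        print(s, "->", classify_source(s))
```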
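When to use
- The user mentions keywords such as "generate a podcast" or "podcast synthesis".
- The user wants podcast-style audio generated for a topic.
- The user wants podcast-style audio generated from the content of a web page or file.
- The user wants podcast-style audio generated from a file they uploaded or from a long context.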
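Steps
- Analyze the content the user wants turned into a podcast and prepare one input:
  - prompt_text: the raw topic, usually no more than 20 characters.
  - input_url: a web page address or a file download URL.
  - text: content read from a file the user uploaded, or a longer piece of text, usually more than 200 characters.
- Before running the script, cd into this skill's directory: skills/byted-podcast-gen.
- Configure authentication, using either environment variables or command-line arguments.
- Run the script with python scripts/podcast.py [arguments]. See the examples section below.
- Use the result through the audio_path, texts, and audio_url fields of the JSON that the script prints.
  - audio_url, when present, is a URL with an expiration time. It can be returned to the user.
  - audio_path is a local file path. It can be offered to the user as a download.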
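The first and last steps can be sketched in Python as follows. `choose_input` and `read_result` are hypothetical helpers and are not part of scripts/podcast.py. The sketch makes three assumptions:
- The script prints a single JSON object on stdout.
- Inputs of 21 to 200 characters count as a topic. The guidance above leaves that range open.
- The command is not built here, because this section does not show how each field maps to a command-line flag.

```python
import json
from urllib.parse import urlparse

# From the guidance: a topic is usually <= 20 characters, long text usually > 200.
# Treating the 21-200 character range as a topic is an assumption of this sketch.
LONG_TEXT_MIN_CHARS = 200


def choose_input(user_input: str) -> tuple[str, str]:
    """Pick which of prompt_text / input_url / text to pass to the script."""
    value = user_input.strip()
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        # Web pages and file download links both go through input_url.
        return "input_url", value
    if len(value) > LONG_TEXT_MIN_CHARS:
        return "text", value
    return "prompt_text", value


def read_result(stdout: str) -> dict:
    """Parse the script's JSON output (assumed to be the only thing on stdout)."""
    result = json.loads(stdout)
    return {
        # audio_url carries an expiration time, so share it promptly if present.
        "share_url": result.get("audio_url"),
        # audio_path is a local file that can be offered as a download.
        "local_file": result.get("audio_path"),
        # texts holds the segmented podcast text; its exact shape is not specified here.
        "segments": result.get("texts", []),
    }


if __name__ == "__main__":
    print(choose_input("Impact of AI on education"))
    print(read_result('{"audio_path": "out/podcast.mp3", "texts": []}'))
```

Returning both fields lets the caller hand out the expiring audio_url right away while still offering audio_path as a download.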
Environment variables and authentication