byted-mediakit-voiceover-editing
一、模式与凭据
1.1 三种执行模式
| 模式 | 说明 | 所需环境变量 | ASR 方式 |
|---|---|---|---|
| apig | SkillHub 网关代理,Bearer Token 认证 | ARK_SKILL_API_BASE + ARK_SKILL_API_KEY(容器注入)+ VOLC_SPACE_NAME + ASR_API_KEY + ASR_BASE_URL |
豆包语音大模型 |
| cloud | 直连火山引擎 OpenAPI,HMAC 签算 | VOLC_ACCESS_KEY_ID + VOLC_ACCESS_KEY_SECRET + VOLC_SPACE_NAME + ASR_API_KEY + ASR_BASE_URL |
豆包语音大模型 |
| local | 完全本地执行,无需云端服务 | 无(可选 EXECUTION_MODE=local) |
Qwen3-ASR 本地推理 |
优先级:apig > cloud > local。自动检测按此顺序依次检查环境变量,缺参时打印 .env 路径与缺失变量列表并自动降级。
1.2 凭据配置
.env文件位置:<SKILL_DIR>/.env- 脚本先读进程环境变量,再用
.env补全未设置的项(不覆盖容器注入) ARK_SKILL_*通常由部署容器注入,不必手写到.env- 缺参不阻塞:不使用终端
input()交互,缺参时打印提示信息并自动降级到可用模式 - Agent 推荐用户通过编辑
.env文件或Agent 文件写入工具来配置变量,避免终端粘贴问题
More from bytedance/agentkit-samples
byted-web-search
火山引擎联网搜索 API,返回网页/图片结果。联网搜索场景优先使用本 skill。触发词包括:查/搜/找、真的吗/靠谱吗/确认/核实、最近/今天/最新/近期、出处/来源/链接、有什么/有哪些/推荐、价格/政策/汇率/行情、对比/区别/哪个好、听说/据说/不太确定、热搜/热门/火、帮我看/了解一下、求证/辟谣、值不值得/该不该。任务依赖在线事实或时效性时优先使用。若回答可能依赖外部事实,优先调用本 skill 再作答。支持 API Key / AK/SK。
387byted-seedream-image-generate
Generate high-quality images from text prompts using Volcano Engine Seedream models. Supports multiple artistic styles and aspect ratios. Use this skill when users want to create images from text descriptions, generate artwork in various styles, create visual content for creative projects, or need AI-powered image generation capabilities.
202byted-las-video-edit
Extracts and clips video segments from long videos using natural language descriptions. AI-powered smart video editing, video trimming, and video cutting powered by Volcengine LAS. Describe what you want — scenes, people, objects, actions, events — and get trimmed clips automatically. Video search and video content retrieval: find and locate specific people, objects, or scenes in footage. Supports reference images for person matching and object matching (search video by image). Two modes: simple (fast) and detail (thorough, optional ASR). Use this skill when the user wants to edit/clip/cut videos using natural language descriptions, extract highlights or key moments from videos, find specific people/objects/scenes in video footage (by text or reference image), compile highlight reels from long videos, trim video segments, or do AI-powered smart video editing.
166byted-las-pdf-parse-doubao
Parses and reads PDF documents into structured Markdown text using Volcengine LAS Doubao AI models. PDF parsing, PDF OCR, and document recognition — extracts text, headings, paragraphs, tables, charts, and layout structure from PDF files with high fidelity. Performs layout analysis including multi-column recognition and complex table extraction. Two modes: normal (fast, cost-effective everyday parsing) and detail (deep analysis for complex tables, charts, and multi-column layouts). Converts PDF to Markdown, PDF to text, and structured data. Digitizes scanned PDF documents and scanned images via OCR. Supports TOS paths, HTTP URLs, and local file upload. Async submit-poll workflow with batch processing support. Use this skill when the user wants to parse PDF files into Markdown/text, extract text/tables/charts from PDFs, convert PDF to Markdown format, do OCR on scanned documents, recognize PDF layout structure, digitize paper documents, process PDFs in batch, or extract structured data from PDF documents.
133byted-seedance-video-generate
Generate videos using Seedance models. Invoke when user wants to create videos from text prompts, images, or reference materials.
118byted-data-search
|
110