byted-las-pdf-parse-doubao
LAS AI PDF Parsing (las_pdf_parse_doubao)
Parses a PDF into Markdown text, preserving heading, paragraph, and table structure. Supports two modes: normal (fast) and detail (in-depth analysis).
Design Patterns
This skill mainly uses:
- Tool Wrapper: wraps calls to the `lasutil` CLI
- Pipeline: a sequential workflow running from Step 0 → Step N
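Core API and Configuration
- Operator ID: `las_pdf_parse_doubao`
- API: asynchronous (`submit` → `poll`)
- Environment variable: `LAS_API_KEY` (required)
See references/api.md for detailed parameters and interface definitions.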
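The asynchronous submit → poll flow and the required `LAS_API_KEY` check can be sketched generically as follows. This is a minimal illustration, not the skill's actual implementation: the helper names `require_api_key` and `run_async_job`, the status strings `"done"` and `"failed"`, and the result fields are all hypothetical, and the real submit and poll calls (made through `lasutil`) are passed in as plain callables. The actual parameters and response shape are defined in references/api.md.

```python
import os
import time
from typing import Callable, Dict, Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return LAS_API_KEY from the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    key = env.get("LAS_API_KEY", "").strip()
    if not key:
        raise RuntimeError("LAS_API_KEY is required but not set")
    return key


def run_async_job(
    submit: Callable[[], str],
    poll: Callable[[str], Dict],
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Submit a job once, then poll until it finishes, fails, or times out.

    `submit` returns a task id; `poll` returns a dict with a "status" key.
    The status values used here are assumptions for illustration only.
    """
    task_id = submit()
    deadline = clock() + timeout
    while True:
        result = poll(task_id)
        status = result.get("status")
        if status == "done":
            return result
        if status == "failed":
            raise RuntimeError(f"task {task_id} failed: {result}")
        if clock() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish in {timeout}s")
        sleep(interval)  # wait before the next poll
```

Passing `submit` and `poll` as callables keeps the loop independent of how the job is actually dispatched, so the same loop works whether the calls go through a CLI subprocess or an HTTP client.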
Gotchas