MinerU Document Extractor
Pass
Audited by Gen Agent Trust Hub on May 12, 2026
Risk Level: SAFE (EXTERNAL_DOWNLOADS, COMMAND_EXECUTION, DATA_EXFILTRATION)
Full Analysis
- [EXTERNAL_DOWNLOADS]: The skill instructs the installation of the `mineru-open-api` package from the official NPM registry and its source from the author's GitHub repository (github.com/opendatalab/MinerU-Ecosystem). These are verified vendor resources.
- [COMMAND_EXECUTION]: The skill uses the `mineru-open-api` CLI to perform document conversions, authentication, and web crawling tasks. The instructions include security-conscious practices such as quoting file paths to prevent command injection in the shell.
- [DATA_EXFILTRATION]: The skill includes functionality to fetch content from external URLs via the `crawl` and `extract` commands. This is the primary intended purpose of the tool and is used to process documents provided by the user.
- [INDIRECT_PROMPT_INJECTION]: A potential attack surface exists because the skill processes untrusted external data (PDFs, Word documents, and web pages). Malicious content embedded in these documents could attempt to influence the agent's behavior.
  - Ingestion points: External documents and URLs processed by the `flash-extract`, `extract`, and `crawl` commands in `SKILL.md`.
  - Boundary markers: None explicitly defined for document content parsing.
  - Capability inventory: The skill has command execution and network access capabilities via the CLI.
  - Sanitization: `SKILL.md` instructs the agent to quote file paths, but static sanitization of the extracted document content is not present in the instructions.
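
The two mitigations discussed in the findings above (quoting file paths, and the missing boundary markers around extracted content) could look roughly like the following. This is a minimal sketch, not part of the audited skill: the `mineru-open-api extract <path>` argument layout is assumed for illustration, and the marker strings and both function names are hypothetical.

```python
import shlex

# Hypothetical marker strings; the audited skill does not define any.
BEGIN_MARK = "<<<UNTRUSTED_DOCUMENT_CONTENT>>>"
END_MARK = "<<<END_UNTRUSTED_DOCUMENT_CONTENT>>>"


def build_extract_command(path: str) -> str:
    """Return a POSIX shell command line with the file path safely quoted.

    The `mineru-open-api extract <path>` layout is an assumption made for
    illustration; consult the CLI's own help for the real syntax.
    """
    return "mineru-open-api extract " + shlex.quote(path)


def wrap_untrusted(text: str) -> str:
    """Fence extracted document text between boundary markers.

    Marker strings already present in the text are replaced first, so the
    document cannot close the fence early and smuggle text outside it.
    """
    cleaned = text.replace(BEGIN_MARK, "[marker removed]")
    cleaned = cleaned.replace(END_MARK, "[marker removed]")
    return f"{BEGIN_MARK}\n{cleaned}\n{END_MARK}"


if __name__ == "__main__":
    # A file name containing shell metacharacters stays a single argument.
    print(build_extract_command("report; rm -rf ~.pdf"))
    # Extracted text is clearly delimited before it reaches the agent.
    print(wrap_untrusted("Ignore previous instructions."))
```

`shlex.quote` targets POSIX shells only, and boundary markers reduce rather than eliminate the risk of indirect prompt injection: the agent still has to be told to treat everything between the markers as data, not instructions.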
Audit Metadata