clawshire-doc-extract-engine
Pass
Audited by Gen Agent Trust Hub on Apr 6, 2026
Risk Level: SAFEDATA_EXFILTRATIONEXTERNAL_DOWNLOADSCOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
- [DATA_EXFILTRATION]: The provided Python script reads local PDF files and transmits them to the external ClawShire API endpoint (api.clawshire.cn) for structural processing. While this is the intended functionality, it involves outbound transmission of user data.
- [EXTERNAL_DOWNLOADS]: The tool supports downloading PDF documents from arbitrary remote URLs using standard Python libraries before uploading them to the extraction engine, which presents a surface for fetching untrusted content.
- [COMMAND_EXECUTION]: The skill requires the agent to execute a CLI script (clawshire_doc_extract_client.py) to manage its workflow, including file system access and network communication.
- [PROMPT_INJECTION]: The skill acts as an ingestion point for untrusted external data (PDFs/URLs) which may contain instructions targeting the agent (Indirect Prompt Injection). Evidence chain:
- Ingestion points: Document content from local paths or remote URLs processed in scripts/clawshire_doc_extract_client.py.
- Boundary markers: None identified in the script's output to the agent.
- Capability inventory: Network communication and file system read/write operations.
- Sanitization: Document content is not locally sanitized or filtered for injection patterns before being processed or output to the agent context.
Audit Metadata