byted-voice-to-text
Pass
Audited by Gen Agent Trust Hub on Mar 26, 2026
Risk Level: SAFEPROMPT_INJECTIONDATA_EXFILTRATION
Full Analysis
- [PROMPT_INJECTION]: The skill exhibits an indirect prompt injection surface by converting untrusted audio input into text that the agent processes as user instructions.\n
- Ingestion points:
scripts/asr.pyingests audio through local file paths, remote URLs, and Feishu file keys.\n - Boundary markers: Absent. The transcribed output is not enclosed in delimiters or marked as potentially untrusted content.\n
- Capability inventory:
scripts/asr.pyutilizesrequestsfor network operations and performs file reads for audio processing.\n - Sanitization: Absent. The skill does not filter or sanitize the transcription before presenting it to the agent.\n- [DATA_EXFILTRATION]: The skill transmits audio data to an external service for speech processing.\n
- Evidence:
scripts/asr.pysends audio data tohttps://openspeech.bytedance.com, which is the official API endpoint for the author's speech recognition service.\n - Evidence: The script fetches audio from Feishu's official API (
https://open.feishu.cn) when a file key is provided.
Audit Metadata