video-understanding
Pass
Audited by Gen Agent Trust Hub on Jun 30, 2026
Risk Level: SAFE (COMMAND_EXECUTION, DATA_EXFILTRATION, EXTERNAL_DOWNLOADS)
Full Analysis
- [COMMAND_EXECUTION]: The skill runs system commands via `ffmpeg` and `ffprobe` to perform critical media tasks such as extracting frames, isolating audio tracks, and performing scene detection. These calls are constructed as lists and passed to `subprocess.run`, which prevents common shell injection vulnerabilities.
  - Evidence: Subprocess calls in `scripts/asr.py`, `scripts/detect.py`, `scripts/extract.py`, and `scripts/storyboard.py`.
- [DATA_EXFILTRATION]: Audio and video data are transmitted to the MiMo service endpoints (`api.xiaomimimo.com` and related subdomains) for processing. This is a core part of the skill's primary function and is explicitly documented for the user.
  - Evidence: Data upload logic in `scripts/asr.py` (`_run_asr`) and `scripts/vlm.py` (`_analyze_mimo_video_chunk`).
- [EXTERNAL_DOWNLOADS]: The skill makes several requests to external API services for AI-based understanding. These connections are used to send media data and receive structured analysis results.
  - Evidence: Network configuration and API call wrappers defined in `scripts/lib.py`.
- [PROMPT_INJECTION]: The skill processes character and story context from `background_research.json` and includes it in prompts sent to the VLM (Vision Language Model). This constitutes an indirect prompt injection surface.
  - Ingestion points: External `background_research.json` file in the work directory.
  - Boundary markers: Prompt templates in `scripts/consolidate.py` use explicit constraints (e.g., 'Iron Rules' or '铁律') to guide model output and prevent hallucinations.
  - Capability inventory: The agent can write JSON artifacts to disk, execute media commands via `ffmpeg`, and communicate with external APIs.
  - Sanitization: Input from research files is treated as context without extensive sanitization, but the model's output is strictly parsed into fixed JSON structures.
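The list-based `subprocess.run` pattern noted under COMMAND_EXECUTION can be illustrated with a minimal sketch. This is not the skill's actual code: the function names `build_ffprobe_cmd`, `probe_duration`, and `echo_argument` are hypothetical, and the `ffprobe` flags shown are just one common way to read a file's duration. The point is that each list element reaches the child process as a single argument, so no shell ever interprets the file name.

```python
import subprocess
import sys


def build_ffprobe_cmd(video_path: str) -> list[str]:
    """Build an ffprobe argument list that prints the container duration in seconds."""
    return [
        "ffprobe",
        "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        video_path,  # stays one argv element, whatever characters it contains
    ]


def probe_duration(video_path: str) -> float:
    """Run ffprobe without a shell and parse the duration it prints (requires ffprobe on PATH)."""
    result = subprocess.run(
        build_ffprobe_cmd(video_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return float(result.stdout.strip())


def echo_argument(value: str) -> str:
    """Hand `value` to a child Python process as one argv element and return what the child saw."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", value],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")
```

Calling `echo_argument("a; echo pwned")` returns the string unchanged, because no shell is involved to split on `;`. Note that the list form does not by itself stop a file name beginning with `-` from being read as an option by the called program, so untrusted paths may still need validation.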
Audit Metadata