The Agent Skills Directory

[COMMAND_EXECUTION]: The skill invokes ffmpeg and ffprobe as system commands to extract frames, isolate audio tracks, and detect scenes. Each call is built as an argument list and passed to subprocess.run rather than assembled into a shell string, which prevents common shell injection vulnerabilities.
Evidence: Subprocess calls in scripts/asr.py, scripts/detect.py, scripts/extract.py, and scripts/storyboard.py.
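Illustration: a minimal sketch of the list-argument pattern described above, not the skill's actual code. The helper names build_probe_cmd and echo_single_arg are hypothetical, and the ffprobe options are only one plausible invocation. The second helper uses a child Python process in place of ffprobe so the behavior can be checked without ffmpeg installed: a file name containing shell metacharacters arrives as a single, uninterpreted argument.

```python
import subprocess
import sys


def build_probe_cmd(media_path: str) -> list[str]:
    """Hypothetical helper: build an ffprobe argv list for a media file.

    The path is one list element, so it is never parsed by a shell.
    """
    return [
        "ffprobe",
        "-v", "error",
        "-show_entries", "format=duration",
        "-of", "json",
        "-i", media_path,
    ]


def echo_single_arg(arg: str) -> str:
    """Pass `arg` to a child process as a list element and return what it saw.

    A child Python interpreter stands in for ffprobe here (assumption made
    so the sketch runs anywhere). shell=False is the subprocess.run default.
    """
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", arg],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


if __name__ == "__main__":
    hostile_name = "clip.mp4; echo injected"
    # The whole string, semicolon included, stays one argument.
    print(build_probe_cmd(hostile_name)[-1])
    print(echo_single_arg(hostile_name))
```

The protection depends on keeping shell=False (the default) and never joining the list into a single string before execution.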
[DATA_EXFILTRATION]: Audio and video data are transmitted to the MiMo service endpoints (api.xiaomimimo.com and related subdomains) for processing. This is a core part of the skill's primary function and is explicitly documented for the user.
Evidence: Data upload logic in scripts/asr.py (_run_asr) and scripts/vlm.py (_analyze_mimo_video_chunk).
[EXTERNAL_DOWNLOADS]: The skill calls external API services for AI-based media understanding, sending media data and receiving structured analysis results in return.
Evidence: Network configuration and API call wrappers defined in scripts/lib.py.
[PROMPT_INJECTION]: The skill reads character and story context from background_research.json and includes it in prompts sent to the vision-language model (VLM). Because the file's content flows into the prompt, it is an indirect prompt injection surface.
Ingestion points: External background_research.json file in the work directory.
Boundary markers: Prompt templates in scripts/consolidate.py use explicit constraints (e.g., 'Iron Rules' or '铁律') to guide model output and prevent hallucinations.
Capability inventory: The agent can write JSON artifacts to disk, execute media commands via ffmpeg, and communicate with external APIs.
Sanitization: Input from research files is treated as context without extensive sanitization, but the model's output is strictly parsed into fixed JSON structures.
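Illustration: a minimal sketch of what strict parsing of model output into a fixed JSON structure can look like. The field names in EXPECTED_FIELDS and the function parse_model_output are hypothetical, since the audit does not list the skill's actual schema. The idea is that any output that is not valid JSON, carries unexpected keys, or has wrongly typed values is rejected instead of being written to disk.

```python
import json

# Hypothetical schema: the real field names used by the skill are not
# given in the audit, so these are placeholders for illustration.
EXPECTED_FIELDS = {"scene_id": int, "summary": str, "characters": list}


def parse_model_output(raw: str) -> dict:
    """Parse model output and accept it only if it matches the fixed schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc

    if not isinstance(data, dict):
        raise ValueError("model output must be a JSON object")

    # Reject both missing and unexpected keys.
    if set(data) != set(EXPECTED_FIELDS):
        raise ValueError(f"unexpected field set: {sorted(data)}")

    for name, expected_type in EXPECTED_FIELDS.items():
        value = data[name]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"field {name!r} has the wrong type")

    if not all(isinstance(item, str) for item in data["characters"]):
        raise ValueError("every entry in 'characters' must be a string")

    return data
```

Schema validation of this kind limits what injected instructions can cause the agent to write, but it does not stop the injected text from influencing the content of otherwise valid fields.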

video-understanding