skills/htdt/godogen/visual-qa/Gen Agent Trust Hub

visual-qa

Warn

Audited by Gen Agent Trust Hub on Apr 19, 2026

Risk Level: MEDIUMCOMMAND_EXECUTIONDATA_EXFILTRATIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill instructions for logging analysis results involve the execution of a complex shell command string using nested subshells and heredocs (e.g., printf ... "$(cat <<'LOGEOF' ...)"). This pattern is highly susceptible to command injection if the agent interpolates unvalidated data from user arguments or image analysis results containing shell metacharacters like backticks or $() into the command.
  • [DATA_EXFILTRATION]: The visual_qa.py script and the skill's native execution mode read the contents of local files provided via command-line arguments and transmit them to the Google Gemini API. While this is the intended function for image analysis, the lack of path validation allows for the potential reading and exfiltration of sensitive local configuration or credential files if their paths are passed as arguments.
  • [PROMPT_INJECTION]: The skill uses imperative behavioral overrides ("CRITICAL: Your job is to find problems...", "Do not rationalize...") designed to force the agent into a specific mindset. These instructions can be leveraged as templates for bypassing standard model constraints or safety filters.
  • [PROMPT_INJECTION]: The skill exhibits an indirect prompt injection surface by processing untrusted external data (screenshots and task context) which is directly interpolated into model prompts without sanitization.
  • Ingestion points: User-provided images and the --context or --question text arguments are processed by visual_qa.py and the agent's internal vision tools.
  • Boundary markers: The skill lacks explicit delimiters or instructions for the model to ignore embedded commands within the processed data.
  • Capability inventory: The skill is capable of shell command execution (printf), local Python script execution (visual_qa.py), and arbitrary file reading.
  • Sanitization: No validation or sanitization is performed on the content of the images or the text context before they are sent to the vision model.
Audit Metadata
Risk Level
MEDIUM
Analyzed
Apr 19, 2026, 02:39 PM