autoresearch

Fail

Audited by Gen Agent Trust Hub on Mar 23, 2026

Risk Level: HIGHCOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONDATA_EXFILTRATIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill is designed to execute arbitrary shell commands defined in the eval_command configuration. These commands are run automatically in an iterative loop.
  • [COMMAND_EXECUTION]: The skill instructions explicitly mandate the use of mode: "bypassPermissions" for spawned subagents. This configuration grants the agent the ability to modify local files and execute shell commands without user confirmation.
  • [REMOTE_CODE_EXECUTION]: The framework permits the execution of arbitrary scripts and binaries through the evaluation harness. If the search space or evaluation logic is compromised, this can be used as a vector for executing untrusted code.
  • [DATA_EXFILTRATION]: The high-privilege execution environment allows the agent to read any file the user has access to. Combined with the ability to execute shell commands, this creates a significant risk of data exfiltration if the agent is directed to access sensitive files like .env, SSH keys, or cloud credentials.
  • [PROMPT_INJECTION]: The skill exhibits an indirect prompt injection surface by interpolating untrusted data from .autoresearch/program.md and .autoresearch/results.tsv into subagent prompts without boundary markers or sanitization.
  • Ingestion points: Contents of .autoresearch/program.md and .autoresearch/results.tsv are read and passed into the researcher subagent prompt.
  • Boundary markers: None present; content is directly interpolated into the prompt string without delimiters or warnings to ignore instructions.
  • Capability inventory: Subprocess execution via eval_command, arbitrary file writing to artifacts, and git operations.
  • Sanitization: No evidence of escaping or validation of the interpolated content before it is processed by the subagent.
  • [COMMAND_EXECUTION]: The documentation provides examples of evaluation scripts that utilize the Python subprocess module to run further shell commands, increasing the potential attack surface for command injection or unintended execution.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Mar 23, 2026, 10:26 AM