autoresearch-agent

Warn

Audited by Gen Agent Trust Hub on May 5, 2026

Risk Level: MEDIUMCOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The main execution script 'scripts/run_experiment.py' utilizes 'subprocess.run(shell=True)' to run the 'evaluate_cmd' defined in the experiment's configuration file.
  • [COMMAND_EXECUTION]: 'scripts/setup_experiment.py' uses 'shell=True' to validate the user-provided evaluation command during the setup process.
  • [COMMAND_EXECUTION]: Multiple built-in performance evaluators, such as 'benchmark_speed.py' and 'build_speed.py', use 'shell=True' to execute build and test commands.
  • [PROMPT_INJECTION]: LLM-based evaluators ('llm_judge_content.py', 'llm_judge_copy.py', 'llm_judge_prompt.py') read content from the target file ('TARGET_FILE') and interpolate it into evaluation prompts. These prompts are passed to a CLI tool without sanitization, creating an indirect prompt injection surface. The ingestion occurs via 'Path.read_text()' and markers consist of simple triple-dash delimiters. This could allow malicious content within the file being optimized to influence the scoring and subsequent decision to keep or discard changes.
Audit Metadata
Risk Level
MEDIUM
Analyzed
May 5, 2026, 04:20 AM