autoresearch-agent
Warn
Audited by Gen Agent Trust Hub on May 5, 2026
Risk Level: MEDIUMCOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
- [COMMAND_EXECUTION]: The main execution script 'scripts/run_experiment.py' utilizes 'subprocess.run(shell=True)' to run the 'evaluate_cmd' defined in the experiment's configuration file.
- [COMMAND_EXECUTION]: 'scripts/setup_experiment.py' uses 'shell=True' to validate the user-provided evaluation command during the setup process.
- [COMMAND_EXECUTION]: Multiple built-in performance evaluators, such as 'benchmark_speed.py' and 'build_speed.py', use 'shell=True' to execute build and test commands.
- [PROMPT_INJECTION]: LLM-based evaluators ('llm_judge_content.py', 'llm_judge_copy.py', 'llm_judge_prompt.py') read content from the target file ('TARGET_FILE') and interpolate it into evaluation prompts. These prompts are passed to a CLI tool without sanitization, creating an indirect prompt injection surface. The ingestion occurs via 'Path.read_text()' and markers consist of simple triple-dash delimiters. This could allow malicious content within the file being optimized to influence the scoring and subsequent decision to keep or discard changes.
Audit Metadata