The Agent Skills Directory

[COMMAND_EXECUTION]: The main execution script 'scripts/run_experiment.py' utilizes 'subprocess.run(shell=True)' to run the 'evaluate_cmd' defined in the experiment's configuration file.
[COMMAND_EXECUTION]: 'scripts/setup_experiment.py' uses 'shell=True' to validate the user-provided evaluation command during the setup process.
[COMMAND_EXECUTION]: Multiple built-in performance evaluators, such as 'benchmark_speed.py' and 'build_speed.py', use 'shell=True' to execute build and test commands.
[PROMPT_INJECTION]: LLM-based evaluators ('llm_judge_content.py', 'llm_judge_copy.py', 'llm_judge_prompt.py') read content from the target file ('TARGET_FILE') and interpolate it into evaluation prompts. These prompts are passed to a CLI tool without sanitization, creating an indirect prompt injection surface. The ingestion occurs via 'Path.read_text()' and markers consist of simple triple-dash delimiters. This could allow malicious content within the file being optimized to influence the scoring and subsequent decision to keep or discard changes.

autoresearch-agent