ai-eval-regression-tester
Warn
Audited by Gen Agent Trust Hub on May 11, 2026
Risk Level: MEDIUMCOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONPROMPT_INJECTION
Full Analysis
- [COMMAND_EXECUTION]: The script
run_eval.pyis a CLI tool that accepts file paths and runner specifications via command-line arguments, which are then used to perform file operations and code execution. - [REMOTE_CODE_EXECUTION]: In
run_eval.py, theload_runnerfunction usesimportlib.import_module()to dynamically load a Python module specified by the--runnerargument. This allows for the execution of any Python function reachable via the system path or the current directory. - [REMOTE_CODE_EXECUTION]: The script modifies
sys.pathby inserting the current working directory (Path.cwd()) at index 0. This practice can lead to the execution of malicious code if a module with a conflicting name is placed in the local directory. - [PROMPT_INJECTION]: The skill is vulnerable to indirect prompt injection because it reads untrusted data from a suite file (YAML/JSONL) and passes it to a runner (which typically involves an LLM). Malicious instructions in the test cases could influence the runner's output.
- Ingestion points:
load_yamlreads the file path provided in the--suiteargument inrun_eval.py. - Boundary markers: No specific delimiters or instructions are used to separate test case inputs from instructions within the runner prompt.
- Capability inventory: The script can load arbitrary modules (
importlib), write report files to disk (Path.write_text), and manage concurrent execution (ThreadPoolExecutor). - Sanitization: No validation or sanitization of input data from the suite file is performed before execution.
Audit Metadata