ai-eval-regression-tester

Warn

Audited by Gen Agent Trust Hub on May 11, 2026

Risk Level: MEDIUMCOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The script run_eval.py is a CLI tool that accepts file paths and runner specifications via command-line arguments, which are then used to perform file operations and code execution.
  • [REMOTE_CODE_EXECUTION]: In run_eval.py, the load_runner function uses importlib.import_module() to dynamically load a Python module specified by the --runner argument. This allows for the execution of any Python function reachable via the system path or the current directory.
  • [REMOTE_CODE_EXECUTION]: The script modifies sys.path by inserting the current working directory (Path.cwd()) at index 0. This practice can lead to the execution of malicious code if a module with a conflicting name is placed in the local directory.
  • [PROMPT_INJECTION]: The skill is vulnerable to indirect prompt injection because it reads untrusted data from a suite file (YAML/JSONL) and passes it to a runner (which typically involves an LLM). Malicious instructions in the test cases could influence the runner's output.
  • Ingestion points: load_yaml reads the file path provided in the --suite argument in run_eval.py.
  • Boundary markers: No specific delimiters or instructions are used to separate test case inputs from instructions within the runner prompt.
  • Capability inventory: The script can load arbitrary modules (importlib), write report files to disk (Path.write_text), and manage concurrent execution (ThreadPoolExecutor).
  • Sanitization: No validation or sanitization of input data from the suite file is performed before execution.
Audit Metadata
Risk Level
MEDIUM
Analyzed
May 11, 2026, 04:06 PM
Security Audit — agent-trust-hub — ai-eval-regression-tester