The Agent Skills Directory

[COMMAND_EXECUTION]: The script run_eval.py is a CLI tool that accepts file paths and runner specifications via command-line arguments, which are then used to perform file operations and code execution.
[REMOTE_CODE_EXECUTION]: In run_eval.py, the load_runner function uses importlib.import_module() to dynamically load a Python module specified by the --runner argument. This allows for the execution of any Python function reachable via the system path or the current directory.
[REMOTE_CODE_EXECUTION]: The script modifies sys.path by inserting the current working directory (Path.cwd()) at index 0. This practice can lead to the execution of malicious code if a module with a conflicting name is placed in the local directory.
[PROMPT_INJECTION]: The skill is vulnerable to indirect prompt injection because it reads untrusted data from a suite file (YAML/JSONL) and passes it to a runner (which typically involves an LLM). Malicious instructions in the test cases could influence the runner's output.
Ingestion points: load_yaml reads the file path provided in the --suite argument in run_eval.py.
Boundary markers: No specific delimiters or instructions are used to separate test case inputs from instructions within the runner prompt.
Capability inventory: The script can load arbitrary modules (importlib), write report files to disk (Path.write_text), and manage concurrent execution (ThreadPoolExecutor).
Sanitization: No validation or sanitization of input data from the suite file is performed before execution.

ai-eval-regression-tester