agent-evaluation
Warn
Audited by Gen Agent Trust Hub on Apr 23, 2026
Risk Level: MEDIUM
COMMAND_EXECUTION, PROMPT_INJECTION
Full Analysis
- [COMMAND_EXECUTION]: Several utility scripts execute shell commands via `subprocess.run` to interact with system tools and cloud CLIs.
  - `scripts/setup_mlflow.py` and `scripts/utils/env_validation.py` run the `databricks auth profiles` command.
  - `scripts/validate_environment.py` executes the `mlflow doctor` diagnostic tool.
- [COMMAND_EXECUTION]: The skill performs dynamic code execution and script generation.
  - `scripts/create_dataset_template.py` and `scripts/run_evaluation_template.py` use `subprocess.run(['python', '-c', ...])` to run dynamically constructed Python snippets for metadata retrieval.
  - The skill generates new executable Python files (`create_evaluation_dataset.py` and `run_agent_evaluation.py`) from internal templates.
  - `scripts/validate_tracing_runtime.py` uses `importlib.import_module` to dynamically load agent code based on user-provided module names.
- [COMMAND_EXECUTION]: The skill modifies file permissions on dynamically created scripts.
  - `scripts/create_dataset_template.py` and `scripts/run_evaluation_template.py` call `os.chmod` to grant execution privileges (`0o755`) to the generated scripts.
- [PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection because it processes external data for agent evaluation without adequate safeguards.
  - Ingestion points: Untrusted data from MLflow datasets is loaded into DataFrames and passed directly to agent entry points in the `run_agent_evaluation.py` script.
  - Boundary markers: No boundary markers or instructions prevent the agent from executing instructions embedded within the evaluation dataset.
  - Capability inventory: The evaluation environment allows subprocess execution, file system access, and network communication via MLflow and LLM provider APIs.
  - Sanitization: The skill lacks mechanisms to sanitize or validate the content of the evaluation datasets before they are processed.
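To make the command-execution findings concrete, the following is a minimal, hypothetical sketch of the flagged patterns — running a dynamically constructed snippet through a child interpreter, generating an executable script from a template and marking it `0o755`, and importing agent code by a caller-supplied module name. This is illustrative only, not the skill's actual code; the snippet, template, and module name (`json` standing in for a user-provided agent module) are invented for the example.

```python
import importlib
import os
import stat
import subprocess
import sys
import tempfile

# Pattern 1: dynamic code execution via a child interpreter, as flagged for
# subprocess.run(['python', '-c', ...]). The snippet here is a harmless stand-in.
snippet = "import platform; print(platform.python_version())"
result = subprocess.run(
    [sys.executable, "-c", snippet],
    capture_output=True,
    text=True,
    check=True,
)
print("child interpreter reported:", result.stdout.strip())

# Pattern 2: script generation plus permission change. A new Python file is
# written from a template string and granted 0o755 (rwxr-xr-x), mirroring the
# os.chmod finding: any local user can now execute the generated file.
template = "#!/usr/bin/env python3\nprint('generated script ran')\n"
path = os.path.join(tempfile.mkdtemp(), "generated_script.py")
with open(path, "w") as fh:
    fh.write(template)
os.chmod(path, 0o755)
mode = stat.S_IMODE(os.stat(path).st_mode)
print(f"generated {path} with mode {oct(mode)}")

# Pattern 3: dynamic import keyed on a user-provided module name, as flagged
# for importlib.import_module. Whatever module the string names gets executed
# at import time; "json" is a benign placeholder here.
module_name = "json"
agent_module = importlib.import_module(module_name)
print("dynamically loaded:", agent_module.__name__)
```

The risk in all three patterns is the same: the executed code is chosen at runtime from strings (a snippet, a template, a module name), so any input that can influence those strings can influence what runs on the host.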
Audit Metadata