arize-experiment

Pass

Audited by Gen Agent Trust Hub on May 5, 2026

Risk Level: SAFECOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: Orchestrates experiment management by executing the ax CLI and generating local Python scripts (infer.py) to automate model inference workflows.
  • [REMOTE_CODE_EXECUTION]: Facilitates the installation of official vendor tools and well-known SDKs, including arize-ax-cli, openai, anthropic, and google-genai, from standard public registries.
  • [PROMPT_INJECTION]: Contains a vulnerability surface for indirect prompt injection where untrusted content from dataset examples could influence the agent or the models being evaluated.
  • Ingestion points: Dataset examples retrieved via ax datasets export in SKILL.md.
  • Boundary markers: Not implemented in the generated Python inference script template.
  • Capability inventory: Python script execution with network access for API-based model evaluation.
  • Sanitization: No explicit validation or filtering of dataset input fields before interpolation into prompts.
  • [SAFE]: Adheres to security best practices for credential handling by instructing the agent to use environment variables and configuration profiles instead of hardcoding keys or searching the filesystem for .env files.
Audit Metadata
Risk Level
SAFE
Analyzed
May 5, 2026, 05:07 PM