agent-platform-eval-flywheel

Pass

Audited by Gen Agent Trust Hub on Jun 25, 2026

Risk Level: SAFE
Flagged categories: COMMAND_EXECUTION, EXTERNAL_DOWNLOADS
Full Analysis
  • Command Execution for Cloud Utilities: The skill runs local shell commands to integrate with Google Cloud services.
      • Evidence in scripts/endpoint_evaluation.py: the script calls subprocess.run to execute gcloud auth print-access-token, which obtains an authentication token, and gsutil cat, which retrieves dataset content from Google Cloud Storage.
      • These operations are standard practice for cloud-native development tools and match the skill's purpose of managing and evaluating cloud-hosted models.
  • External Data Ingestion Surface: The skill processes external evaluation datasets (JSONL, CSV, and session traces) for model grading.
      • Ingestion points are scripts/endpoint_evaluation.py and scripts/maas_evaluation.py, which read data from remote storage, and scripts/parse_adk_traces.py, which parses local session files.
      • Processing untrusted data for LLM evaluation exposes a surface for indirect prompt injection, in which dataset content attempts to influence a judge model. This exposure is inherent to evaluation workflows. The skill handles the data with official SDKs and standard JSON parsing.
  • Network Communication with Cloud APIs: The scripts send network requests to Vertex AI endpoints and Model-as-a-Service providers.
      • Evidence in scripts/endpoint_evaluation.py: the script uses the requests library to send inference payloads to *.googleapis.com or to user-defined dedicated endpoint DNS addresses.
      • These communications are required for the skill's core function of running model inference and target official vendor service domains.
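
The command execution and data ingestion pattern described in the findings above can be sketched roughly as follows. This is an illustrative sketch, not the code in scripts/endpoint_evaluation.py, which the audit does not reproduce. The helper names run_cli, get_access_token, read_gcs_text, and parse_jsonl are hypothetical. The sketch assumes the commands are passed to subprocess.run as an argument list (no shell), so that a dataset URI cannot be interpreted as shell syntax, and that JSONL content is parsed with the standard json module.

```python
import json
import subprocess
import sys
from typing import Any, Dict, List, Sequence


def run_cli(argv: Sequence[str], timeout: float = 60.0) -> str:
    """Run a command as an argument list (no shell) and return stripped stdout."""
    result = subprocess.run(
        list(argv),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
        timeout=timeout,
    )
    return result.stdout.strip()


def get_access_token() -> str:
    """Hypothetical helper: obtain a token via the gcloud CLI."""
    return run_cli(["gcloud", "auth", "print-access-token"])


def read_gcs_text(uri: str) -> str:
    """Hypothetical helper: fetch an object's content via gsutil cat."""
    if not uri.startswith("gs://"):
        raise ValueError(f"expected a gs:// URI, got {uri!r}")
    return run_cli(["gsutil", "cat", uri])


def parse_jsonl(text: str) -> List[Dict[str, Any]]:
    """Parse JSONL text into a list of objects, skipping blank lines."""
    records: List[Dict[str, Any]] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        obj = json.loads(line)  # raises json.JSONDecodeError on malformed input
        if not isinstance(obj, dict):
            raise ValueError(f"line {line_no}: expected a JSON object")
        records.append(obj)
    return records


if __name__ == "__main__":
    # Demonstrates run_cli without requiring gcloud or gsutil to be installed.
    print(run_cli([sys.executable, "-c", "print('ok')"]))
```

Parsing each line as data with json.loads, rather than evaluating it, keeps malformed or hostile records from executing anything locally. It does not address prompt injection against a judge model, which depends on how the records are later placed into prompts.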
Audit Metadata
Risk Level: SAFE
Analyzed: Jun 25, 2026, 01:27 AM