agent-platform-eval-flywheel
Pass
Audited by Gen Agent Trust Hub on Jun 25, 2026
Risk Level: SAFE, COMMAND_EXECUTION, EXTERNAL_DOWNLOADS
Full Analysis
- Command Execution for Cloud Utilities: The skill utilizes local shell commands to facilitate integration with Google Cloud services.
  - Evidence in `scripts/endpoint_evaluation.py`: uses `subprocess.run` to execute `gcloud auth print-access-token` for obtaining authentication tokens and `gsutil cat` to retrieve dataset content from Google Cloud Storage.
  - These operations are standard practices for cloud-native development tools and align with the skill's purpose of managing and evaluating cloud-hosted models.
- External Data Ingestion Surface: The skill is designed to process external evaluation datasets (JSONL, CSV, and session traces) for model grading.
  - Ingestion points include `scripts/endpoint_evaluation.py` and `scripts/maas_evaluation.py`, which read data from remote storage, and `scripts/parse_adk_traces.py`, which parses local session files.
  - While processing untrusted data for LLM evaluation presents a surface for indirect prompt injection (where data might attempt to influence a judge model), this is a fundamental aspect of evaluation workflows. The skill utilizes official SDKs and standard JSON parsing to manage this data.
- Network Communication with Cloud APIs: The scripts perform network requests to interact with Vertex AI endpoints and Model-as-a-Service providers.
  - Evidence in `scripts/endpoint_evaluation.py`: utilizes the `requests` library to send inference payloads to `*.googleapis.com` or user-defined dedicated endpoint DNS addresses.
  - These communications are essential for the skill's core functionality of running model inference and are directed toward official vendor service domains.
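
The command-execution and data-ingestion patterns described in the findings above can be illustrated with a minimal sketch. It is not the code from the audited scripts. The helper names (`get_access_token`, `read_gcs_object`, `parse_jsonl`) and the injectable `run` parameter are hypothetical, chosen for illustration. Only `subprocess.run`, `gcloud auth print-access-token`, `gsutil cat`, and JSON parsing come from the audit itself.

```python
import json
import subprocess
from typing import Callable, Iterator

# Hypothetical helper names; the audited scripts may be organized differently.
TOKEN_COMMAND = ["gcloud", "auth", "print-access-token"]


def get_access_token(run: Callable = subprocess.run) -> str:
    """Return an access token by invoking the gcloud CLI without a shell."""
    # An argument list (rather than a string with shell=True) means no shell
    # ever parses the command, so there is nothing to inject into.
    result = run(TOKEN_COMMAND, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def read_gcs_object(uri: str, run: Callable = subprocess.run) -> str:
    """Fetch the contents of a Cloud Storage object via `gsutil cat`."""
    # Assumption: only gs:// URIs are accepted, so a value that looks like a
    # command-line flag or a local path is rejected before reaching gsutil.
    if not uri.startswith("gs://"):
        raise ValueError(f"expected a gs:// URI, got {uri!r}")
    result = run(["gsutil", "cat", uri], capture_output=True, text=True, check=True)
    return result.stdout


def parse_jsonl(text: str) -> Iterator[dict]:
    """Yield one dict per non-empty JSONL line, treating content purely as data."""
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)  # standard JSON parsing; nothing is executed
        if not isinstance(record, dict):
            raise ValueError(f"line {line_number}: expected a JSON object")
        yield record
```

Making `run` injectable is a design choice for this sketch only: it lets the helpers be exercised in tests without the Google Cloud CLI installed. Parsing records as plain JSON keeps dataset content from being executed locally, but it does not by itself stop that content from influencing a judge model, which is the indirect prompt injection surface the audit notes.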
Audit Metadata