The Agent Skills Directory

Command Execution for Cloud Utilities: The skill runs local shell commands to integrate with Google Cloud services.
Evidence in scripts/endpoint_evaluation.py: the script calls subprocess.run to execute gcloud auth print-access-token, which obtains an authentication token, and gsutil cat, which retrieves dataset content from Google Cloud Storage.
These operations are standard practice for cloud-native development tools and align with the skill's purpose of managing and evaluating cloud-hosted models.
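The pattern described in this finding can be sketched as follows. This is a minimal illustration, not the code in scripts/endpoint_evaluation.py: the function names run_cli, get_access_token, and read_gcs_object are hypothetical, and the gs:// prefix check is an added assumption. The sketch passes commands as argument lists (no shell), so a dataset path is never interpreted by a shell.

```python
import subprocess
from typing import Callable, List


def run_cli(args: List[str], runner: Callable = subprocess.run) -> str:
    """Run a command given as an argument list (no shell) and return its stdout.

    `runner` is injectable so the function can be exercised without the
    Google Cloud CLI installed; by default it is subprocess.run.
    """
    result = runner(args, capture_output=True, text=True, check=True)
    return result.stdout


def get_access_token(runner: Callable = subprocess.run) -> str:
    """Return the token printed by `gcloud auth print-access-token`."""
    return run_cli(["gcloud", "auth", "print-access-token"], runner).strip()


def read_gcs_object(uri: str, runner: Callable = subprocess.run) -> str:
    """Return the content of a Cloud Storage object via `gsutil cat`.

    The gs:// check is an assumption of this sketch: it keeps the helper
    from being pointed at local files by mistake.
    """
    if not uri.startswith("gs://"):
        raise ValueError(f"expected a gs:// URI, got {uri!r}")
    return run_cli(["gsutil", "cat", uri], runner)
```

With check=True, a non-zero exit status from gcloud or gsutil raises subprocess.CalledProcessError instead of silently returning empty output.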
External Data Ingestion Surface: The skill processes external evaluation datasets (JSONL, CSV, and session traces) for model grading.
Ingestion points include scripts/endpoint_evaluation.py and scripts/maas_evaluation.py, which read data from remote storage, and scripts/parse_adk_traces.py, which parses local session files.
Processing untrusted data for LLM evaluation exposes a surface for indirect prompt injection, in which dataset content attempts to influence the judge model. This exposure is inherent to evaluation workflows. The skill uses official SDKs and standard JSON parsing to handle the data.
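As an illustration of what "standard JSON parsing" of a JSONL dataset can look like, the sketch below reads one JSON object per line and treats every record purely as data. The function name iter_jsonl and the rule that each line must be a JSON object are assumptions made for this example, not details taken from the skill's scripts.

```python
import json
from typing import Any, Dict, Iterator


def iter_jsonl(text: str) -> Iterator[Dict[str, Any]]:
    """Yield one dict per non-empty line of JSONL text.

    json.loads only builds Python values (dict, list, str, numbers, ...);
    it does not execute anything contained in the input.
    """
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc
        if not isinstance(record, dict):
            # Assumption of this sketch: every record is a JSON object.
            raise ValueError(f"line {lineno}: expected a JSON object")
        yield record
```

Parsing this way rejects malformed records early, but it does not address prompt injection on its own: the string fields of a well-formed record can still carry text aimed at the judge model.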
Network Communication with Cloud APIs: The scripts send network requests to Vertex AI endpoints and Model-as-a-Service providers.
Evidence in scripts/endpoint_evaluation.py: the script uses the requests library to send inference payloads to *.googleapis.com or to user-defined dedicated endpoint DNS addresses.
These communications are essential to the skill's core function of running model inference and are directed toward official vendor service domains or endpoints the user configures.
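One way to keep such requests confined to the destinations named in this finding is to check the target host before sending. The sketch below is hypothetical: is_allowed_host and build_inference_request are illustrative names, and the explicit allowlist for dedicated endpoints is an assumption of this example rather than documented behavior of the skill. It uses only the standard library and returns keyword arguments that could be passed to requests.post.

```python
from typing import Any, Dict, Iterable
from urllib.parse import urlparse


def is_allowed_host(url: str, extra_hosts: Iterable[str] = ()) -> bool:
    """Return True for https URLs on *.googleapis.com or an explicitly listed host."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if parsed.scheme != "https" or not host:
        return False
    if host.endswith(".googleapis.com"):
        return True
    # Dedicated endpoint DNS names must be listed explicitly (assumption).
    return host in {h.lower() for h in extra_hosts}


def build_inference_request(
    url: str,
    token: str,
    payload: Dict[str, Any],
    extra_hosts: Iterable[str] = (),
) -> Dict[str, Any]:
    """Build keyword arguments for requests.post after validating the host."""
    if not is_allowed_host(url, extra_hosts):
        raise ValueError(f"refusing to send payload to {url!r}")
    return {
        "url": url,
        "json": payload,
        "headers": {"Authorization": f"Bearer {token}"},
        "timeout": 30,  # seconds; an arbitrary value chosen for this sketch
    }
```

A caller with the requests library installed could then send the payload with requests.post(**build_inference_request(url, token, payload)). Matching on the ".googleapis.com" suffix, rather than searching for the substring, avoids accepting look-alike hosts such as googleapis.com.evil.example.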

agent-platform-eval-flywheel