run-evals
Warn
Audited by Gen Agent Trust Hub on Apr 6, 2026
Risk Level: MEDIUM
REMOTE_CODE_EXECUTION, CREDENTIALS_UNSAFE, EXTERNAL_DOWNLOADS, PROMPT_INJECTION
Full Analysis
- [REMOTE_CODE_EXECUTION]: Example 3 in `references/end-to-end-examples.md` uses `exec(row.code)` and `eval(row.test_expression)` to evaluate LLM-generated code. This pattern executes arbitrary code supplied by the dataset, which could lead to system compromise if the dataset is maliciously crafted.
- [CREDENTIALS_UNSAFE]: Step 10 in `SKILL.md` instructs users to clone Git repositories using URLs that embed API keys (e.g., `https://user:YOUR_API_KEY@<git-url>`). This practice is insecure because it exposes credentials in shell history, process listings, and Git configuration files.
- [EXTERNAL_DOWNLOADS]: The skill requires installing the `zeroeval` Python package from external registries and fetches datasets from remote servers via `ze.Dataset.pull()`.
- [PROMPT_INJECTION]: The skill's workflow ingests untrusted data from external datasets via `ze.Dataset.pull()`. This data is then interpolated into prompts and executed directly in evaluation scripts without isolation or sanitization.
  - Ingestion points: Dataset rows are pulled in `SKILL.md` and `references/end-to-end-examples.md` via the SDK.
  - Boundary markers: None are used when interpolating `row` fields into LLM messages or code-execution blocks.
  - Capability inventory: The skill can execute arbitrary code (`exec`, `eval`) and communicate over the network via the SDK and the OpenAI API.
  - Sanitization: No validation or sanitization of dataset content is performed before it is used in execution or prompting.
Audit Metadata