The Agent Skills Directory

[REMOTE_CODE_EXECUTION]: Example 3 in references/end-to-end-examples.md uses exec(row.code) and eval(row.test_expression) to evaluate LLM-generated code. This pattern executes arbitrary code provided in the dataset, which could lead to system compromise if the dataset is maliciously crafted.
[CREDENTIALS_UNSAFE]: Step 10 in SKILL.md instructs users to clone Git repositories using URLs that contain API keys (e.g., https://user:YOUR_API_KEY@<git-url>). This practice is insecure as it exposes sensitive credentials in shell history, process listings, and Git configuration files.
[EXTERNAL_DOWNLOADS]: The skill requires the installation of the zeroeval Python package from external registries and fetches datasets from remote servers via ze.Dataset.pull().
[PROMPT_INJECTION]: The skill's workflow involves ingesting untrusted data from external datasets via ze.Dataset.pull(). This data is then interpolated into prompts and directly executed in evaluation scripts without proper isolation or sanitization.
Ingestion points: Dataset rows are pulled in SKILL.md and references/end-to-end-examples.md via the SDK.
Boundary markers: None used when interpolating row fields into LLM messages or code execution blocks.
Capability inventory: The skill possesses capabilities for arbitrary code execution (exec, eval) and network communication via the SDK and OpenAI API.
Sanitization: No validation or sanitization of dataset content is performed before use in execution or prompting.

run-evals