The Agent Skills Directory

MEDIUM W011: Third-party content exposure detected (indirect prompt injection risk).

Third-party content exposure detected (high risk: 1.00). The runner explicitly loads public, user-contributed datasets from HuggingFace (see load_benchmark in scripts/run_eval.py which calls datasets.load_dataset with HuggingFace IDs and SKILL.md Step 3 / references/benchmark-catalog.md that list HF dataset URLs), so untrusted third‑party content is ingested and directly used to drive prompts, scoring, and model-selection decisions.

MEDIUM W012: Unverifiable external dependency detected (runtime URL that controls agent).

Potentially malicious external URL detected (high risk: 0.90). The runner calls datasets.load_dataset with HuggingFace dataset IDs (e.g. https://huggingface.co/datasets/pig4431/HeQ_v1, https://huggingface.co/datasets/HebArabNlpProject/HebrewSentiment, https://huggingface.co/datasets/HebArabNlpProject/HebNLI) which are fetched at runtime and their content is injected verbatim into the model prompts, so external content directly controls prompts and is a required dependency.

hebrew-llm-eval-suite