evaluating-llms-harness
Warn
Audited by Snyk on May 16, 2026
Risk Level: MEDIUM
Full Analysis
MEDIUM W011: Third-party content exposure detected (indirect prompt injection risk).
- Third-party content exposure detected (high risk: 0.90). The skill's workflows (see references/api-evaluation.md and references/custom-tasks.md in SKILL.md) explicitly instruct the harness to ingest public API responses (OpenAI, Anthropic, or a local base_url) and public datasets (HuggingFace/task dataset_path, GitHub links, leaderboard URLs) during runtime evaluation. This exposes the agent to untrusted, user-generated third-party content that can influence model prompts and evaluation behavior.
MEDIUM W012: Unverifiable external dependency detected (runtime URL that controls agent).
- Potentially malicious external URL detected (high risk: 0.90). The skill includes a runtime docker command (docker run ... ghcr.io/huggingface/text-generation-inference:latest) that pulls and runs a remote container image. That image is a required dependency of the Text Generation Inference workflow.
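One plausible reading of W012 is that the image is referenced by the mutable latest tag, so the code that actually runs can change between pulls. The sketch below is illustrative only and is not part of the Snyk audit or the skill: it checks whether an image reference is pinned to an immutable sha256 digest. The function name is_digest_pinned and the example.invalid reference with an all-zero digest are hypothetical placeholders.

```python
import re

# Assumption: a reference counts as "pinned" only when it ends with an
# immutable content digest of the form name@sha256:<64 lowercase hex chars>.
_DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image_ref: str) -> bool:
    """Return True if image_ref ends with a sha256 digest, False otherwise."""
    return bool(_DIGEST_RE.search(image_ref))


# The reference quoted in the finding uses a tag, not a digest.
flagged = "ghcr.io/huggingface/text-generation-inference:latest"

# Hypothetical placeholder reference; the all-zero digest is not a real image.
placeholder = "example.invalid/some-image@sha256:" + "0" * 64

print(is_digest_pinned(flagged))
print(is_digest_pinned(placeholder))
```

A check like this only tells you whether a reference is pinned. It does not verify that the pinned image is trustworthy.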
Issues (2)
W011
MEDIUM: Third-party content exposure detected (indirect prompt injection risk).
W012
MEDIUM: Unverifiable external dependency detected (runtime URL that controls agent).
Audit Metadata