evaluating-llms-harness

Warn

Audited by Snyk on May 16, 2026

Risk Level: MEDIUM
Full Analysis

MEDIUM W011: Third-party content exposure detected (indirect prompt injection risk).

  • Third-party content exposure detected (high risk: 0.90). The skill's workflows (see references/api-evaluation.md and references/custom-tasks.md in SKILL.md) explicitly instruct the harness to ingest public API responses (OpenAI/Anthropic/local base_url) and public datasets (HuggingFace/task dataset_path, GitHub links, leaderboard URLs) as part of runtime evaluation. This exposes the agent to untrusted, user-generated third-party content that can influence model prompts and evaluation behavior.
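One possible way to reduce this exposure is to screen third-party text before it reaches a prompt and to mark it as data rather than instructions. The sketch below is illustrative only: the pattern list, the function names `flag_untrusted_text` and `wrap_untrusted`, and the `<untrusted_content>` delimiter are all assumptions, not part of the skill or of any harness API, and simple pattern matching will not catch every injection attempt.

```python
import re
from typing import List

# Hypothetical patterns for illustration; a real deployment would need a
# broader, tested list and should not rely on pattern matching alone.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"\bsystem prompt\b", re.IGNORECASE),
    re.compile(r"\byou are now\b", re.IGNORECASE),
]


def flag_untrusted_text(text: str) -> List[str]:
    """Return the source strings of every pattern that matches the text."""
    return [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(text)]


def wrap_untrusted(text: str) -> str:
    """Wrap third-party content in delimiters (an assumed convention) so a
    prompt template can present it as data only."""
    return "<untrusted_content>\n" + text + "\n</untrusted_content>"
```

A caller might skip or log any dataset record for which `flag_untrusted_text` returns a non-empty list, and pass the remaining records through `wrap_untrusted` before building the evaluation prompt.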

MEDIUM W012: Unverifiable external dependency detected (runtime URL that controls agent).

  • Potentially malicious external URL detected (high risk: 0.90). The skill includes a runtime docker command (docker run ... ghcr.io/huggingface/text-generation-inference:latest) that pulls and executes a remote container image, which is a required dependency of the Text Generation Inference workflow.
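A common mitigation for this kind of finding is to reference the image by content digest instead of a mutable tag such as `:latest`, so the bytes that run are the bytes that were reviewed. The sketch below checks whether an image reference is digest-pinned; the function name `is_digest_pinned` is hypothetical, and the digest used in the usage note is a placeholder, not a real digest of the image.

```python
import re

# A digest-pinned reference ends in "@sha256:" followed by 64 lowercase hex
# characters. Only sha256 is handled here, which is an assumption of this sketch.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image_ref: str) -> bool:
    """Return True if the image reference is pinned to a sha256 digest."""
    return DIGEST_RE.search(image_ref) is not None
```

For example, `is_digest_pinned("ghcr.io/huggingface/text-generation-inference:latest")` returns `False`, while the same repository followed by `"@sha256:" + "0" * 64` (a placeholder digest) returns `True`. A wrapper script could refuse to call `docker run` unless this check passes.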

Issues (2)

MEDIUM W011: Third-party content exposure detected (indirect prompt injection risk).

MEDIUM W012: Unverifiable external dependency detected (runtime URL that controls agent).

Audit Metadata

Risk Level: MEDIUM
Analyzed: May 16, 2026, 01:45 PM
Issues: 2