benchmark

Fail

Audited by Snyk on May 20, 2026

Risk Level: CRITICAL
Full Analysis

CRITICAL E004: Prompt injection detected in skill instructions.

  • Potential prompt injection detected (high risk: 0.90). The skill contains multiple instructions unrelated to benchmarking — e.g., auto-writing telemetry, syncing/publishing artifacts, creating/appending and committing CLAUDE.md routing rules, and (in spawned sessions) auto-accepting prompts that can run repo-modifying commands (git rm, git commit, vendor removal) — which are deceptive/side-effecting behaviors outside the stated performance-audit purpose.

MEDIUM W011: Third-party content exposure detected (indirect prompt injection risk).

  • Third-party content exposure detected (high risk: 1.00). The skill explicitly navigates and evaluates arbitrary web pages in Phase 3 (e.g., "$B goto " and "$B eval 'JSON.stringify(performance.getEntriesByType(...)')"), meaning it fetches and interprets untrusted third‑party web content whose results directly influence benchmarking decisions and recommendations.

MEDIUM W012: Unverifiable external dependency detected (runtime URL that controls agent).

  • Potentially malicious external URL detected (high risk: 1.00). The skill's setup step will, at runtime, curl and save the script from https://bun.sh/install and then run it with bash (BUN_VERSION="$BUN_VERSION" bash "$tmpfile"), meaning it fetches and executes remote code that the skill can rely on to build the browse tool.

Issues (3)

E004
CRITICAL

Prompt injection detected in skill instructions.

W011
MEDIUM

Third-party content exposure detected (indirect prompt injection risk).

W012
MEDIUM

Unverifiable external dependency detected (runtime URL that controls agent).

Audit Metadata
Risk Level
CRITICAL
Analyzed
May 20, 2026, 05:21 PM
Issues
3
Security Audit — snyk — benchmark