benchmark
Fail
Audited by Snyk on May 20, 2026
Risk Level: CRITICAL
Full Analysis
CRITICAL E004: Prompt injection detected in skill instructions.
- Potential prompt injection detected (high risk: 0.90). The skill contains multiple instructions unrelated to benchmarking, for example auto-writing telemetry, syncing and publishing artifacts, creating or appending to CLAUDE.md routing rules and committing them, and (in spawned sessions) auto-accepting prompts that can run repo-modifying commands (git rm, git commit, vendor removal). These are deceptive or side-effecting behaviors outside the skill's stated performance-audit purpose.
MEDIUM W011: Third-party content exposure detected (indirect prompt injection risk).
- Third-party content exposure detected (high risk: 1.00). The skill explicitly navigates to and evaluates arbitrary web pages in Phase 3 (e.g., "$B goto " and "$B eval 'JSON.stringify(performance.getEntriesByType(...))'"), so it fetches and interprets untrusted third-party web content, and the results directly influence its benchmarking decisions and recommendations.
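To make the W011 exposure concrete, here is a minimal sketch of how a caller might narrow what page-derived data reaches the agent. It is not something the audited skill is known to do. The function name numeric_timings and the NUMERIC_FIELDS allow-list are hypothetical; the field names themselves (startTime, duration, transferSize) are standard Performance API properties. The idea is to keep only numeric timing values from the JSON that the page returns and to discard every string, since free text from a third-party page is where injected instructions would travel.

```python
import json

# Hypothetical allow-list: numeric Performance API fields worth keeping.
NUMERIC_FIELDS = ("startTime", "duration", "transferSize")


def numeric_timings(raw_json: str) -> list[dict[str, float]]:
    """Parse page-supplied JSON and keep only allow-listed numeric fields.

    Strings (entry names, URLs, anything the page controls as text) are
    dropped, so they never reach whatever consumes the benchmark data.
    """
    entries = json.loads(raw_json)
    if not isinstance(entries, list):
        raise ValueError("expected a JSON array of performance entries")

    cleaned = []
    for entry in entries:
        if not isinstance(entry, dict):
            continue  # skip anything that is not an object
        row = {}
        for field in NUMERIC_FIELDS:
            value = entry.get(field)
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                row[field] = float(value)
        cleaned.append(row)
    return cleaned
```

Filtering by type rather than by content avoids trying to recognize malicious text, which is unreliable. The trade-off is that entry names are lost, so a real implementation would need some other trusted way to label the measurements.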
MEDIUM W012: Unverifiable external dependency detected (runtime URL that controls agent).
- Potentially malicious external URL detected (high risk: 1.00). At runtime, the skill's setup step uses curl to download the script from https://bun.sh/install, saves it, and runs it with bash (BUN_VERSION="$BUN_VERSION" bash "$tmpfile"). The skill therefore fetches and executes remote code that it depends on to build the browse tool.
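The W012 finding turns on the downloaded script being run without any integrity check. As an illustration only, the sketch below shows one common way to close that gap: compare the saved file's SHA-256 digest against a value pinned in advance, and run the script only on a match. The function names and the expected_sha256 parameter are assumptions made for this example; the audit does not say the skill does this, and no real digest for the Bun installer is given here.

```python
import hashlib
import subprocess
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_if_pinned(script: Path, expected_sha256: str) -> bool:
    """Run `script` with bash only if its digest matches the pinned value.

    Returns True if the script was executed, False if it was rejected.
    `expected_sha256` is a hypothetical value the caller pins ahead of time.
    """
    if sha256_of(script) != expected_sha256.lower():
        return False  # digest mismatch: do not execute
    subprocess.run(["bash", str(script)], check=True)
    return True
```

A pinned digest must be updated whenever the upstream script changes, so this approach trades convenience for the guarantee that only reviewed code runs.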
Issues (3)
- E004 (CRITICAL): Prompt injection detected in skill instructions.
- W011 (MEDIUM): Third-party content exposure detected (indirect prompt injection risk).
- W012 (MEDIUM): Unverifiable external dependency detected (runtime URL that controls agent).
Audit Metadata