benchmark-models

Warn

Audited by Snyk on May 20, 2026

Risk Level: MEDIUM
Full Analysis

MEDIUM W011: Third-party content exposure detected (indirect prompt injection risk).

  • Third-party content exposure detected (high risk: 0.80). The workflow explicitly runs external model providers (Claude, GPT via Codex CLI, Gemini) in Step 4 and streams/interprets their outputs in Step 5 (and optionally an Anthropic judge in Step 3), so the agent ingests untrusted third‑party model responses as part of its decision-making.
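One way to picture the W011 exposure is to treat every provider response as data rather than as instructions. The sketch below is illustrative only: `UntrustedOutput`, `wrap_provider_output`, `render_for_comparison`, and the `SUSPICIOUS` pattern are hypothetical names, not part of the benchmark-models skill or of Snyk's tooling, and a keyword pattern like this is a crude heuristic rather than a real defense against prompt injection.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately small pattern; real injections vary widely.
SUSPICIOUS = re.compile(
    r"ignore (all |any )?(previous|prior) instructions"
    r"|system prompt"
    r"|run the following command",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class UntrustedOutput:
    """A model response kept together with its origin and a review flag."""
    provider: str
    text: str
    flagged: bool


def wrap_provider_output(provider: str, text: str) -> UntrustedOutput:
    """Record a provider response as untrusted data and flag instruction-like text."""
    return UntrustedOutput(provider, text, bool(SUSPICIOUS.search(text)))


def render_for_comparison(output: UntrustedOutput) -> str:
    """Quote the response inside explicit delimiters so it is read as content."""
    header = f"[untrusted output from {output.provider}]"
    return f"{header}\n<<<\n{output.text}\n>>>"
```

A caller could hold back any response whose `flagged` field is true for manual review before the agent interprets it.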

MEDIUM W012: Unverifiable external dependency detected (runtime URL that controls agent).

  • Potentially malicious external URL detected (high risk: 0.80). At runtime the skill runs git fetch origin and merges the result into the gstack home, where "origin" is whatever repo URL is configured in $GSTACK_HOME/.git/config. Changes pulled from that remote can modify skill prompts and binaries, and thus directly control agent behavior.
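The W012 finding comes down to merging whatever the remote serves. A minimal sketch of one possible guard, assuming the operator keeps a list of reviewed commit SHAs, is shown below. The helper names `fetched_commit` and `is_approved` and the idea of an approved-SHA list are assumptions made for illustration; the audit does not describe or prescribe this check.

```python
import subprocess


def fetched_commit(repo_dir: str) -> str:
    """Return the commit SHA that the most recent `git fetch` stored as FETCH_HEAD."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "FETCH_HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def is_approved(sha: str, approved_shas: set[str]) -> bool:
    """True only if the full SHA exactly matches a reviewed commit (case-insensitive)."""
    normalized = sha.strip().lower()
    return normalized in {s.strip().lower() for s in approved_shas}
```

Under this sketch, a wrapper would call `fetched_commit` on the gstack home after the fetch and skip the merge unless `is_approved` returns true for that SHA.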

Issues (2)

W011 (MEDIUM): Third-party content exposure detected (indirect prompt injection risk).

W012 (MEDIUM): Unverifiable external dependency detected (runtime URL that controls agent).

Audit Metadata

Risk Level: MEDIUM
Analyzed: May 20, 2026, 05:21 PM
Issues: 2