ai-safety-researcher

Warn

Audited by Snyk on Apr 21, 2026

Risk Level: MEDIUM
Full Analysis

MEDIUM W011: Third-party content exposure detected (indirect prompt injection risk).

  • Third-party content exposure detected (high risk: 0.70). The skill's red-team workflow (see §9.2 "Red-Team Evaluation — Jailbreak Attack Suite Design" in SKILL.md / references/9-2...) explicitly instructs sourcing base prompts from HarmBench and augmenting with domain-specific/public prompts, meaning the agent ingests untrusted third-party prompt content that will be read and used to drive evaluations and defenses, which can enable indirect prompt injection.

Issues (1)

W011
MEDIUM

Third-party content exposure detected (indirect prompt injection risk).

Audit Metadata
Risk Level
MEDIUM
Analyzed
Apr 21, 2026, 11:43 AM
Issues
1