external-content-sanitizer

Pass

Audited by Gen Agent Trust Hub on Jun 13, 2026

Risk Level: SAFE
Full Analysis
  • [PROMPT_INJECTION]: The skill contains strings such as 'ignore previous instructions' and 'you are now', but they appear only as detection patterns in its references/injection-patterns.md catalog. They are not active instructions to the model executing the skill.
  • [PROMPT_INJECTION]: The skill's intended purpose involves processing untrusted external content, which constitutes an indirect prompt injection surface.
    • Ingestion points: Untrusted data enters the agent context through the content argument in SKILL.md.
    • Boundary markers: The skill uses explicit delimiters and a CONTENT TO ANALYZE: header when passing untrusted text to its internal LLM analysis pass.
    • Capability inventory: The skill has access to file system tools (Read, Write, Edit, Grep, Glob), which it uses to manage the docs/security/flagged-sources.md audit log, and to the Skill tool, which it uses to invoke verification sub-tasks.
    • Sanitization: The skill itself serves as a sanitization layer, redacting any detected injection attempts before returning content to the caller.
  • [SAFE]: Zero-width characters and other hidden Unicode sequences are listed in references/injection-patterns.md as targets for detection. They are not used by the skill to obfuscate its own behavior or hide malicious intent.
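The findings above describe three mechanisms: pattern-based detection, redaction before content is returned, and boundary markers around untrusted text. A minimal Python sketch of how such a layer might fit together follows. It is not the skill's actual implementation: the function names, the pattern list, the zero-width character subset, and the delimiter strings are all hypothetical. Only the two quoted phrases and the CONTENT TO ANALYZE: header come from this report.

```python
import re
from typing import List, Tuple

# Hypothetical pattern list. The real catalog lives in
# references/injection-patterns.md and is not reproduced in this report.
INJECTION_PHRASES = [
    r"ignore previous instructions",
    r"you are now",
]

# Illustrative subset of zero-width / invisible code points (an assumption).
ZERO_WIDTH = "\u200b\u200c\u200d\u2060\ufeff"

PHRASE_RE = re.compile("|".join(INJECTION_PHRASES), re.IGNORECASE)
ZERO_WIDTH_RE = re.compile(f"[{ZERO_WIDTH}]")


def sanitize(content: str) -> Tuple[str, List[str]]:
    """Return (redacted_text, findings) for a piece of untrusted content."""
    findings: List[str] = []

    # Strip hidden characters first so they cannot split a phrase
    # and let it slip past the phrase matcher.
    cleaned, hidden_count = ZERO_WIDTH_RE.subn("", content)
    if hidden_count:
        findings.append(f"zero-width characters removed: {hidden_count}")

    def _redact(match: "re.Match[str]") -> str:
        # Record each hit, then replace it with a visible placeholder.
        findings.append(f"injection phrase: {match.group(0).lower()}")
        return "[REDACTED]"

    redacted = PHRASE_RE.sub(_redact, cleaned)
    return redacted, findings


def wrap_for_analysis(content: str) -> str:
    """Place untrusted text behind a header and explicit delimiters."""
    # The delimiter strings are assumptions; only the header is from the report.
    return (
        "CONTENT TO ANALYZE:\n"
        "<<<BEGIN_UNTRUSTED>>>\n"
        f"{content}\n"
        "<<<END_UNTRUSTED>>>"
    )
```

In this sketch a caller would run sanitize() on incoming text, append the returned findings to an audit log such as docs/security/flagged-sources.md, and pass only the redacted text through wrap_for_analysis() to any later model call. Removing hidden characters before phrase matching is a design choice made here so that a phrase interleaved with zero-width characters is still caught.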
Audit Metadata
Risk Level: SAFE
Analyzed: Jun 13, 2026, 02:32 PM