ai-moderating-content
Pass
Audited by Gen Agent Trust Hub on May 13, 2026
Risk Level: SAFEPROMPT_INJECTIONEXTERNAL_DOWNLOADS
Full Analysis
- [PROMPT_INJECTION]: The skill is designed to ingest and process untrusted user-generated content (UGC), which creates a surface for indirect prompt injection attacks where malicious users might attempt to bypass moderation logic with embedded instructions. \n
- Ingestion points: Untrusted data enters the agent context through the
contentfield inModerateContent(SKILL.md),commentinModerateComment(examples.md), andtitle/descriptioninModerateListing(examples.md).\n - Boundary markers: The provided code examples do not include explicit prompt delimiters or specific instructions to the model to ignore instructions embedded within the user content.\n
- Capability inventory: The skill produces a
decision(e.g., 'remove', 'approve', 'reject') that would typically trigger automated actions in a host system, potentially leading to unauthorized approval of malicious content if an injection succeeds.\n - Sanitization: It uses regex to catch specific PII patterns like SSNs and email addresses, but it does not sanitize against semantic prompt injection payloads.
- [EXTERNAL_DOWNLOADS]: The documentation suggests installing a related skill using
npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill ai-do. This refers to a resource owned by the vendor lebsral.
Audit Metadata