The Agent Skills Directory

[PROMPT_INJECTION]: The skill is designed to ingest and process untrusted user-generated content (UGC), which creates a surface for indirect prompt injection attacks where malicious users might attempt to bypass moderation logic with embedded instructions. \n
Ingestion points: Untrusted data enters the agent context through the content field in ModerateContent (SKILL.md), comment in ModerateComment (examples.md), and title/description in ModerateListing (examples.md).\n
Boundary markers: The provided code examples do not include explicit prompt delimiters or specific instructions to the model to ignore instructions embedded within the user content.\n
Capability inventory: The skill produces a decision (e.g., 'remove', 'approve', 'reject') that would typically trigger automated actions in a host system, potentially leading to unauthorized approval of malicious content if an injection succeeds.\n
Sanitization: It uses regex to catch specific PII patterns like SSNs and email addresses, but it does not sanitize against semantic prompt injection payloads.
[EXTERNAL_DOWNLOADS]: The documentation suggests installing a related skill using npx skills add lebsral/DSPy-Programming-not-prompting-LMs-skills --skill ai-do. This refers to a resource owned by the vendor lebsral.

ai-moderating-content