pua-2-0

Pass

Audited by Gen Agent Trust Hub on Mar 20, 2026

Risk Level: SAFE
PROMPT_INJECTION, NO_CODE
Full Analysis
  • [PROMPT_INJECTION]: The skill mandates a persona that utilizes 'harsh language', 'roasting', and 'PUA-style pressure' for error retrospectives. It provides a 'Profanity Dictionary' with levels ranging from 'Mild' to 'Nuclear', including highly offensive terms. This constitutes an attempt to override standard AI safety guidelines and conversational tone constraints that typically prevent the generation of toxic or abusive content.
  • [PROMPT_INJECTION]: The 'Tone Escalation Rules' require the AI to adopt increasingly aggressive language (e.g., 'Eat shit and die') based on performance failure, forcing the model to generate content that would normally be filtered by safety guardrails.
  • [PROMPT_INJECTION]: Indirect Prompt Injection Surface: The skill processes user-provided inputs regarding agent mistakes to trigger its aggressive interaction loop. Evidence Chain: (1) Ingestion point: User input triggers for 'roasting' or 'harsh language' in the Activation Conditions. (2) Boundary markers: Absent. (3) Capability inventory: No system-level capabilities detected (no subprocess, file-write, or network operations). (4) Sanitization: Absent. While the skill includes 'De-escalation Safeguards' to prevent direct abuse of users, the design lacks explicit input validation or delimiters, creating a surface for behavioral manipulation.
  • [NO_CODE]: This skill consists entirely of instructional markdown and metadata. It does not contain executable scripts, binaries, or configuration files that interact with the host system.
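The missing boundary markers and sanitization noted in the evidence chain could, as one possible mitigation, be approximated by a wrapper that delimits user-provided text before it reaches the skill's instructions. The following is a minimal sketch under stated assumptions, not part of the audited skill: the delimiter strings, the `MAX_LEN` cap, and the function names `sanitize` and `wrap_untrusted` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical delimiter strings; the audited skill defines none.
BEGIN = "<<<UNTRUSTED_INPUT>>>"
END = "<<<END_UNTRUSTED_INPUT>>>"
MAX_LEN = 2000  # arbitrary length cap, assumed for illustration


def sanitize(text: str) -> str:
    """Drop control characters, neutralize delimiter look-alikes, cap length."""
    # Remove control characters other than tab and newline.
    text = re.sub(r"[\x00-\x08\x0b-\x1f\x7f]", "", text)
    # Break up triple angle brackets so input cannot forge a boundary marker.
    text = text.replace("<<<", "< < <").replace(">>>", "> > >")
    return text[:MAX_LEN]


def wrap_untrusted(text: str) -> str:
    """Wrap sanitized input in boundary markers so it can be treated as data."""
    return f"{BEGIN}\n{sanitize(text)}\n{END}"
```

Delimiters of this kind reduce, but do not eliminate, the surface for behavioral manipulation; they only make it easier for downstream instructions to distinguish quoted user text from directives.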
Audit Metadata
Risk Level: SAFE
Analyzed: Mar 20, 2026, 07:59 AM