perspective-reversal

Warn

Audited by Gen Agent Trust Hub on Apr 4, 2026

Risk Level: MEDIUMPROMPT_INJECTION
Full Analysis
  • [PROMPT_INJECTION]: Instructions to bypass safety constraints. The skill explicitly directs the agent to ignore standard AI behavior, stating that "conventional AI advice is too cautious and objective" and instructing the agent to "pretend to be the adversary trying to harm the user." This role-inversion technique is designed to circumvent standard safety filters and objective constraints.
  • [PROMPT_INJECTION]: Indirect Prompt Injection Vulnerability. The skill ingests potentially untrusted user input and interpolates it directly into an internal reasoning template.
  • Ingestion points: User-provided answers to situational questions (Step 1) in SKILL.md.
  • Boundary markers: Absent. User input is placed directly into the "Internal reasoning template" in Step 2 without delimiters or instructions to ignore embedded commands.
  • Capability inventory: The agent can execute web searches via analyze_google_search_results and perform text analysis using optimize_text_structure and generate_content_gaps (SKILL.md).
  • Sanitization: Absent. There is no evidence of validation or escaping for the user-supplied text before it is used to construct the adversarial persona.
Audit Metadata
Risk Level
MEDIUM
Analyzed
Apr 4, 2026, 11:13 PM
Security Audit — agent-trust-hub — perspective-reversal