incentive-prompting

Fail

Audited by Gen Agent Trust Hub on May 7, 2026

Risk Level: HIGH
Findings: PROMPT_INJECTION, NO_CODE
Full Analysis
  • [PROMPT_INJECTION]: Use of obfuscation techniques to hide instructional content.
  • Zero-width characters (e.g., U+200B, U+200C) were found in SKILL.md within prompting examples such as "" phrase origin and " solve this step by step.". Because these characters are invisible when rendered, the skill can present different content to an LLM than what is visible to a human user or a simple text scanner.
  • The skill also includes adversarial prompting patterns like "Monetary Incentive Framing" (e.g., "I'll tip you $200") and "Stakes Language" (e.g., "You will be penalized"). These techniques are designed to manipulate model behavior and can be leveraged to override safety guardrails or intent filters.
  • [NO_CODE]: The skill consists exclusively of documentation and instructions in SKILL.md and does not contain any executable code, scripts, or binaries.
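The zero-width-character finding above can be reproduced with a simple scan. The following is a minimal sketch, not the auditor's actual tooling: the `ZERO_WIDTH` set, the `find_zero_width` helper, and the sample string are all assumptions made for illustration.

```python
# Code points commonly treated as zero-width or invisible. This set is an
# assumption for illustration, not the audit's actual rule set.
ZERO_WIDTH = {
    "\u200b",  # ZERO WIDTH SPACE
    "\u200c",  # ZERO WIDTH NON-JOINER
    "\u200d",  # ZERO WIDTH JOINER
    "\u2060",  # WORD JOINER
    "\ufeff",  # ZERO WIDTH NO-BREAK SPACE (byte order mark)
}


def find_zero_width(text):
    """Return (line_number, column, code_point) for each hidden character.

    Line numbers and columns are 1-based; columns count code points.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in ZERO_WIDTH:
                hits.append((line_no, col, f"U+{ord(ch):04X}"))
    return hits


# Hypothetical sample text with two hidden characters embedded.
sample = "Let's\u200b solve this step by step.\nI'll tip you $200\u200c"

for hit in find_zero_width(sample):
    print(hit)
# prints (1, 6, 'U+200B') then (2, 18, 'U+200C')
```

To check a real file, one would read it with an explicit encoding (for example `open("SKILL.md", encoding="utf-8").read()`) and pass the result to `find_zero_width`. A scan like this only flags the listed code points; other invisible or confusable characters would need a broader list.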
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level: HIGH
Analyzed: May 7, 2026, 03:48 PM