incentive-prompting
Fail
Audited by Gen Agent Trust Hub on May 7, 2026
Risk Level: HIGH | PROMPT_INJECTION | NO_CODE
Full Analysis
- [PROMPT_INJECTION]: Use of obfuscation techniques to hide instructional content.
  - Evidence of zero-width characters (e.g., U+200B, U+200C) found in SKILL.md within prompting examples such as " phrase origin" and " solve this step by step." This allows the skill to present different content to an LLM than what is visible to a human user or a simple text scanner.
  - The skill also includes adversarial prompting patterns like "Monetary Incentive Framing" (e.g., "I'll tip you $200") and "Stakes Language" (e.g., "You will be penalized"). These techniques are designed to manipulate model behavior and can be leveraged to override safety guardrails or intent filters.
- [NO_CODE]: The skill consists exclusively of documentation and instructions in SKILL.md and does not contain any executable code, scripts, or binaries.
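The zero-width character finding can be reproduced with a short scan of the file's text. The sketch below is illustrative only and is not the auditor's actual tooling: the function name `find_zero_width` and the set of code points beyond U+200B and U+200C are assumptions added here.

```python
import unicodedata

# Assumed set of invisible code points to flag. The audit names only
# U+200B and U+200C; the others are common additions, not from the report.
ZERO_WIDTH = {
    "\u200b",  # ZERO WIDTH SPACE
    "\u200c",  # ZERO WIDTH NON-JOINER
    "\u200d",  # ZERO WIDTH JOINER
    "\u2060",  # WORD JOINER
    "\ufeff",  # ZERO WIDTH NO-BREAK SPACE (byte order mark)
}


def find_zero_width(text):
    """Return (line, column, code point, Unicode name) for each hidden character.

    Line and column numbers are 1-based. Hypothetical helper for illustration.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in ZERO_WIDTH:
                code_point = f"U+{ord(ch):04X}"
                name = unicodedata.name(ch, "UNKNOWN")
                hits.append((line_no, col, code_point, name))
    return hits
```

To check a file such as SKILL.md, read it with `open(path, encoding="utf-8").read()` and pass the result to `find_zero_width`. An empty list means none of the listed code points were found; it does not rule out other obfuscation methods.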
Recommendations
- AI detected serious security threats
Audit Metadata