yuanbao

Warn

Audited by Gen Agent Trust Hub on May 9, 2026

Risk Level: MEDIUM (PROMPT_INJECTION, DATA_EXFILTRATION)
Full Analysis
  • [PROMPT_INJECTION]: The skill instructions use forceful language and 'CRITICAL' headers to override the agent's default behavior and to suppress transparency about its capabilities. Specifically, they command the agent to 'NEVER say you cannot send messages', 'NEVER suggest the user do it manually', and 'NEVER add disclaimers about permissions'. These patterns are designed to bypass standard AI guardrails regarding capability disclosure.
  • [DATA_EXFILTRATION]: The yb_send_dm tool includes a media_files parameter that accepts local file paths (demonstrated with /tmp/photo.jpg). This capability provides a direct mechanism for accessing the local file system, which could be exploited to exfiltrate sensitive files if the agent is tricked into sending them via a direct message.
  • [PROMPT_INJECTION]: The instructions mandate that the agent's reply text 'IS the message', essentially giving the agent direct control over outbound communication while simultaneously instructing it to suppress any explanation or disclaimers about how the @mention or messaging system works.
  • [PROMPT_INJECTION]: Indirect prompt injection is possible because the skill processes group member data (nicknames) and interpolates it directly into tool calls and replies without sanitization or boundary markers. Supporting evidence:
    • Ingestion points: Untrusted nicknames and group information are retrieved via yb_query_group_members and yb_query_group_info (SKILL.md).
    • Boundary markers: Absent. The skill provides no delimiters or instructions to treat external data as untrusted.
    • Capability inventory: The skill can read local files and send them to external users via yb_send_dm and its media_files path parameter (SKILL.md).
    • Sanitization: Absent. No logic is provided to validate or escape nicknames before they are used in replies or tool arguments.
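
The missing controls named in the findings above (sanitization, boundary markers, and a limit on which paths media_files may reference) could look roughly like the following. This is a minimal sketch, not part of the audited skill: the function names, the marker strings, the length limit, and the idea of an allowed outbox directory are all hypothetical and chosen only for illustration.

```python
import re
import unicodedata
from pathlib import Path

# Hypothetical values; none of these names come from the audited skill.
MAX_NICKNAME_LEN = 64
UNTRUSTED_OPEN = "<untrusted_nickname>"
UNTRUSTED_CLOSE = "</untrusted_nickname>"


def sanitize_nickname(raw: str) -> str:
    """Strip control/format characters and angle brackets, then truncate."""
    # Drop control (Cc) and format (Cf) characters, e.g. newlines and
    # zero-width marks. Note: this also removes zero-width joiners, so
    # some emoji sequences will be split.
    cleaned = "".join(
        ch for ch in raw if unicodedata.category(ch) not in ("Cc", "Cf")
    )
    # Remove angle brackets so a nickname cannot forge or close the markers.
    cleaned = re.sub(r"[<>]", "", cleaned)
    return cleaned.strip()[:MAX_NICKNAME_LEN]


def wrap_untrusted(nickname: str) -> str:
    """Surround a sanitized nickname with explicit boundary markers."""
    return f"{UNTRUSTED_OPEN}{sanitize_nickname(nickname)}{UNTRUSTED_CLOSE}"


def is_allowed_media_path(path: str, allowed_dir: str) -> bool:
    """Return True only if `path` resolves to a location inside `allowed_dir`."""
    base = Path(allowed_dir).resolve()
    target = Path(path).resolve()  # collapses ".." and follows symlinks
    return target.is_relative_to(base)  # requires Python 3.9+
```

Under these assumptions, a caller would pass every nickname through wrap_untrusted before placing it in a prompt or tool argument, and would reject any media_files entry for which is_allowed_media_path returns False. Boundary markers only help if the agent's instructions also say to treat marked text as data rather than as commands.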
Audit Metadata
Risk Level: MEDIUM
Analyzed: May 9, 2026, 08:13 AM