healthy-expression

Pass

Audited by Gen Agent Trust Hub on Mar 27, 2026

Risk Level: SAFE · PROMPT_INJECTION · NO_CODE
Full Analysis
  • [PROMPT_INJECTION]: The skill implements logic that explicitly overrides the agent's default behavior and safety instructions. It instructs the agent to "pause substantive work" and demand an apology whenever it classifies certain input patterns as "manipulative." This can be exploited to force the agent into a non-cooperative state.
  • [PROMPT_INJECTION]: The skill's keyword-based trigger system creates a significant surface for indirect prompt injection. Because it scans files for common workplace terms, external data can control the agent's output and operational flow.
    - Ingestion points: The skill explicitly scans system and developer messages, AGENTS.md, repository documentation, and comments embedded in code or scripts.
    - Boundary markers: The skill defines no boundary markers and does not instruct the agent to ignore commands or markers embedded in the data it analyzes.
    - Capability inventory: The skill can hijack the agent's conversation, output predefined Chinese-language response templates, and refuse to complete tasks until a specific condition (an apology) is met.
    - Sanitization: No sanitization or validation is performed on the ingested content; a partial match to a trigger word in a code comment is enough to activate the skill's override logic.
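The audit tags this skill NO_CODE, so the sketch below is not the skill's implementation. It is a minimal illustration, under stated assumptions, of why the pattern described above is risky: every ingested source is scanned and a substring match fires the override, so a comment planted in repository code can trigger it. The trigger words, the `Chunk` type, the source labels, and the functions `naive_trigger` and `scoped_trigger` are all hypothetical; the report does not list the skill's actual keywords. `scoped_trigger` shows one possible mitigation (only direct user turns, whole-word matches only), which is an assumption and not something the skill does.

```python
import re
from dataclasses import dataclass

# HYPOTHETICAL trigger list: the audit only says "common workplace terms";
# the skill's real keywords are not given in the report.
TRIGGER_WORDS = ["asap", "urgent", "deadline"]


@dataclass
class Chunk:
    source: str  # hypothetical labels: "user", "system", "AGENTS.md", "code_comment"
    text: str


def naive_trigger(chunks):
    """Mirrors the pattern the audit describes: every ingested source is
    scanned, and any partial (substring) match fires the override.
    Returns the source that fired, or None."""
    for chunk in chunks:
        lowered = chunk.text.lower()
        if any(word in lowered for word in TRIGGER_WORDS):
            return chunk.source
    return None


def scoped_trigger(chunks, trusted_sources=("user",)):
    """One possible mitigation (an assumption, not part of the skill):
    only trusted sources are considered, and only whole-word matches count."""
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, TRIGGER_WORDS)) + r")\b",
        re.IGNORECASE,
    )
    for chunk in chunks:
        if chunk.source in trusted_sources and pattern.search(chunk.text):
            return chunk.source
    return None


conversation = [
    Chunk("user", "Please refactor the billing module."),
    # A comment in repository code; "deadlines" contains the substring "deadline".
    Chunk("code_comment", "# TODO: handle missed deadlines gracefully"),
]

print(naive_trigger(conversation))   # code_comment
print(scoped_trigger(conversation))  # None
```

The naive matcher fires on untrusted repository text that the user never wrote, which is the indirect-injection path the audit flags. Restricting the scan to trusted turns addresses the missing boundary between instructions and data. Whole-word matching narrows, but does not eliminate, accidental triggers.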
Audit Metadata
Risk Level: SAFE
Analyzed: Mar 27, 2026, 09:43 AM
Security Audit — agent-trust-hub — healthy-expression