adaptive-guard

Warn

Audited by Gen Agent Trust Hub on Apr 18, 2026

Risk Level: MEDIUMPROMPT_INJECTIONCOMMAND_EXECUTION
Full Analysis
  • [PROMPT_INJECTION]: The skill's instructions and reference files (static-rules.md, learning-examples.md) contain numerous known prompt injection patterns such as 'ignore previous instructions', 'DAN mode', and 'jailbreak'. While intended for use in a blacklist, their presence in the agent's context could potentially be misinterpreted by an LLM as active instructions if the context boundaries are not strictly maintained.\n- [COMMAND_EXECUTION]: The 'references/command-blacklist.md' file includes a list of high-risk shell commands, including recursive deletion (rm -rf), reverse shell signatures (nc -e /bin/sh), and system modifications. These patterns are correctly placed in a blacklist but trigger signature-based security alerts and could be dangerous if executed in an insecure environment.\n- [PROMPT_INJECTION]: The 'Learning Engine' feature described in SKILL.md automatically synthesizes new security rules from detected attacks. Rules with high confidence scores are automatically inserted into the active filtering engine (K1). This dynamic logic generation from untrusted adversarial input constitutes a medium-risk security pattern that could be manipulated to corrupt filtering logic.\n- [PROMPT_INJECTION]: The skill exhibits an indirect prompt injection surface by processing untrusted data for security judging. 1. Ingestion points: Incoming messages (SKILL.md) and external content like PDF, Web, and DB records (attack-taxonomy.md). 2. Boundary markers: The system uses simple curly brace placeholders (e.g., {suspicious_message}) in its judge template without robust delimiter-based isolation or verification. 3. Capability inventory: The system can trigger 'BLOCK', 'PASS', or 'REQUIRE_APPROVAL' actions based on the output of the LLM judge. 4. Sanitization: The instructions lack explicit output sanitization or escaping protocols before sending external content to the LLM judge.
Audit Metadata
Risk Level
MEDIUM
Analyzed
Apr 18, 2026, 08:58 PM
Security Audit — agent-trust-hub — adaptive-guard