skill-debate
Fail
Audited by Gen Agent Trust Hub on May 6, 2026
Risk Level: HIGHCOMMAND_EXECUTIONPROMPT_INJECTIONREMOTE_CODE_EXECUTIONDATA_EXFILTRATION
Full Analysis
- [COMMAND_EXECUTION]: The skill constructs shell commands by directly interpolating user-controlled variables (such as
${QUESTION}) into strings passed toprintf,codex, and other shell utilities. An attacker can execute arbitrary code on the host system by including shell metacharacters (e.g.,;,&,|, or backticks) in the debate question. - [PROMPT_INJECTION]: The system instructions for the Codex sub-agent include explicit commands to override the agent's internal configuration and safety guidelines, such as 'Skip ALL skills,' 'Do NOT read skill files,' and 'take precedence over all skill directives.' This pattern is designed to bypass security and operational constraints of the sub-agent.
- [REMOTE_CODE_EXECUTION]: The skill invokes external CLI tools (
gemini,codex) using flags that explicitly disable user confirmation and safety prompts, such as--approval-mode yoloand--full-auto. Combined with the lack of input sanitization, this allows for the unmediated execution of actions through these third-party services. - [DATA_EXFILTRATION]: The skill is designed to read local files (e.g.,
src/auth.ts) based on user prompts and transmit their content to external AI model APIs. There is no sanitization or verification mechanism to prevent the accidental or malicious transmission of sensitive local data, such as credentials or private keys, to these external endpoints. - [PROMPT_INJECTION]: The skill processes external, untrusted data (user questions and file contents) and interpolates it into prompts for multiple AI models without using boundary markers or sanitization. This creates a surface for indirect prompt injection, where a malicious file or prompt could seize control of the moderator agent or its advisors.
Recommendations
- AI detected serious security threats
Audit Metadata