adversarial-review

Fail

Audited by Gen Agent Trust Hub on May 17, 2026

Risk Level: HIGHCOMMAND_EXECUTIONDATA_EXFILTRATIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill constructs and executes shell commands (codex exec and claude -p) that interpolate the 'prompt' variable—which contains untrusted code diffs and instructions—directly into the command line. Because shell metacharacters in the code content (such as backticks, dollar signs, or quotes) are not escaped, an attacker can achieve arbitrary command execution on the host system.
  • [DATA_EXFILTRATION]: Due to the command injection vulnerability in the shell scripts, a malicious diff being reviewed could execute commands to read and transmit sensitive local files (such as SSH keys, AWS credentials, or .env files) to an attacker-controlled external server.
  • [PROMPT_INJECTION]:
  • Ingestion points: The skill ingests untrusted data from the local repository, specifically recent code diffs and project planning files.
  • Boundary markers: No delimiters or boundary markers are used to separate the untrusted code content from the reviewer's instructions, making the system susceptible to indirect prompt injection where the code itself can steer the reviewer's behavior.
  • Capability inventory: The skill possesses shell execution capabilities through the system's CLI tools.
  • Sanitization: There is no evidence of sanitization or validation performed on the ingested code before it is passed to the shell or the secondary AI model.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
May 17, 2026, 02:07 AM
Security Audit — agent-trust-hub — adversarial-review