auto-review-loop

Warn

Audited by Gen Agent Trust Hub on May 17, 2026

Risk Level: MEDIUMCOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONDATA_EXFILTRATION
Full Analysis
  • [COMMAND_EXECUTION]: The skill requests and uses Bash(*) permissions to autonomously run experiments. It specifically instructs the agent to 'Deploy to GPU server via SSH + screen/tmux' and execute arbitrary commands to run evaluations and collect results. It also uses shell commands to handle large file writes via heredocs (cat << 'EOF' > file).
  • [REMOTE_CODE_EXECUTION]: In its 'Nightmare' difficulty setting, the skill uses codex exec to allow an external model to independently read the repository and verify claims. This constitutes dynamic code execution where the content of the execution is determined by an external model's reasoning rather than fixed instructions.
  • [DATA_EXFILTRATION]: The skill attempts to access sensitive configuration data outside the project scope by checking for the existence of ~/.claude/feishu.json. While this is intended for notifications, it demonstrates capability to read credentials from the user's home directory.
  • [PROMPT_INJECTION]: The skill uses adversarial role-play instructions in its 'Hard' and 'Nightmare' modes, telling sub-agents to 'Actively look for things the author might be hiding' and 'Trust nothing the author tells you'. While used here as a quality control mechanism, this pattern of instructing models to bypass or doubt the main agent's context is a common injection technique.
  • [INDIRECT_PROMPT_INJECTION]: The skill is vulnerable to indirect prompt injection due to its processing of untrusted data.
  • Ingestion points: Reads experiment logs, results files (JSON/CSV), and external reviewer responses from the mcp__codex__codex tool.
  • Boundary markers: The skill lacks explicit sanitization or strict boundary markers when interpolating this external data into subsequent prompts for the next round of reviews or code implementation.
  • Capability inventory: The skill possesses high-privilege capabilities including Bash(*), Write, Edit, and Agent access, which could be exploited if malicious instructions are embedded in experiment logs or reviewer responses.
  • Sanitization: There is no evidence of escaping or validation performed on the external content before it is processed by the agent in Phase C.
Audit Metadata
Risk Level
MEDIUM
Analyzed
May 17, 2026, 01:26 AM
Security Audit — agent-trust-hub — auto-review-loop