auto-review-loop
Warn
Audited by Gen Agent Trust Hub on May 17, 2026
Risk Level: MEDIUMCOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONDATA_EXFILTRATION
Full Analysis
- [COMMAND_EXECUTION]: The skill requests and uses
Bash(*)permissions to autonomously run experiments. It specifically instructs the agent to 'Deploy to GPU server via SSH + screen/tmux' and execute arbitrary commands to run evaluations and collect results. It also uses shell commands to handle large file writes via heredocs (cat << 'EOF' > file). - [REMOTE_CODE_EXECUTION]: In its 'Nightmare' difficulty setting, the skill uses
codex execto allow an external model to independently read the repository and verify claims. This constitutes dynamic code execution where the content of the execution is determined by an external model's reasoning rather than fixed instructions. - [DATA_EXFILTRATION]: The skill attempts to access sensitive configuration data outside the project scope by checking for the existence of
~/.claude/feishu.json. While this is intended for notifications, it demonstrates capability to read credentials from the user's home directory. - [PROMPT_INJECTION]: The skill uses adversarial role-play instructions in its 'Hard' and 'Nightmare' modes, telling sub-agents to 'Actively look for things the author might be hiding' and 'Trust nothing the author tells you'. While used here as a quality control mechanism, this pattern of instructing models to bypass or doubt the main agent's context is a common injection technique.
- [INDIRECT_PROMPT_INJECTION]: The skill is vulnerable to indirect prompt injection due to its processing of untrusted data.
- Ingestion points: Reads experiment logs, results files (JSON/CSV), and external reviewer responses from the
mcp__codex__codextool. - Boundary markers: The skill lacks explicit sanitization or strict boundary markers when interpolating this external data into subsequent prompts for the next round of reviews or code implementation.
- Capability inventory: The skill possesses high-privilege capabilities including
Bash(*),Write,Edit, andAgentaccess, which could be exploited if malicious instructions are embedded in experiment logs or reviewer responses. - Sanitization: There is no evidence of escaping or validation performed on the external content before it is processed by the agent in Phase C.
Audit Metadata