learn-from-code-review

Pass

Audited by Gen Agent Trust Hub on Apr 25, 2026

Risk Level: SAFEPROMPT_INJECTIONCOMMAND_EXECUTION
Full Analysis
  • [PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection because it ingests and processes pull request comments and review bodies from GitHub, which are authored by external users. If an attacker submits malicious instructions within a code review comment (e.g., instructing the agent to ignore safety filters or exfiltrate data in future sessions), these patterns could be distilled into the resulting skills or repository guidelines.
  • Ingestion points: The skill fetches review comments and review bodies via the GitHub API and CLI in SKILL.md (Step 2).
  • Boundary markers: There are no explicit delimiters or specific warnings provided to the LLM to ignore instructions embedded within the fetched comments during the distillation process.
  • Capability inventory: The skill has the capability to write new executable skill files (.openhands/skills/) and update repository configuration (AGENTS.md), then propose these changes via the create_pr tool.
  • Sanitization: No sanitization or validation is performed on the fetched content beyond basic noise filtering (checking for length and excluding known bots).
  • [COMMAND_EXECUTION]: The skill facilitates the automated creation of new AI agent skills (SKILL.md files) based on distilled external feedback. Although the skill generates a draft PR for human review, the automated generation of executable instructions from potentially adversarial input creates a risk where a reviewer might overlook a subtly malicious pattern that becomes part of the agent's permanent capabilities.
Audit Metadata
Risk Level
SAFE
Analyzed
Apr 25, 2026, 01:31 PM