devtu-self-evolve

Warn

Audited by Gen Agent Trust Hub on May 29, 2026

Risk Level: MEDIUMCOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill makes extensive use of the shell to manage the development lifecycle.
  • Evidence: Executes git fetch, git rebase, git stash, git push, and gh pr create to manage the mims-harvard/ToolUniverse repository.
  • Evidence: Runs python3 -m tooluniverse.cli run <ToolName> '<json_args>' to execute tool logic directly from the command line.
  • [REMOTE_CODE_EXECUTION]: Performs dynamic execution of Python code that may have been modified during the self-evolution process.
  • Evidence: Uses python -c "from tooluniverse.<module> import <Class>" in SKILL.md (Phase 4) to validate the syntax of newly generated or modified classes via dynamic imports.
  • [PROMPT_INJECTION]: The skill is vulnerable to indirect prompt injection from the data it processes (Category 8).
  • Ingestion points: The skill ingests 'Issue Reports' generated by 'Researcher Persona Agents' (sub-agents) defined in references/persona-template.md.
  • Boundary markers: No delimiters or explicit instructions are used to separate sub-agent report content from the orchestrator's instructions.
  • Capability inventory: The skill has the capability to write code, perform git pushes to a remote repository, and execute Python code via CLI.
  • Sanitization: While Phase 3 mentions 'verification via CLI', there is no sanitization or filtering of the natural language content within the agent reports before they influence the 'Fix' phase, allowing a sub-agent to potentially inject instructions that could lead to malicious code being committed or executed.
Audit Metadata
Risk Level
MEDIUM
Analyzed
May 29, 2026, 05:26 AM
Security Audit — agent-trust-hub — devtu-self-evolve