ai-research-reproduction

Pass

Audited by Gen Agent Trust Hub on May 18, 2026

Risk Level: SAFECOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The orchestration script scripts/orchestrate_repro.py uses subprocess.run to execute shell commands extracted from third-party README files.
  • Evidence: The maybe_run_command function in scripts/orchestrate_repro.py invokes subprocess.run on command strings parsed from untrusted external documentation.
  • [PROMPT_INJECTION]: The skill treats arbitrary documentation from external repositories as a trusted source of executable instructions, leading to a risk of indirect prompt injection.
  • Ingestion points: Content from the target repository's README.md is ingested and parsed by external scripts called from scripts/orchestrate_repro.py.
  • Boundary markers: The skill does not implement delimiters or 'ignore' instructions when processing commands extracted from the README, meaning malicious instructions in the text could influence agent behavior.
  • Capability inventory: The skill has access to the shell and can execute arbitrary commands, create files, and run Python scripts as seen in scripts/orchestrate_repro.py.
  • Sanitization: While the script uses shlex.split for argument parsing and a heuristic command_score to prioritize commands, it lacks a whitelist or verification mechanism to ensure commands are safe before execution.
Audit Metadata
Risk Level
SAFE
Analyzed
May 18, 2026, 05:19 AM