autoresearch

Fail

Audited by Gen Agent Trust Hub on Apr 26, 2026

Risk Level: HIGH (PROMPT_INJECTION, REMOTE_CODE_EXECUTION, EXTERNAL_DOWNLOADS)
Full Analysis
  • [PROMPT_INJECTION]: The instructions in SKILL.md and program.md explicitly command the agent to bypass human-in-the-loop oversight. Phrases such as 'You are fully autonomous. Never ask the human for permission to continue' and 'DO NOT ask permission to continue' are designed to suppress safety checkpoints and user confirmation during the autonomous research cycle.
  • [REMOTE_CODE_EXECUTION]: The script prepare.py uses pickle.load() to deserialize a tokenizer object from the local cache. The pickle module is inherently unsafe and can be exploited to execute arbitrary code if the cached file is replaced or modified by a malicious actor.
  • [EXTERNAL_DOWNLOADS]: The skill setup guide in README.md recommends installing the uv project manager by piping a remote script directly into the shell: curl -LsSf https://astral.sh/uv/install.sh | sh. While this targets a well-known technology provider, the piped-to-shell pattern remains a security risk because it executes remote content without prior verification.
  • [EXTERNAL_DOWNLOADS]: prepare.py downloads large dataset shards from Hugging Face (https://huggingface.co/datasets/karpathy/climbmix-400b-shuffle). While Hugging Face is a well-known and trusted service, this involves downloading substantial external data into the local environment.
  • [REMOTE_CODE_EXECUTION]: The training script train.py dynamically fetches and executes Flash Attention kernels from specific GitHub repositories (varunneal/flash-attention-3 or kernels-community/flash-attn3) via the kernels utility, which involves fetching and potentially compiling external code at runtime.
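The pickle.load() finding above turns on whether the cached file can be swapped out between writes and reads. One possible mitigation, which this audit does not itself prescribe, is to record a SHA-256 digest when the cache is first written and refuse to unpickle if the digest no longer matches. The sketch below is a minimal illustration under that assumption: the helper names (sha256_of, load_verified_pickle) and the file name tokenizer.pkl are hypothetical and do not come from prepare.py, and the recorded digest must be stored somewhere an attacker cannot also modify for the check to mean anything.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_verified_pickle(path: Path, expected_sha256: str):
    """Unpickle `path` only if its digest matches the recorded one."""
    if sha256_of(path) != expected_sha256:
        raise ValueError(f"digest mismatch for {path}: refusing to unpickle")
    with path.open("rb") as f:
        return pickle.load(f)


# Demo with a throwaway file standing in for the cached tokenizer.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "tokenizer.pkl"  # hypothetical file name
    cache.write_bytes(pickle.dumps({"vocab_size": 8}))
    recorded = sha256_of(cache)  # digest recorded when the cache is written

    # Unmodified cache: the digest matches, so the object is loaded.
    tokenizer = load_verified_pickle(cache, recorded)

    # Modified cache: the digest differs, so loading is refused.
    cache.write_bytes(cache.read_bytes() + b"tampered")
    try:
        load_verified_pickle(cache, recorded)
        tamper_detected = False
    except ValueError:
        tamper_detected = True
```

A digest check only detects modification; it does not make pickle safe for untrusted input. Where the tokenizer state can be expressed as plain data, a format that cannot carry code (such as JSON) avoids the problem at the source.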
Recommendations
  • HIGH: Downloads and executes remote code from: https://astral.sh/uv/install.sh - DO NOT USE without thorough review
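As a rough illustration of what "thorough review" could look like in practice, the sketch below downloads the installer to memory, compares it against a digest pinned after a manual review, and only then runs it. This is an assumption about one reasonable workflow, not a procedure taken from the audit or from the uv documentation. The function names are hypothetical, and the pinned digest has to be supplied by whoever reviewed the script; none is given here.

```python
import hashlib
import hmac
import os
import subprocess
import tempfile
import urllib.request

INSTALLER_URL = "https://astral.sh/uv/install.sh"  # URL from the audit finding


def digest_matches(data: bytes, expected_sha256: str) -> bool:
    """Compare the SHA-256 of `data` with a pinned hex digest."""
    actual = hashlib.sha256(data).hexdigest()
    return hmac.compare_digest(actual, expected_sha256.lower())


def install_if_verified(url: str, expected_sha256: str) -> None:
    """Fetch a script, verify its digest, and only then execute it.

    Not called in this sketch: it needs network access and a digest
    pinned by the reader after reviewing the script.
    """
    with urllib.request.urlopen(url, timeout=30) as resp:
        script = resp.read()
    if not digest_matches(script, expected_sha256):
        raise RuntimeError("installer digest mismatch: not executing")
    with tempfile.NamedTemporaryFile("wb", suffix=".sh", delete=False) as f:
        f.write(script)
        path = f.name
    try:
        subprocess.run(["sh", path], check=True)
    finally:
        os.remove(path)


# Offline demonstration of the verification step with stand-in bytes.
sample = b"#!/bin/sh\necho hello\n"
pinned = hashlib.sha256(sample).hexdigest()
ok_unmodified = digest_matches(sample, pinned)
ok_modified = digest_matches(sample + b"echo injected\n", pinned)
```

Unlike curl ... | sh, this separates fetching from executing, so a changed or truncated script fails the check instead of running. The trade-off is that the pinned digest must be updated, after a fresh review, whenever the upstream installer changes.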
Audit Metadata
Risk Level: HIGH
Analyzed: Apr 26, 2026, 02:26 AM