biomni

Fail

Audited by Gen Agent Trust Hub on Apr 29, 2026

Risk Level: HIGH
Tags: COMMAND_EXECUTION, REMOTE_CODE_EXECUTION, EXTERNAL_DOWNLOADS, CREDENTIALS_UNSAFE, DATA_EXFILTRATION, PROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The framework provides an autonomous agent (A1) that generates and executes Python code locally. The documentation in SKILL.md warns that this code has full system privileges, which is a significant security risk if the LLM's instructions are subverted.
  • [REMOTE_CODE_EXECUTION]: The scripts/setup_environment.py script generates a test script (test_biomni.py) and executes it via a subprocess call to the Python interpreter.
  • [PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection because it processes untrusted biomedical data (e.g., summary statistics, literature abstracts) while maintaining high-privilege code execution capabilities.
  • Ingestion points: Loads data from various file formats (.vcf, .h5ad, .csv) and external literature APIs as described in references/use_cases.md.
  • Boundary markers: No clear delimiters are used to separate user data from instructions in the provided query examples.
  • Capability inventory: Full arbitrary code execution through the A1.go() method.
  • Sanitization: No input sanitization or code review mechanism is applied before execution.
  • [EXTERNAL_DOWNLOADS]: The skill automatically downloads approximately 11 GB of biomedical data from Stanford's SNAP lab infrastructure and models from Hugging Face. While these are reputable research sources, the volume and nature of the downloads should be monitored.
  • [CREDENTIALS_UNSAFE]: The setup script prompts for sensitive API keys for LLM providers (Anthropic, OpenAI, etc.) and stores them in a local .env file. These secrets can be read by the agent itself during its autonomous code execution phases.
  • [DATA_EXFILTRATION]: By combining arbitrary code execution, sensitive credential storage, and network access to LLM APIs, the skill creates a potential pathway for exfiltrating sensitive biomedical datasets or API keys if the agent's logic is compromised.
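The credential-exposure and exfiltration findings above share one root cause: agent-generated code inherits the parent process environment, including the API keys loaded from .env. A minimal mitigation sketch, assuming generated code is written to a file before execution (`run_generated_code` is a hypothetical helper, not part of biomni):

```python
import os
import subprocess
import sys


def run_generated_code(script_path: str, timeout: int = 300):
    """Run agent-generated code with a stripped environment so secrets
    loaded from .env (e.g. ANTHROPIC_API_KEY) are not inherited.
    Hypothetical mitigation sketch; not biomni's actual execution path."""
    # Keep only PATH; every API key and other secret is dropped.
    clean_env = {"PATH": os.environ.get("PATH", "")}
    return subprocess.run(
        [sys.executable, script_path],
        env=clean_env,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
```

A stripped environment does not constrain file or network access; a container or OS-level sandbox is still needed for full isolation, but this closes the most direct key-theft route.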
Recommendations
  • Serious security threats detected. Based on the findings above: execute agent-generated code in a sandbox (container or restricted subprocess) instead of with full system privileges; add explicit delimiters between untrusted biomedical data and instructions; keep LLM API keys out of the environment visible to agent-executed code; and verify the integrity of the ~11 GB of downloaded data and models.
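The missing boundary markers flagged in the analysis can be added by wrapping untrusted inputs in explicit delimiters before they enter the model prompt. A sketch with a hypothetical helper (the tag name is an assumption, not a biomni convention):

```python
def wrap_untrusted(data: str) -> str:
    """Wrap untrusted biomedical text (abstracts, summary statistics) in
    delimiters so instructions and data remain distinguishable.
    Hypothetical helper; the tag name is illustrative only."""
    # Strip any delimiter-spoofing attempt embedded in the data itself.
    cleaned = data.replace("</untrusted_data>", "")
    return f"<untrusted_data>\n{cleaned}\n</untrusted_data>"
```

Delimiting alone does not neutralize prompt injection; it only gives the model (and reviewers) an unambiguous data/instruction boundary, and should be combined with the sandboxing measures above.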
Audit Metadata
Risk Level
HIGH
Analyzed
Apr 29, 2026, 02:23 AM