autoresearch

Pass

Audited by Gen Agent Trust Hub on Apr 22, 2026

Risk Level: SAFECOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill references the creation and execution of shell scripts (autoresearch.sh) and uses tools like run_experiment to iterate on performance improvements. This represents a dynamic execution model where the code being run is generated during the agent's session.
  • [PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection due to its core loop of processing 'ideas' and 'results'.
  • Ingestion points: External data enters the context as ideas to be tested or results to be measured, as indicated in the skill description.
  • Boundary markers: The provided documentation does not specify the use of delimiters or 'ignore' instructions to prevent the agent from obeying commands embedded within the experimental data.
  • Capability inventory: The skill possesses capabilities for tool invocation (run_experiment, log_experiment) and shell script execution, creating a direct path from ingested data to system actions.
  • Sanitization: No sanitization or validation mechanisms are described for the data being iterated upon, allowing potentially malicious instructions in the 'ideas' to influence the code generation or tool calls.
Audit Metadata
Risk Level
SAFE
Analyzed
Apr 22, 2026, 12:14 PM