autoresearch

Pass

Audited by Gen Agent Trust Hub on May 10, 2026

Risk Level: SAFE
COMMAND_EXECUTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill dynamically generates and executes local shell scripts, such as 'autoresearch.sh' and 'autoresearch.checks.sh', to run benchmarks and collect performance metrics.
  • [COMMAND_EXECUTION]: The instructions command the agent to enter an infinite loop ('LOOP FOREVER') and explicitly skip user confirmation between iterations to maintain autonomous operation during long-running optimization tasks.
  • [SAFE]: The skill includes helper scripts (confidence.sh, summary.sh) that analyze experiment logs and calculate confidence scores using Median Absolute Deviation (MAD). These scripts run locally and contain no network-based or malicious logic.
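For readers unfamiliar with MAD, the following is a minimal Python sketch of how a MAD-based confidence score could be computed from benchmark runs. It is not the content of confidence.sh, which this audit does not reproduce. The function names `mad` and `noise_ratio`, and the choice to express confidence as the improvement divided by MAD, are assumptions made for illustration.

```python
from statistics import median


def mad(values):
    """Median Absolute Deviation: the median of |x - median(values)|."""
    center = median(values)
    return median([abs(x - center) for x in values])


def noise_ratio(baseline_runs, candidate):
    """Hypothetical confidence score: how many MADs the candidate
    measurement lies from the baseline median. Larger values suggest
    the change is less likely to be run-to-run noise."""
    delta = abs(median(baseline_runs) - candidate)
    spread = mad(baseline_runs)
    if spread == 0:
        # Baseline has no measurable noise: any difference is "infinitely" clear.
        return float("inf") if delta > 0 else 0.0
    return delta / spread


# Illustrative timings in milliseconds (made-up numbers, one outlier at 130).
baseline = [100, 102, 99, 101, 130]
print(mad(baseline))
print(noise_ratio(baseline, 95))
```

MAD is a robust measure of spread: a single outlying run, such as the 130 above, barely shifts it, whereas it would inflate a standard deviation. That property makes it a natural fit for noisy benchmark logs.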
Audit Metadata
Risk Level: SAFE
Analyzed: May 10, 2026, 05:18 AM