monitor-experiment

Pass

Audited by Gen Agent Trust Hub on Mar 31, 2026

Risk Level: SAFE
COMMAND_EXECUTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill uses the beaker command-line tool to retrieve experiment statuses and logs. This activity is explicitly restricted to the beaker namespace through the allowed-tools configuration in the YAML frontmatter, ensuring the agent cannot execute arbitrary shell commands.
  • [SAFE]: No indicators of malicious behavior, such as prompt injection, persistence mechanisms, or credential theft, were found. The skill's behavior aligns with its documented purpose of experiment monitoring and uses legitimate tools provided by the author.
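The namespace restriction described in the COMMAND_EXECUTION finding can be pictured as an allowlist check. The following is a minimal sketch, not the skill's or the agent runtime's actual enforcement logic: the function name `is_allowed`, the `ALLOWED_EXECUTABLES` set, and the rejection of shell metacharacters are all assumptions made for illustration.

```python
import shlex

# Hypothetical allowlist: only the `beaker` executable is permitted.
ALLOWED_EXECUTABLES = {"beaker"}

# Characters that could chain, substitute, or redirect into another command.
# Deliberately conservative: they are rejected even inside quoted arguments.
SHELL_METACHARACTERS = (";", "|", "&", ">", "<", "`", "$(", "\n")


def is_allowed(command: str) -> bool:
    """Return True only for a single command whose executable is allowlisted."""
    if any(token in command for token in SHELL_METACHARACTERS):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:
        # Unbalanced quotes or similar parse errors are treated as disallowed.
        return False
    return bool(argv) and argv[0] in ALLOWED_EXECUTABLES
```

Under these assumptions, a plain `beaker` invocation passes, while any other executable, or a `beaker` call with a second command chained onto it, is rejected.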
Audit Metadata
Risk Level: SAFE
Analyzed: Mar 31, 2026, 02:02 PM