agent-eval
Pass
Audited by Gen Agent Trust Hub on May 19, 2026
Risk Level: SAFE
Full Analysis
- [SAFE]: No security issues were identified in the skill. Its instructions and metadata are purely informational and provide templates for agent evaluation tasks.
- [COMMAND_EXECUTION]: The skill documents a benchmarking tool that runs shell commands (e.g., pytest, npm run build) specified in YAML configuration files to judge whether an agent's code succeeded. Executing these commands is the tool's primary, expected function.
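The COMMAND_EXECUTION finding can be illustrated with a minimal sketch of the pattern it describes: a grader reads command strings from a task config and treats a zero exit code as success. This is not the tool's actual code. The config is shown as an already-parsed dict, and the field name `success_commands` and the function `judge` are hypothetical.

```python
import subprocess

# Hypothetical parsed task config. In the audited tool these commands come
# from a YAML file; the field names here are illustrative assumptions.
task_config = {
    "name": "example-task",
    "success_commands": ["exit 0"],
}


def judge(config: dict) -> bool:
    """Run each configured shell command; pass only if every one exits with 0."""
    for cmd in config.get("success_commands", []):
        # shell=True hands the string to the system shell verbatim. This is why
        # an audit flags it as command execution: whoever writes the config
        # controls what runs on the machine.
        result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        if result.returncode != 0:
            return False
    return True


if __name__ == "__main__":
    print(judge(task_config))
```

Because the commands run with the grader's privileges, the behavior is only as trustworthy as the configuration files it is given. That is consistent with the audit treating it as an expected feature rather than a vulnerability.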