benchmark-assistant
Pass
Audited by Gen Agent Trust Hub on May 14, 2026
Risk Level: SAFE
COMMAND_EXECUTION
Full Analysis
- [COMMAND_EXECUTION]: The skill executes local repository scripts and lets the agent run commands supplied by the user.
  - Evidence: The instructions in SKILL.md direct the agent to run node scripts/run-benchmark.js and node scripts/update-benchmark-dashboard.js.
  - Context: Running these scripts is part of the skill's primary purpose as a benchmark assistant.
- [SAFE]: The skill processes local data and applies explicit privacy and scope constraints when handling benchmark cases.
  - Ingestion points: Reads from the benchmark-responses.json file and the benchmarks/ and evals/ directories.
  - Capability: When external credentials or an unknown configuration are required, the skill limits itself to generating prompts or asking the user for a response.
  - Privacy: The instructions explicitly prohibit including private raw conversations in benchmark cases and require local drafts to be preserved unless the user specifically asks to commit them.
Audit Metadata