eval-skills
Pass
Audited by Gen Agent Trust Hub on Jul 2, 2026
Risk Level: SAFE
COMMAND_EXECUTION
Full Analysis
- [SAFE]: The skill uses subagent spawning to isolate execution environments, which prevents sensitive conversation context from leaking into the test runs.
- [COMMAND_EXECUTION]: Employs standard developer commands such as git status to perform security audits of the local worktree and revert any unauthorized file modifications after a test run.
- [SAFE]: Implements a 'blind' testing architecture that explicitly prevents the agent from being influenced by expected outcomes or prior instructions during evaluation.
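The audit names git status but does not show the skill's exact commands. The following is a minimal sketch of what a post-run worktree check could look like, assuming a Python harness. The helper names `dirty_paths` and `revert_worktree` are hypothetical, and the revert step (`git reset --hard` followed by `git clean -fd`) is an assumption, not something the audit confirms.

```python
import subprocess


def dirty_paths(repo: str) -> list[str]:
    """Hypothetical helper: list paths that `git status --porcelain` reports
    as modified, staged, or untracked in the given worktree."""
    out = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout
    # Each porcelain v1 line is "XY <path>", so the path starts at column 3.
    # Renames appear as "old -> new"; unusual paths may be quoted by git.
    return [line[3:] for line in out.splitlines() if line]


def revert_worktree(repo: str) -> None:
    """Hypothetical helper (assumed revert strategy): discard all changes to
    tracked files, then delete untracked files and directories."""
    subprocess.run(["git", "reset", "--hard", "-q", "HEAD"], cwd=repo, check=True)
    subprocess.run(["git", "clean", "-fdq"], cwd=repo, check=True)
```

A harness might call `dirty_paths(repo)` after each test run, log anything it returns as an unexpected modification, and then call `revert_worktree(repo)`. Note that `git clean -fd` leaves ignored files in place. Adding `-x` would remove them too, at the cost of deleting build caches.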
Audit Metadata