eval-harness
Pass
Audited by Gen Agent Trust Hub on May 29, 2026
Risk Level: SAFE
Full Analysis
- [SAFE]: The skill is primarily instructional, providing a methodology for developers to evaluate AI agent tasks. It does not contain any obfuscated code, unauthorized data access, or persistence mechanisms.
- [COMMAND_EXECUTION]: The documentation includes examples of running deterministic checks using standard tools like npm, grep, and bash. These commands (e.g., npm test, npm run build) are routine in software development environments and are used here for legitimate testing and verification purposes.
- [PROMPT_INJECTION]: No evidence of prompt injection or instructions to bypass safety guidelines was found. The instructions focus on improving agent reliability through metrics like pass@k.
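For readers unfamiliar with the pass@k metric mentioned in the findings, the following is a minimal sketch of how it is commonly computed. It assumes the widely used unbiased estimator (n attempts, c of which passed); the audited skill's own definition is not shown in this report, and the function name `pass_at_k` is hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k sampled attempts passes.

    n: total attempts recorded for a task
    c: number of those attempts that passed
    k: number of attempts drawn without replacement (1 <= k <= n)

    Assumption: this is the common unbiased estimator
    1 - C(n - c, k) / C(n, k), not necessarily the skill's own formula.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k attempts failed,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical figures: 10 attempts, 3 passed.
    print(pass_at_k(10, 3, 1))
    print(pass_at_k(10, 3, 5))
```

Under this estimator, pass@1 reduces to the plain pass rate c / n, and the value rises toward 1 as k grows.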
Audit Metadata