eval-harness

Pass

Audited by Gen Agent Trust Hub on May 29, 2026

Risk Level: SAFE
Full Analysis
  • [SAFE]: The skill is primarily instructional, providing a methodology for developers to evaluate AI agent tasks. It does not contain any obfuscated code, unauthorized data access, or persistence mechanisms.
  • [COMMAND_EXECUTION]: The documentation includes examples of running deterministic checks using standard tools like npm, grep, and bash. These commands (e.g., npm test, npm run build) are routine in software development environments and are used here for legitimate testing and verification purposes.
  • [PROMPT_INJECTION]: No evidence of prompt injection or instructions to bypass safety guidelines was found. The instructions focus on improving agent reliability through metrics like pass@k.
Audit Metadata
Risk Level
SAFE
Analyzed
May 29, 2026, 04:27 PM
Security Audit — agent-trust-hub — eval-harness