The Agent Skills Directory

[SAFE]: The skill is primarily instructional, providing a methodology for developers to evaluate AI agent tasks. It does not contain any obfuscated code, unauthorized data access, or persistence mechanisms.
[COMMAND_EXECUTION]: The documentation includes examples of running deterministic checks with standard tools such as npm, grep, and bash. These commands (e.g., npm test, npm run build) are routine in software development and are used here for legitimate testing and verification.
[PROMPT_INJECTION]: No evidence of prompt injection or instructions to bypass safety guidelines was found. The instructions focus on improving agent reliability through metrics like pass@k.
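For readers unfamiliar with pass@k: it estimates the probability that at least one of k sampled attempts at a task succeeds. The audit does not say which definition the skill uses. The sketch below assumes the widely used unbiased estimator from the HumanEval/Codex evaluation work, pass@k = 1 - C(n-c, k) / C(n, k), where n attempts were run and c of them passed. The function name pass_at_k is illustrative only and is not taken from the skill.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate (assumed definition, not from the skill itself).

    n: total attempts sampled for a task
    c: number of those attempts that passed the checks
    k: attempt budget being scored
    """
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so every size-k subset contains a pass.
        return 1.0
    # 1 minus the probability that a random size-k subset contains only failures.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With 10 attempts of which 3 passed, pass_at_k(10, 3, 1) evaluates to about 0.3 and pass_at_k(10, 3, 5) to about 0.917. Larger k therefore rewards an agent that succeeds occasionally, which is why reliability-focused evaluations usually report it alongside pass@1.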

eval-harness