eval-harness
Pass
Audited by Gen Agent Trust Hub on Mar 23, 2026
Risk Level: SAFE
Full Analysis
- [SAFE]: The skill provides a structured methodology for Evaluation-Driven Development (EDD), focusing on defining success criteria and running regression tests.
- [COMMAND_EXECUTION]: The framework uses standard shell commands such as `grep`, `npm test`, and `npm run build` for deterministic code-based grading. These operations are restricted to the local development environment and reflect standard engineering practices.
- [DATA_EXFILTRATION]: There are no network-enabled commands or operations that attempt to access or exfiltrate sensitive data from the system.
- [EXTERNAL_DOWNLOADS]: The skill does not perform any remote downloads or execute scripts from external sources.
- [PROMPT_INJECTION]: The instructions are purely technical and functional; they do not contain any patterns designed to bypass AI safety guardrails or extract system prompts.
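
To illustrate the deterministic, code-based grading noted under COMMAND_EXECUTION, here is a minimal sketch of how such a grader might be structured. It is not taken from the skill itself: `CheckResult`, `run_check`, `grade`, `EXAMPLE_CHECKS`, the `grep` pattern, and the `src/index.ts` path are all hypothetical names chosen for illustration. The only behavior assumed is that a command's exit status decides pass or fail.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    exit_code: int


def run_check(name: str, argv: list[str], timeout: float = 600.0) -> CheckResult:
    """Run one local command; the check passes only if it exits with status 0."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # A missing binary or a hung command counts as a failed check.
        return CheckResult(name, False, -1)
    return CheckResult(name, proc.returncode == 0, proc.returncode)


def grade(checks: dict[str, list[str]]) -> list[CheckResult]:
    """Run every check in order and collect the results."""
    return [run_check(name, argv) for name, argv in checks.items()]


# Hypothetical check list mirroring the commands named in the audit.
# The grep pattern and file path are placeholders, not part of the skill.
EXAMPLE_CHECKS = {
    "tests": ["npm", "test"],
    "build": ["npm", "run", "build"],
    "exports handler": ["grep", "-q", "export function handler", "src/index.ts"],
}
```

Calling `grade(EXAMPLE_CHECKS)` from a project root would run each command locally and return one `CheckResult` per check. Passing argument lists rather than a single shell string avoids shell interpolation, which fits the audit's observation that the commands stay within the local development environment.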
Audit Metadata