eval-harness
Pass
Audited by Gen Agent Trust Hub on Apr 7, 2026
Risk Level: SAFE
Full Analysis
- [SAFE]: The skill's primary function is to provide a structured methodology for evaluation-driven development (EDD), including capability and regression testing.
- [EXTERNAL_DOWNLOADS]: The skill includes a GitHub Actions workflow (`.github/workflows/evals.yml`) that utilizes trusted third-party actions including `actions/checkout` and `astral-sh/setup-uv`. These are well-known services used for standard CI/CD environment setup.
- [COMMAND_EXECUTION]: The skill provides scripts and instructions for running local commands such as `pytest` and coverage analysis. These operations are legitimate developer actions for verifying code quality and do not exhibit signs of malicious privilege escalation or unauthorized access.
- [DATA_EXFILTRATION]: The skill communicates with the Anthropic API to perform 'LLM-as-judge' grading. This involves sending data to a known AI service provider as intended by the skill's design. It correctly instructs the use of environment variables and secrets for API key management, following security best practices.
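The key-management practice noted in the last finding can be illustrated with a minimal sketch. It is not taken from the skill itself: the variable name `ANTHROPIC_API_KEY` and the helpers `load_api_key` and `redact` are assumptions introduced here for illustration. The idea is that the key is read from the environment at run time (populated locally by the developer or in CI from a repository secret) and is never written into source files or logs.

```python
import os
from typing import Mapping, Optional

# Assumed variable name; the audit does not state which name the skill uses.
API_KEY_VAR = "ANTHROPIC_API_KEY"


def load_api_key(env: Optional[Mapping[str, str]] = None, var: str = API_KEY_VAR) -> str:
    """Return the API key from the environment, failing loudly if it is absent."""
    env = os.environ if env is None else env
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set; export it locally or map it from a CI secret")
    return key


def redact(key: str, visible: int = 4) -> str:
    """Mask a key so that only its first few characters can appear in log output."""
    return key[:visible] + "*" * max(len(key) - visible, 0)
```

A grading script would call `load_api_key()` once at startup and pass the result to its API client, logging at most `redact(key)` if it needs to confirm which key is in use.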
Audit Metadata