eval-harness

Pass

Audited by Gen Agent Trust Hub on Apr 7, 2026

Risk Level: SAFE
Full Analysis
  • [SAFE]: The skill's primary function is to provide a structured methodology for evaluation-driven development (EDD), including capability and regression testing.
  • [EXTERNAL_DOWNLOADS]: The skill includes a GitHub Actions workflow (.github/workflows/evals.yml) that utilizes trusted third-party actions including actions/checkout and astral-sh/setup-uv. These are well-known services used for standard CI/CD environment setup.
  • [COMMAND_EXECUTION]: The skill provides scripts and instructions for running local commands such as pytest and coverage analysis. These operations are legitimate developer actions for verifying code quality and do not exhibit signs of malicious privilege escalation or unauthorized access.
  • [DATA_EXFILTRATION]: The skill communicates with the Anthropic API to perform 'LLM-as-judge' grading. This involves sending data to a known AI service provider as intended by the skill's design. It correctly instructs the use of environment variables and secrets for API key management, following security best practices.
Audit Metadata
Risk Level
SAFE
Analyzed
Apr 7, 2026, 04:23 PM
Security Audit — agent-trust-hub — eval-harness