advanced-evaluation

Pass

Audited by Gen Agent Trust Hub on Apr 29, 2026

Risk Level: SAFE
Full Analysis
  • [SAFE]: The skill is entirely instructional and provides boilerplate code for evaluation metrics and bias mitigation strategies. All provided scripts and references align with the stated purpose and follow established industry best practices for model evaluation.
  • [PROMPT_INJECTION]: The skill exposes a surface for indirect prompt injection, since it is designed to ingest and evaluate untrusted LLM outputs. This is inherent to the intended LLM-as-a-judge use case rather than a malicious pattern. The risk is low because the skill does not request or use any dangerous tools, such as network access or shell execution, that could be abused if an injection succeeded.
  • Ingestion points: Untrusted data enters the agent context via variables like response, response_a, and response_b in the evaluation scripts and templates (e.g., scripts/evaluation_example.py and references/full-guide.md).
  • Boundary markers: The provided prompt templates use clear markdown headers (e.g., ## Response to Evaluate) to isolate third-party data from the evaluation instructions.
  • Capability inventory: No dangerous operations, such as subprocess calls, file-system writes, or network requests, are present in the provided scripts or references.
  • Sanitization: The skill validates only that inputs are non-empty strings; it performs no content-based sanitization, which is typical for text evaluation templates.
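
The ingestion, boundary-marker, and validation points above can be sketched as follows. This is a hypothetical reconstruction of the pattern the audit describes, not the actual contents of scripts/evaluation_example.py; the function name build_judge_prompt and the instruction wording are illustrative assumptions.

```python
def build_judge_prompt(response: str) -> str:
    """Wrap an untrusted model response in an LLM-as-a-judge prompt.

    Illustrative sketch of the audited pattern: markdown headers
    (e.g. "## Response to Evaluate") act as boundary markers that
    separate the evaluation instructions from third-party data.
    """
    # Basic input validation only: reject empty/whitespace input.
    # No content-based sanitization is performed, matching the
    # audit's observation.
    if not isinstance(response, str) or not response.strip():
        raise ValueError("response must be a non-empty string")

    return (
        "## Evaluation Instructions\n"
        "Rate the response below for helpfulness on a 1-5 scale.\n\n"
        "## Response to Evaluate\n"
        f"{response}\n"
    )
```

Note that the untrusted `response` value lands verbatim inside the prompt, which is the injection surface the audit flags; the markdown header is a structural boundary, not a security control.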
Audit Metadata
Risk Level
SAFE
Analyzed
Apr 29, 2026, 11:31 PM