Agent Evaluation Framework Builder

Pass

Audited by Gen Agent Trust Hub on Mar 29, 2026

Risk Level: SAFEPROMPT_INJECTION
Full Analysis
  • [PROMPT_INJECTION]: The skill describes an LLM-as-judge evaluation pattern that is vulnerable to indirect prompt injection. This occurs when the response being evaluated contains instructions that manipulate the judge model's scoring behavior.
  • Ingestion points: The actual_response variable in the JUDGE_PROMPT template within SKILL.md is the point where untrusted data enters the judge's context.
  • Boundary markers: The provided template lacks boundary markers (such as XML tags or triple backticks) to separate the instruction block from the variable data, increasing the risk that the model will follow instructions contained within the response.
  • Capability inventory: The judge model has the capability to generate scores and reasoning which directly impact the evaluation metrics and CI/CD pass/fail status.
  • Sanitization: No sanitization or input validation is performed on the actual_response content before it is interpolated into the judge prompt.
Audit Metadata
Risk Level
SAFE
Analyzed
Mar 29, 2026, 05:56 AM
Security Audit — agent-trust-hub — Agent Evaluation Framework Builder