skills/wshobson/agents/llm-evaluation/Gen Agent Trust Hub

llm-evaluation

Pass

Audited by Gen Agent Trust Hub on May 29, 2026

Risk Level: SAFE
Full Analysis
  • [SAFE]: The skill uses established open-source libraries and official APIs from well-known providers to perform model evaluations. The code follows standard practices for calculating automated metrics, conducting A/B tests, and tracking performance regressions.
  • [PROMPT_INJECTION]: The skill implements LLM-as-judge patterns which are inherently susceptible to indirect prompt injection where the data being evaluated attempts to influence the evaluator.
  • Ingestion points: Model responses (e.g., in llm_judge_quality) are interpolated directly into prompts processed by an evaluator LLM.
  • Boundary markers: Absent; the prompts use simple labels like 'Response:' without specialized delimiters or instructions to ignore nested commands.
  • Capability inventory: The judging functions return structured data and do not have access to dangerous system capabilities, file system writes, or unauthorized network operations.
  • Sanitization: None; input is passed directly to the model. This is standard behavior for evaluation tools and the risk is restricted to the accuracy of the resulting metrics.
Audit Metadata
Risk Level
SAFE
Analyzed
May 29, 2026, 05:50 AM
Security Audit — agent-trust-hub — llm-evaluation