llm-evaluation

Pass

Audited by Gen Agent Trust Hub on Mar 24, 2026

Risk Level: SAFE
Full Analysis
  • [SAFE]: The skill serves as an instructional guide for LLM evaluation. It provides standard code snippets and references well-known libraries in the AI ecosystem.
  • [EXTERNAL_DOWNLOADS]: The skill mentions and provides code for using several established Python libraries, including nltk, transformers, bert_score, and detoxify. These libraries typically download pre-trained models or datasets from trusted or well-known services like Hugging Face or NLTK's official repositories. This is expected behavior for the described use case.
  • [PROMPT_INJECTION]: The 'LLM-as-Judge' patterns described in the skill interpolate untrusted model outputs directly into evaluation prompts. While this creates a theoretical surface for Indirect Prompt Injection (where the evaluated content might attempt to influence the judge's score), this is an inherent characteristic of the technique being taught and is not a malicious implementation by the skill author.
  • Ingestion points: response, question, response_a, and response_b parameters in SKILL.md code snippets.
  • Boundary markers: None present in the example prompt templates.
  • Capability inventory: The snippets demonstrate calls to the openai.ChatCompletion API.
  • Sanitization: No explicit sanitization or escaping of the evaluated content is shown in the examples.
Audit Metadata
Risk Level
SAFE
Analyzed
Mar 24, 2026, 08:44 AM