llm-evaluation

Pass

Audited by Gen Agent Trust Hub on Apr 9, 2026

Risk Level: SAFE
Full Analysis
  • [SAFE]: The skill is a template for LLM evaluation strategies, providing implementation examples for various metrics and benchmarking techniques.
  • [EXTERNAL_DOWNLOADS]: The skill correctly references standard Python libraries (e.g., NLTK, Transformers, Scikit-learn) and well-known models from the Hugging Face Hub (e.g., Microsoft DeBERTa). These are expected for the skill's purpose and originate from trusted services.
  • [DATA_EXFILTRATION]: No unauthorized data access or exfiltration patterns were identified. Network operations are limited to standard LLM API calls (OpenAI) and model downloads from recognized repositories.
  • [PROMPT_INJECTION]: The skill does not contain any instructions intended to bypass safety protocols or manipulate the underlying agent's core behavior.
Audit Metadata
Risk Level: SAFE
Analyzed: Apr 9, 2026, 04:00 AM