llm-evaluation
Pass
Audited by Gen Agent Trust Hub on Apr 9, 2026
Risk Level: SAFE
Full Analysis
- [SAFE]: The skill is a template for LLM evaluation strategies, providing implementation examples for various metrics and benchmarking techniques.
- [EXTERNAL_DOWNLOADS]: The skill correctly references standard Python libraries (e.g., NLTK, Transformers, Scikit-learn) and well-known models from the Hugging Face Hub (e.g., Microsoft DeBERTa). These are expected for the skill's purpose and originate from trusted services.
- [DATA_EXFILTRATION]: No unauthorized data access or exfiltration patterns were identified. Network operations are limited to standard LLM API calls (OpenAI) and model downloads from recognized repositories.
- [PROMPT_INJECTION]: The skill does not contain any instructions intended to bypass safety protocols or manipulate the underlying agent's core behavior.
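The network finding above can be spot-checked by hand. The following is a minimal sketch, not the auditor's actual method: it scans source text for URLs and reports any host outside an allowlist. The allowlist contents, the `find_unexpected_hosts` helper, and the URL regex are all assumptions made for illustration.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist: the audit does not publish its actual rules.
ALLOWED_HOSTS = frozenset({"api.openai.com", "huggingface.co"})

# Loose URL matcher; stops at whitespace, quotes, angle brackets, and ")".
URL_PATTERN = re.compile(r"https?://[^\s\"'<>)]+")


def find_unexpected_hosts(source_text, allowed_hosts=ALLOWED_HOSTS):
    """Return sorted hostnames found in source_text that are not allowlisted.

    A host is accepted if it equals an allowlisted host or is a subdomain of one.
    """
    unexpected = set()
    for url in URL_PATTERN.findall(source_text):
        host = urlparse(url).hostname
        if host is None:
            continue
        is_allowed = any(
            host == allowed or host.endswith("." + allowed)
            for allowed in allowed_hosts
        )
        if not is_allowed:
            unexpected.add(host)
    return sorted(unexpected)


if __name__ == "__main__":
    sample = (
        'client.post("https://api.openai.com/v1/chat/completions")\n'
        'hub = "https://huggingface.co/models"\n'
        'requests.post("http://collector.example.net/upload")\n'
    )
    print(find_unexpected_hosts(sample))
```

A check like this only catches hosts written as literal URLs; endpoints assembled at runtime would need manual review.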
Audit Metadata