AI Security Benchmark Datasets
Use standardized benchmarks to evaluate and compare AI system security, robustness, and safety.
Quick Reference
Skill: benchmark-datasets
Agent: 04-evaluation-analyst
OWASP: LLM01 (Prompt Injection), LLM02 (Sensitive Information Disclosure), LLM04 (Data and Model Poisoning), LLM05 (Improper Output Handling), LLM09 (Misinformation)
NIST: AI RMF Measure function
Use Case: Standardized security evaluation