ai-system-evaluation

AI System Evaluation

Evaluating AI systems end-to-end.

Evaluation Criteria

1. Domain-Specific Capability

Domain             Benchmarks
Math & Reasoning   GSM-8K, MATH
Code               HumanEval, MBPP
Knowledge          MMLU, ARC
Multi-turn Chat    MT-Bench
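
As a rough illustration of how the table above could be used, the sketch below groups per-benchmark scores by domain. The mapping is copied from the table; the function names (`domain_of`, `aggregate_by_domain`) and the unweighted mean are assumptions made for this example and are not part of the skill itself.

```python
from collections import defaultdict

# Domain -> benchmark names, copied from the table above.
DOMAIN_BENCHMARKS = {
    "Math & Reasoning": ["GSM-8K", "MATH"],
    "Code": ["HumanEval", "MBPP"],
    "Knowledge": ["MMLU", "ARC"],
    "Multi-turn Chat": ["MT-Bench"],
}


def domain_of(benchmark):
    """Return the domain a benchmark belongs to (hypothetical helper)."""
    for domain, names in DOMAIN_BENCHMARKS.items():
        if benchmark in names:
            return domain
    raise KeyError(f"unknown benchmark: {benchmark}")


def aggregate_by_domain(scores):
    """Average per-benchmark scores (each in 0..1) within each domain.

    Assumption: an unweighted mean is used; a real evaluation might
    weight benchmarks differently or report them separately.
    """
    buckets = defaultdict(list)
    for benchmark, score in scores.items():
        buckets[domain_of(benchmark)].append(score)
    return {domain: sum(vals) / len(vals) for domain, vals in buckets.items()}


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not measured results.
    placeholder_scores = {"GSM-8K": 0.5, "MATH": 1.0, "MMLU": 0.25}
    print(aggregate_by_domain(placeholder_scores))
```

Only domains with at least one supplied score appear in the result, so a partial run still yields a usable summary.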

2. Generation Quality

First Seen: Mar 10, 2026
ai-system-evaluation — doanchienthangdev/omgkit