AI Evaluation (Evals)

Build systematic evaluation frameworks for AI/LLM products to measure quality, catch regressions, and improve model performance.

When to Use

  • Building a product with LLM/AI components
  • Need to measure AI output quality systematically
  • Comparing models or prompts (A/B testing)
  • Detecting regressions before deployment
  • Benchmarking against competitors
  • Improving AI accuracy over time
  • Explaining AI decisions to stakeholders

Core Concept

AI Evaluation (Evals) ≠ Traditional Testing

Traditional software: Deterministic (same input → same output), so tests can assert exact results.

AI systems: Non-deterministic (same input → varying outputs), so evals score output quality across many cases and report aggregate metrics instead of asserting a single exact answer.
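A minimal sketch of what such an eval harness looks like in practice. Everything here is illustrative: `model` is a hypothetical stand-in for a real LLM call, and `exact_match` is the simplest possible grader — real evals typically use fuzzier graders (semantic similarity, LLM-as-judge) to cope with non-deterministic outputs.

```python
# Minimal eval-harness sketch. `model` is a hypothetical stub standing in
# for an LLM call; swap in a real client in practice.

def model(prompt: str) -> str:
    # Stand-in for an LLM call -- deterministic here for illustration only.
    return {"2+2": "4", "capital of France": "Paris"}.get(prompt, "unknown")

def exact_match(output: str, expected: str) -> bool:
    # Simplest grader: case-insensitive exact match.
    return output.strip().lower() == expected.strip().lower()

def run_eval(cases, grader):
    # Score every (input, expected) pair and report the pass rate.
    results = [grader(model(prompt), expected) for prompt, expected in cases]
    return sum(results) / len(results)

cases = [
    ("2+2", "4"),
    ("capital of France", "Paris"),
    ("capital of Spain", "Madrid"),  # stub model fails this one
]
print(f"pass rate: {run_eval(cases, exact_match):.0%}")
```

Running the same harness against two models or two prompt variants and comparing pass rates is the basic mechanism behind the A/B testing and regression detection described above.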

First seen: Jan 27, 2026