AI Eval in CI

Overview

Test AI agents and LLM outputs the same way you test code — automated evaluations that run in CI, compare against baselines, and fail the build when quality drops. No dashboards to check manually. Just npx eval run --ci and a red or green build.
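
As a sketch of what that gate can look like, here is a minimal GitHub Actions job. The workflow file name, Node version, and secret name are assumptions; the eval command is the one quoted above.

```yaml
# .github/workflows/eval.yml (hypothetical file name)
name: AI Evals
on: [pull_request]

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # If the eval command exits nonzero on a regression, the build fails.
      - run: npx eval run --ci
        env:
          # Assumed secret name; use whatever your providers require.
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```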

When to Use

  • Adding quality gates before deploying AI features to production
  • Catching prompt regressions when system prompts or models change
  • Comparing model performance (GPT-4o vs Claude Sonnet vs local Llama)
  • Validating RAG pipeline accuracy against a test dataset
  • Benchmarking agent tool-calling accuracy and latency

Instructions

Strategy 1: Promptfoo (Config-Driven Evals)

Promptfoo is one of the most popular open-source eval frameworks. Define test cases in YAML, run them against multiple providers, and get a comparison matrix, as in the sketch below.
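
A minimal config might look like this. The prompt, test variables, and assertion values are illustrative placeholders, not taken from the skill itself:

```yaml
# promptfooconfig.yaml
description: Summarization quality gate
prompts:
  - "Summarize the following in one sentence: {{article}}"
providers:
  - openai:gpt-4o-mini
  - anthropic:messages:claude-3-5-sonnet-20241022
tests:
  - vars:
      article: "NASA's Artemis II crew will fly around the Moon in 2026."
    assert:
      # Deterministic check: the summary must mention the key entity.
      - type: icontains
        value: artemis
      # Model-graded check: an LLM judges the output against a rubric.
      - type: llm-rubric
        value: Accurately summarizes the article in a single sentence.
```

Running npx promptfoo eval executes every test against every provider and prints the comparison matrix; when assertions fail, the command exits with a nonzero code, so a CI job running it goes red without anyone checking a dashboard.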
