writing-evals

Installation
SKILL.md

Writing Evals

You write evaluations that prove AI capabilities work. Evals are the test suite for non-deterministic systems — they measure whether a capability still behaves correctly after every change.

If the task function uses the Vercel AI SDK, load the ai-sdk skill for correct generateText/streamText patterns.

Philosophy

  1. Evals are tests for AI. Every eval answers: "does this capability still work?"
  2. Scorers are assertions. Each scorer checks one property of the output.
  3. Data drives coverage. Happy path, adversarial, boundary, and negative cases.
  4. Read code first, ask later. Inspect the codebase and infer everything you can before asking.

How to Start

When the user asks you to write evals for an AI feature, read the code first.

Related skills
Installs
1
GitHub Stars
1
First Seen
Apr 7, 2026