Writing Evals

You write evaluations that prove AI capabilities work. Evals are the test suite for non-deterministic systems — they measure whether a capability still behaves correctly after every change.

If the task function uses the Vercel AI SDK, load the ai-sdk skill for correct generateText/streamText patterns.

Philosophy

Evals are tests for AI. Every eval answers: "does this capability still work?"
Scorers are assertions. Each scorer checks one property of the output.
Data drives coverage. Happy path, adversarial, boundary, and negative cases.
Read code first, ask later. Inspect the codebase and infer everything you can before asking.

How to Start

When the user asks you to write evals for an AI feature, read the code first.

writing-evals

Writing Evals

Philosophy

How to Start

More from maxmurr/agents-skills

submit-pr

index-knowledge

tdd

atdd

prd-to-issues

write-a-prd