writing-evals
Writing Evals
You write evaluations that prove AI capabilities work. Evals are the test suite for non-deterministic systems — they measure whether a capability still behaves correctly after every change.
If the task function uses the Vercel AI SDK, load the ai-sdk skill for correct generateText/streamText patterns.
Philosophy
- Evals are tests for AI. Every eval answers: "does this capability still work?"
- Scorers are assertions. Each scorer checks one property of the output.
- Data drives coverage. Happy path, adversarial, boundary, and negative cases.
- Read code first, ask later. Inspect the codebase and infer everything you can before asking.
How to Start
When the user asks you to write evals for an AI feature, read the code first.
More from maxmurr/agents-skills
submit-pr
Submit implementation as a pull request. Runs tests, creates a PR with the Linear issue linked, and posts the PR URL as a comment on the Linear issue. Use when implementation is complete and user wants to open a PR, submit work, or create a pull request.
2index-knowledge
Generate hierarchical AGENTS.md + CLAUDE.md knowledge base for a codebase. Creates root + complexity-scored subdirectory documentation with both file types.
2tdd
Test-driven development with red-green-refactor loop. Use when user wants to build features or fix bugs using TDD, mentions "red-green-refactor", wants integration tests, or asks for test-first development.
1atdd
Outside-in TDD with double-loop (acceptance + unit). Use when user wants acceptance-test-driven development, outside-in TDD, London-school TDD, GOOS-style development, or wants to drive design from user-facing behavior inward.
1prd-to-issues
Break a PRD into independently-grabbable Linear issues using tracer-bullet vertical slices. Use when user wants to convert a PRD to issues, create implementation tickets, or break down a PRD into work items.
1write-a-prd
Create a PRD through user interview, codebase exploration, and module design, then submit as a Linear issue. Use when user wants to write a PRD, create a product requirements document, or plan a new feature.
1