agentic-eval-first-development

Agentic Eval-First Development

Evals are infrastructure, not afterthoughts. Define success criteria before writing prompts or task logic. The eval becomes the spec.

Framework: Data → Task → Scores

Every eval has exactly three components:

  1. Data — Golden dataset of inputs (the test cases)
  2. Task — The operation being evaluated (LLM call, agent workflow, MCP pipeline)
  3. Scores — Categorical rubric that maps outputs to normalized 0–1 values
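
The three components above can be wired together in a few lines. What follows is a minimal sketch, not the skill's own implementation: the `Eval` class, the `RUBRIC` labels, `stub_task`, and `exact_match_grader` are all hypothetical names chosen for illustration, and the stub stands in for a real LLM call or agent workflow.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable

# Hypothetical categorical rubric: each label maps to a normalized 0-1 score.
RUBRIC = {"fail": 0.0, "partial": 0.5, "pass": 1.0}


@dataclass
class Eval:
    data: list[dict]                    # Data: golden dataset rows
    task: Callable[[str], str]          # Task: the operation being evaluated
    grade: Callable[[str, dict], str]   # Scores: returns a rubric label

    def run(self) -> float:
        """Run the task on every row and return the mean normalized score."""
        scores = []
        for row in self.data:
            output = self.task(row["input"])
            label = self.grade(output, row)
            scores.append(RUBRIC[label])
        return mean(scores)


def stub_task(text: str) -> str:
    # Placeholder for an LLM call, agent workflow, or MCP pipeline.
    return text.upper()


def exact_match_grader(output: str, row: dict) -> str:
    # Assumed grading rule: exact match passes, case-insensitive match is partial.
    if output == row["expected"]:
        return "pass"
    if output.lower() == row["expected"].lower():
        return "partial"
    return "fail"


demo = Eval(
    data=[
        {"input": "hello", "expected": "HELLO"},
        {"input": "ok", "expected": "Ok"},
        {"input": "x", "expected": "y"},
    ],
    task=stub_task,
    grade=exact_match_grader,
)
overall = demo.run()
```

Keeping the grader's output categorical and converting it to a number in one place means the rubric can be revised without touching the task or the dataset.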

Step 1: Define the PRD (Data & Scores)

Build the Golden Dataset

Collect or generate 10–20 representative inputs covering the full range of expected usage.
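
One way to keep a golden dataset honest is to check its size and coverage in code. The sketch below assumes a hypothetical `Case` record and three illustrative coverage categories (`typical`, `edge`, `adversarial`); neither comes from the original text, and the inputs are placeholders rather than real test cases.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    case_id: str
    category: str  # hypothetical coverage bucket
    input: str


# Assumed coverage buckets; adjust to the range of usage you expect.
REQUIRED_CATEGORIES = {"typical", "edge", "adversarial"}


def validate_golden_dataset(cases, min_size=10, max_size=20):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not min_size <= len(cases) <= max_size:
        problems.append(f"expected {min_size}-{max_size} cases, got {len(cases)}")
    id_counts = Counter(c.case_id for c in cases)
    duplicate_ids = sorted(cid for cid, n in id_counts.items() if n > 1)
    if duplicate_ids:
        problems.append(f"duplicate ids: {duplicate_ids}")
    missing = sorted(REQUIRED_CATEGORIES - {c.category for c in cases})
    if missing:
        problems.append(f"missing categories: {missing}")
    return problems


# Placeholder draft: 8 typical and 3 edge cases, no adversarial ones yet.
draft = [Case(f"typical-{i}", "typical", f"placeholder typical input {i}") for i in range(8)]
draft += [Case(f"edge-{i}", "edge", f"placeholder edge input {i}") for i in range(3)]
draft_problems = validate_golden_dataset(draft)

# Adding one adversarial case closes the coverage gap.
complete = draft + [Case("adversarial-0", "adversarial", "placeholder adversarial input")]
complete_problems = validate_golden_dataset(complete)
```

Running the validator before any prompt work starts makes gaps in coverage visible while they are still cheap to fix.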

More from vishalsachdev/claude-code-skills
