agentic-eval-first-development

Agentic Eval-First Development

Evals are infrastructure, not afterthoughts. Define success criteria before writing prompts or task logic. The eval becomes the spec.

Framework: Data → Task → Scores

Every eval has exactly three components:

Data — Golden dataset of inputs (the test cases)
Task — The operation being evaluated (LLM call, agent workflow, MCP pipeline)
Scores — Categorical rubric that maps outputs to normalized 0–1 values
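The three components above can be sketched as a small harness. This is a minimal illustration, not the skill's own implementation: the names `Case`, `RUBRIC`, `grade`, `run_eval`, and `stub_task` are hypothetical, the string-matching grader stands in for whatever judge a real eval would use, and the stub task replaces an actual LLM or agent call.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Case:
    """One row of the golden dataset (hypothetical shape)."""
    input: str
    expected: str


# Scores: a categorical rubric mapping each label to a normalized 0-1 value.
# The labels and weights here are assumptions chosen for illustration.
RUBRIC = {"correct": 1.0, "partial": 0.5, "incorrect": 0.0}


def grade(output: str, expected: str) -> str:
    """Assign a rubric label using simple case-insensitive string matching."""
    out, exp = output.strip().lower(), expected.strip().lower()
    if out == exp:
        return "correct"
    if exp in out:
        return "partial"
    return "incorrect"


def run_eval(data: Sequence[Case], task: Callable[[str], str]) -> float:
    """Run the task on every case and return the mean normalized score."""
    if not data:
        raise ValueError("golden dataset is empty")
    scores = [RUBRIC[grade(task(case.input), case.expected)] for case in data]
    return sum(scores) / len(scores)


# Task: a stand-in for the operation under evaluation (no model is called).
_CANNED = {
    "capital of France?": "Paris",
    "capital of Italy?": "It is Rome.",
    "capital of Norway?": "Bergen",
}


def stub_task(prompt: str) -> str:
    return _CANNED[prompt]


# Data: three golden cases, one per rubric label.
data = [
    Case("capital of France?", "Paris"),
    Case("capital of Italy?", "Rome"),
    Case("capital of Norway?", "Oslo"),
]

mean_score = run_eval(data, stub_task)
print(mean_score)  # 0.5: one correct, one partial, one incorrect
```

Keeping the rubric as a plain mapping means the labels stay categorical while the aggregate stays numeric, so the same dataset can be rescored if the weights change.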

Step 1: Define the PRD (Data & Scores)

Build the Golden Dataset

Collect or generate 10–20 representative inputs covering the full range of expected usage.
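One way to keep the dataset honest is to check its size and coverage in code. The sketch below assumes each input carries a category tag; the categories `typical`, `edge`, and `adversarial`, and the names `GoldenCase` and `check_dataset`, are hypothetical and not part of the original skill. Only the 10–20 size range comes from the guidance above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldenCase:
    input: str
    category: str  # hypothetical coverage tag


# Assumed coverage buckets; adjust to the usage the eval must represent.
REQUIRED_CATEGORIES = {"typical", "edge", "adversarial"}


def check_dataset(cases, min_size=10, max_size=20):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not min_size <= len(cases) <= max_size:
        problems.append(
            f"expected {min_size}-{max_size} cases, got {len(cases)}"
        )
    present = {case.category for case in cases}
    for missing in sorted(REQUIRED_CATEGORIES - present):
        problems.append(f"no cases in category '{missing}'")
    input_counts = Counter(case.input for case in cases)
    duplicates = sorted(text for text, n in input_counts.items() if n > 1)
    if duplicates:
        problems.append(f"duplicate inputs: {duplicates}")
    return problems


# Ten cases, but none adversarial, so one problem is reported.
cases = [GoldenCase(f"typical question {i}", "typical") for i in range(8)]
cases += [GoldenCase("", "edge"), GoldenCase("x" * 10, "edge")]

print(check_dataset(cases))  # ["no cases in category 'adversarial'"]
```

Returning a list of problems rather than raising on the first one makes it easier to fix every gap in a single pass.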

Repository: vishalsachdev/claude-code-skills

First seen: May 4, 2026