eval

EvalKit

Overview

EvalKit is a conversational evaluation framework for AI agents that guides you through creating robust evaluations using the Strands Evals SDK. Through natural conversation, you can plan evaluations, generate test data, execute evaluations, and analyze results.

How Users Interact with EvalKit

Users interact with EvalKit through natural-language requests such as:

"Build an evaluation plan for my QA agent at /path/to/agent"
"Generate test cases focusing on edge cases"
"Run the evaluation and show me the results"
"Analyze the evaluation results and suggest improvements"

EvalKit understands the evaluation workflow and guides users through four phases: Plan, Data, Eval, and Report.

Evaluation Workflow
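
As a rough illustration of the four phases named above (Plan, Data, Eval, Report), the sketch below models them as an ordered sequence. It is not EvalKit's or the Strands Evals SDK's API: the names `Phase`, `PHASE_ORDER`, and `next_phase` are hypothetical, and the strictly linear ordering is an assumption drawn from the order in which the phases are listed.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    """The four workflow phases named in the text (hypothetical type)."""
    PLAN = "Plan"
    DATA = "Data"
    EVAL = "Eval"
    REPORT = "Report"


# Assumed linear order, taken from the order the phases are listed in.
PHASE_ORDER = [Phase.PLAN, Phase.DATA, Phase.EVAL, Phase.REPORT]


def next_phase(current: Optional[Phase]) -> Optional[Phase]:
    """Return the phase after `current`; None means the workflow is done.

    Passing None as `current` starts the workflow at the first phase.
    """
    if current is None:
        return PHASE_ORDER[0]
    idx = PHASE_ORDER.index(current)
    if idx + 1 < len(PHASE_ORDER):
        return PHASE_ORDER[idx + 1]
    return None


if __name__ == "__main__":
    # Walk the phases from start to finish and print each name.
    phase = next_phase(None)
    while phase is not None:
        print(phase.value)
        phase = next_phase(phase)
```

Under this reading, each of the example requests earlier in the document would correspond to one phase: building a plan, generating test cases, running the evaluation, and analyzing the results.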
