eval

EvalKit

Overview

EvalKit is a conversational evaluation framework for AI agents that guides you through creating robust evaluations using the Strands Evals SDK. Through natural conversation, you can plan evaluations, generate test data, execute evaluations, and analyze results.

How Users Interact with EvalKit

Users talk to EvalKit in plain language, with requests such as:

  • "Build an evaluation plan for my QA agent at /path/to/agent"
  • "Generate test cases focusing on edge cases"
  • "Run the evaluation and show me the results"
  • "Analyze the evaluation results and suggest improvements"

EvalKit understands the evaluation workflow and guides users through four phases: Plan, Data, Eval, and Report.
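
As a rough illustration of how the example requests above line up with the four phases, here is a minimal Python sketch. It is not EvalKit's implementation and does not use the Strands Evals SDK: the `Phase` enum, the `PHASE_HINTS` keyword table, and the `guess_phase` helper are all hypothetical names invented for this example, and simple keyword matching stands in for whatever conversational understanding EvalKit actually applies.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    """The four workflow phases named in this document."""
    PLAN = "plan"
    DATA = "data"
    EVAL = "eval"
    REPORT = "report"


# Hypothetical keyword hints per phase (an assumption for illustration only;
# EvalKit's real routing logic is not described in this document).
PHASE_HINTS = {
    Phase.PLAN: ("plan",),
    Phase.DATA: ("generate", "test case"),
    Phase.EVAL: ("run",),
    Phase.REPORT: ("analyze", "report"),
}


def guess_phase(request: str) -> Optional[Phase]:
    """Return the first phase whose hint appears in the request, else None."""
    text = request.lower()
    for phase, hints in PHASE_HINTS.items():
        if any(hint in text for hint in hints):
            return phase
    return None


if __name__ == "__main__":
    examples = [
        "Build an evaluation plan for my QA agent at /path/to/agent",
        "Generate test cases focusing on edge cases",
        "Run the evaluation and show me the results",
        "Analyze the evaluation results and suggest improvements",
    ]
    for request in examples:
        print(guess_phase(request), "<-", request)
```

The phases are checked in workflow order, so a request that mentions more than one phase resolves to the earliest one; a real conversational system would need something far less brittle than substring matching.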

Evaluation Workflow

Installs: 7
GitHub Stars: 2.9K
First Seen: Jan 24, 2026