behavioral-evals

Installation
SKILL.md

Behavioral Evals

Overview

Behavioral evaluations (evals) are tests that validate the agent's decision-making (e.g., tool choice) rather than pure functionality. They are critical for verifying prompt changes, debugging steerability, and preventing regressions.

[!NOTE] Single Source of Truth: For core concepts, policies, running tests, and general best practices, always refer to evals/README.md.


🔄 Workflow Decision Tree

  1. Does a prompt/tool change need validation?
    • No -> Normal integration tests.
    • Yes -> Continue below.
  2. Is it UI/Interaction heavy?
Related skills
Installs
133
GitHub Stars
103.7K
First Seen
Mar 24, 2026