DeepEval

Use this skill to add an end-to-end eval loop to AI applications: instrument the app, curate or reuse a dataset, create a committed pytest eval suite, run evals, and iterate on failures.

Workflow Summary

1. Inspect the target app and existing DeepEval usage.
2. Ask the required intake questions.
3. Reuse existing metrics and datasets when available.
4. Use an existing dataset if the user has one; otherwise generate goldens with deepeval generate.
5. Prefer native DeepEval integrations, then add minimal tracing add-ons.
6. Run deepeval test run.
7. Iterate for the requested number of rounds, defaulting to 5.
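
The run-and-iterate steps above can be sketched as a small driver loop. This is a minimal illustration, not part of the skill itself: the function names, the `on_failure` hook, and the example test path are hypothetical, and the sketch assumes that `deepeval test run` exits with a non-zero status when any eval fails.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

DEFAULT_ROUNDS = 5  # the workflow defaults to 5 rounds


def build_command(test_path: str) -> List[str]:
    """Build the CLI invocation named in the workflow for one eval file."""
    return ["deepeval", "test", "run", test_path]


def _run_cli(cmd: Sequence[str]) -> int:
    """Run the command and return its exit code (assumed 0 on success)."""
    return subprocess.run(list(cmd), check=False).returncode


def run_eval_rounds(
    test_path: str,
    rounds: int = DEFAULT_ROUNDS,
    run: Callable[[Sequence[str]], int] = _run_cli,
    on_failure: Optional[Callable[[int], None]] = None,
) -> List[int]:
    """Run the eval suite up to `rounds` times, stopping at the first pass.

    `on_failure` is a hypothetical hook called with the 1-based round number
    after a failing round, where prompt or code fixes would be applied.
    Returns the exit code observed in each executed round.
    """
    exit_codes: List[int] = []
    for round_number in range(1, rounds + 1):
        code = run(build_command(test_path))
        exit_codes.append(code)
        if code == 0:
            break  # suite passed; no further rounds needed
        if on_failure is not None:
            on_failure(round_number)
    return exit_codes
```

Calling `run_eval_rounds("tests/evals/test_app.py")` would invoke the CLI up to five times. Passing a custom `run` callable keeps the loop testable without DeepEval installed.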

Core Principles

Repository: confident-ai/deepeval
GitHub Stars: 15.4K
First Seen: Apr 28, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)