Agent Evaluation Skill

Evaluate AI agent task execution using LLM-as-judge patterns from the DeepEval, RAGAS, and G-Eval frameworks.

Output Format

Evaluation results are saved to evals/results/eval-${yyyy-mm-dd-hh-mm}-${commit_id}.md
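
As a minimal sketch of the naming pattern above, the results path could be built as follows. The function name `result_path` and its parameters are hypothetical, and the use of local time for the timestamp is an assumption; the source only specifies the `eval-${yyyy-mm-dd-hh-mm}-${commit_id}.md` layout.

```python
from datetime import datetime
from typing import Optional


def result_path(commit_id: str, now: Optional[datetime] = None) -> str:
    """Build evals/results/eval-<yyyy-mm-dd-hh-mm>-<commit_id>.md.

    `commit_id` is whatever identifier the caller supplies (for example a
    short git hash). Local time is an assumption, not stated in the source.
    """
    stamp = (now or datetime.now()).strftime("%Y-%m-%d-%H-%M")
    return f"evals/results/eval-{stamp}-{commit_id}.md"
```

Passing `now` explicitly keeps the function deterministic, which makes it easier to test than reading the clock inside the function every time.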

| Task Input | Agent Output | Reflection Input | Reflection Output | Score | Verdict | Feedback |
| --- | --- | --- | --- | --- | --- | --- |
| Create hello.js... | I've created hello.js with... | Task: Create hello.js Agent Output: ... | Task complete | 5/5 | COMPLETE | Agent produced output; Found completion indicators |
| Fix the bug... | I found the issue and... | Task: Fix bug Agent Output: ... | (none) | 3/5 | PARTIAL | Agent produced output; Missing reflection |
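
The seven columns above can be modelled as a small record and rendered to a table. This is only a sketch: it assumes the results file holds a Markdown pipe table (suggested by the `.md` extension, not confirmed by the source), and `EvalRow`, `COLUMNS`, and `render_table` are hypothetical names introduced for illustration.

```python
from dataclasses import astuple, dataclass
from typing import Iterable

# Column headers as they appear in the results table.
COLUMNS = (
    "Task Input", "Agent Output", "Reflection Input",
    "Reflection Output", "Score", "Verdict", "Feedback",
)


@dataclass
class EvalRow:
    """One evaluated task; field order matches COLUMNS."""
    task_input: str
    agent_output: str
    reflection_input: str
    reflection_output: str
    score: str      # e.g. "5/5"
    verdict: str    # e.g. "COMPLETE" or "PARTIAL"
    feedback: str


def _cell(text: str) -> str:
    # Escape pipes and flatten newlines so a value cannot break the row.
    return text.replace("|", "\\|").replace("\n", " ")


def render_table(rows: Iterable[EvalRow]) -> str:
    """Render rows as a Markdown pipe table with a header and divider."""
    header = "| " + " | ".join(COLUMNS) + " |"
    divider = "| " + " | ".join("---" for _ in COLUMNS) + " |"
    body = [
        "| " + " | ".join(_cell(value) for value in astuple(row)) + " |"
        for row in rows
    ]
    return "\n".join([header, divider, *body])
```

Escaping pipes and newlines matters here because agent output and feedback are free text and would otherwise split a row into extra cells.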

First Seen

May 15, 2026
