agent-evaluation


Agent Evaluation Skill

Evaluate AI agent task execution using LLM-as-judge patterns from the DeepEval, RAGAS, and G-Eval frameworks.

Output Format

Evaluation results are saved to evals/results/eval-${yyyy-mm-dd-hh-mm}-${commit_id}.md
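
As a rough illustration of the naming pattern above, the sketch below builds a results path from a timestamp and a commit id. The function name `result_path` and the choice to pass the commit id in as a plain string are assumptions made for this example; how the skill itself obtains the timestamp or commit id is not stated here.

```python
from datetime import datetime
from pathlib import PurePosixPath


def result_path(now: datetime, commit_id: str) -> str:
    """Build evals/results/eval-${yyyy-mm-dd-hh-mm}-${commit_id}.md.

    Hypothetical helper: the caller supplies both the timestamp and
    the commit id (for example, a short git hash).
    """
    # yyyy-mm-dd-hh-mm, zero-padded, 24-hour clock
    stamp = now.strftime("%Y-%m-%d-%H-%M")
    return str(PurePosixPath("evals/results") / f"eval-{stamp}-{commit_id}.md")


if __name__ == "__main__":
    print(result_path(datetime.now(), "abc1234"))
```

Passing the commit id as an argument keeps the helper free of any dependency on a git checkout, which makes it easy to test.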

Results Table

| Task Input | Agent Output | Reflection Input | Reflection Output | Score | Verdict | Feedback |
| --- | --- | --- | --- | --- | --- | --- |
| Create hello.js... | I've created hello.js with... | Task: Create hello.js Agent Output: ... | Task complete | 5/5 | COMPLETE | Agent produced output; Found completion indicators |
| Fix the bug... | I found the issue and... | Task: Fix bug Agent Output: ... | (none) | 3/5 | PARTIAL | Agent produced output; Missing reflection |

Run Evaluation
