agentic-eval
Summary
Iterative evaluation and refinement patterns for improving AI agent outputs through self-critique loops.
- Provides three core patterns: basic reflection (self-critique loops), evaluator-optimizer (separated generation and evaluation), and code-specific test-driven refinement
- Supports multiple evaluation strategies including outcome-based assessment, LLM-as-judge comparison, and rubric-based scoring with weighted dimensions
- Includes practical Python implementations with structured JSON output parsing, iteration limits, and convergence detection to prevent infinite loops
- Best suited for quality-critical tasks like code generation, reports, and analysis where clear evaluation criteria and success metrics exist
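As an illustration of the rubric-based scoring with weighted dimensions mentioned above, here is a minimal sketch. The dimension names, the weights, and the `weighted_score` helper are hypothetical and are not taken from the skill itself; the sketch assumes the judge replies with a JSON object mapping each dimension to a score between 0 and 1.

```python
import json
from typing import Mapping

# Hypothetical rubric: dimension names and weights are illustrative assumptions.
RUBRIC_WEIGHTS = {"accuracy": 0.5, "clarity": 0.3, "completeness": 0.2}


def weighted_score(judge_reply: str, weights: Mapping[str, float]) -> float:
    """Parse per-dimension scores from a JSON reply and return their weighted mean.

    Returns 0.0 when the reply is not a valid JSON object, so a malformed
    judge response counts as a failed evaluation instead of raising.
    """
    try:
        scores = json.loads(judge_reply)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(scores, dict):
        return 0.0

    total_weight = sum(weights.values())
    if total_weight <= 0:
        return 0.0

    total = 0.0
    for dimension, weight in weights.items():
        value = scores.get(dimension, 0)
        if not isinstance(value, (int, float)):
            value = 0  # Missing or non-numeric dimensions score zero.
        # Clamp each score to [0, 1] before weighting.
        total += weight * min(1.0, max(0.0, float(value)))
    return total / total_weight


if __name__ == "__main__":
    reply = '{"accuracy": 1, "clarity": 0.5, "completeness": 0}'
    print(weighted_score(reply, RUBRIC_WEIGHTS))
```

Treating an unparseable reply as a zero score is one possible design choice; retrying the judge call would be an equally reasonable alternative.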
SKILL.md
Agentic Evaluation Patterns
Patterns for self-improvement through iterative evaluation and refinement.
Overview
Evaluation patterns enable agents to assess and improve their own outputs, moving beyond single-shot generation to iterative refinement loops.
Generate → Evaluate → Critique → Refine → Output
    ↑                              │
    └──────────────────────────────┘
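The loop in the diagram can be sketched roughly as follows. This is a minimal illustration, not the skill's own implementation: `refine_loop` and the `toy_*` stand-ins are hypothetical names, and in practice `generate`, `evaluate`, and `refine` would wrap model calls. The sketch assumes the evaluator returns a numeric score together with a textual critique.

```python
from typing import Callable, Tuple


def refine_loop(
    task: str,
    generate: Callable[[str], str],
    evaluate: Callable[[str, str], Tuple[float, str]],
    refine: Callable[[str, str, str], str],
    threshold: float = 0.9,
    max_iterations: int = 3,
) -> Tuple[str, float, int]:
    """Generate once, then evaluate and refine until one stop condition holds.

    Stops when the score reaches `threshold`, when a round fails to improve
    on the best score so far (treated as convergence), or when
    `max_iterations` evaluation rounds have run.
    Returns (best_output, best_score, rounds_used).
    """
    output = generate(task)
    best_output, best_score = output, float("-inf")
    rounds = 0
    for rounds in range(1, max_iterations + 1):
        score, critique = evaluate(task, output)
        improved = score > best_score
        if improved:
            best_output, best_score = output, score
        if score >= threshold or not improved or rounds == max_iterations:
            break
        # Feed the critique back into the next attempt.
        output = refine(task, output, critique)
    return best_output, best_score, rounds


# Toy stand-ins so the sketch runs without a model (purely illustrative).
def toy_generate(task: str) -> str:
    return "draft"


def toy_evaluate(task: str, output: str) -> Tuple[float, str]:
    revisions = output.count("+")  # Each "+" marks one refinement pass.
    return min(1.0, 0.5 + 0.25 * revisions), "add more detail"


def toy_refine(task: str, output: str, critique: str) -> str:
    return output + "+"


if __name__ == "__main__":
    print(refine_loop("demo", toy_generate, toy_evaluate, toy_refine, max_iterations=5))
```

Returning the best-scoring output instead of the last one guards against a refinement step that makes the result worse.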
When to Use
- Quality-critical generation: Code, reports, analysis requiring high accuracy
- Tasks with clear evaluation criteria: Defined success metrics exist
- Content requiring specific standards: Style guides, compliance, formatting