Iterative evaluation and refinement patterns for improving AI agent outputs through self-critique loops.

Provides three core patterns: basic reflection (self-critique loops), evaluator-optimizer (separated generation and evaluation), and code-specific test-driven refinement.
Supports multiple evaluation strategies, including outcome-based assessment, LLM-as-judge comparison, and rubric-based scoring with weighted dimensions.
Includes practical Python implementations with structured JSON output parsing, iteration limits, and convergence detection to prevent infinite loops.
Best suited for quality-critical tasks such as code generation, reports, and analysis, where clear evaluation criteria and success metrics exist.

Agentic Evaluation Patterns

Patterns for self-improvement through iterative evaluation and refinement.

Overview

Evaluation patterns enable agents to assess and improve their own outputs, moving beyond single-shot generation to iterative refinement loops.

Generate → Evaluate → Critique → Refine → Output
    ↑                              │
    └──────────────────────────────┘
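
The loop in the diagram can be sketched in Python roughly as follows. This is a minimal illustration, not a definitive implementation: `refine_loop`, `Evaluation`, the 0.0 to 1.0 score scale, the `threshold` and `max_iterations` defaults, and the `toy_*` stand-ins are all hypothetical names and values chosen for the example. In practice the three callables would wrap model calls; here they are plain functions so the control flow can be run as-is.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Evaluation:
    score: float   # assumed scale: 0.0 (poor) to 1.0 (ideal)
    critique: str  # feedback handed to the refine step


def refine_loop(
    task: str,
    generate: Callable[[str], str],
    evaluate: Callable[[str, str], Evaluation],
    refine: Callable[[str, str, str], str],
    threshold: float = 0.8,   # assumed pass mark
    max_iterations: int = 3,  # hard cap so the loop always terminates
) -> tuple[str, list[float]]:
    """Generate once, then evaluate and refine until the score passes,
    the output stops changing, or the iteration limit is reached."""
    output = generate(task)
    scores: list[float] = []
    for i in range(max_iterations):
        evaluation = evaluate(task, output)
        scores.append(evaluation.score)
        # Stop when the output is good enough or the budget is spent,
        # so the returned output is always one that was evaluated.
        if evaluation.score >= threshold or i == max_iterations - 1:
            break
        revised = refine(task, output, evaluation.critique)
        if revised == output:
            break  # converged: the refine step made no change
        output = revised
    return output, scores


# Toy stand-ins (hypothetical) that replace real model calls.
def toy_generate(task: str) -> str:
    return "draft"


def toy_evaluate(task: str, output: str) -> Evaluation:
    # Each "+" marks one round of refinement; two rounds earn full marks.
    return Evaluation(min(1.0, output.count("+") / 2), "add more detail")


def toy_refine(task: str, output: str, critique: str) -> str:
    return output + "+"


result, scores = refine_loop("write a summary", toy_generate, toy_evaluate, toy_refine)
print(result, scores)  # draft++ [0.0, 0.5, 1.0]
```

The evaluate step here returns both a score and a critique, which folds the diagram's Evaluate and Critique stages into one call. Splitting them into two callables would work equally well.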

These patterns are best suited for:

Quality-critical generation: code, reports, and analysis that require high accuracy
Tasks with clear evaluation criteria: defined success metrics exist
Content requiring specific standards: style guides, compliance rules, and formatting requirements