agentic-eval

Summary

Iterative evaluation and refinement patterns for improving AI agent outputs through self-critique loops.

  • Provides three core patterns: basic reflection (self-critique loops), evaluator-optimizer (separated generation and evaluation; see the sketch after this list), and code-specific test-driven refinement
  • Supports multiple evaluation strategies including outcome-based assessment, LLM-as-judge comparison, and rubric-based scoring with weighted dimensions
  • Includes practical Python implementations with structured JSON output parsing, iteration limits, and convergence detection to prevent infinite loops
  • Best suited for quality-critical tasks like code generation, reports, and analysis where clear evaluation criteria and success metrics exist
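
The evaluator-optimizer separation mentioned above can be made concrete as two distinct model calls, one per role. A minimal sketch, assuming a hypothetical `llm(prompt) -> str` client (not part of the skill) and the JSON reply shape named in the evaluator prompt:

```python
import json

def generate(llm, task: str, feedback: str = "") -> str:
    """Generator role: produce a draft, or revise one given feedback."""
    prompt = f"Task: {task}\n"
    if feedback:
        prompt += f"Revise your previous draft using this feedback:\n{feedback}\n"
    return llm(prompt)

def evaluate(llm, task: str, draft: str) -> dict:
    """Evaluator role: a separate call that returns structured JSON,
    e.g. {"pass": true, "feedback": "..."}."""
    prompt = (
        f"Task: {task}\nDraft:\n{draft}\n"
        'Assess the draft. Reply only with JSON: {"pass": bool, "feedback": str}'
    )
    return json.loads(llm(prompt))
```

Keeping the two roles in separate calls lets the evaluator judge the draft without being anchored to the generator's reasoning.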
SKILL.md

Agentic Evaluation Patterns

Patterns for self-improvement through iterative evaluation and refinement.

Overview

Evaluation patterns enable agents to assess and improve their own outputs, moving beyond single-shot generation to iterative refinement loops.

Generate → Evaluate → Critique → Refine → Output
    ↑                              │
    └──────────────────────────────┘
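A minimal sketch of this loop, again assuming a generic `llm(prompt) -> str` helper. The iteration cap and the repeated-feedback check are the convergence guards mentioned in the summary; they keep the loop from running indefinitely:

```python
import json

def reflection_loop(task: str, llm, max_iters: int = 3) -> str:
    """Generate -> Evaluate -> Critique -> Refine, with a hard iteration cap."""
    draft = llm(f"Complete this task:\n{task}")
    previous_feedback = None
    for _ in range(max_iters):
        # Self-critique as structured JSON so the loop can branch on it.
        # (A real implementation should guard against unparseable replies.)
        review = json.loads(llm(
            f"Task: {task}\nDraft:\n{draft}\n"
            'Critique the draft. Reply only with JSON: {"pass": bool, "feedback": str}'
        ))
        if review["pass"]:
            break  # the critique found nothing left to fix
        if review["feedback"] == previous_feedback:
            break  # convergence: identical critique twice; refining again won't help
        previous_feedback = review["feedback"]
        draft = llm(
            f"Task: {task}\nRevise this draft:\n{draft}\n"
            f"Address this feedback:\n{review['feedback']}"
        )
    return draft
```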

When to Use

  • Quality-critical generation: Code, reports, analysis requiring high accuracy
  • Tasks with clear evaluation criteria: Defined success metrics exist (see the rubric sketch after this list)
  • Content requiring specific standards: Style guides, compliance, formatting
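
When the criteria are explicit, they can be encoded as a weighted rubric and combined deterministically once per-dimension scores exist (from tests, an LLM judge, or a human). A minimal sketch; the dimension names and weights here are illustrative, not part of the skill:

```python
# Hypothetical rubric: dimension names and weights are illustrative only.
RUBRIC = {
    "correctness": 0.5,
    "clarity": 0.3,
    "formatting": 0.2,
}

def weighted_score(scores: dict[str, float], rubric: dict[str, float]) -> float:
    """Combine per-dimension scores in [0, 1] into a single weighted total."""
    return sum(rubric[dim] * scores[dim] for dim in rubric)

# weighted_score({"correctness": 0.9, "clarity": 0.6, "formatting": 1.0}, RUBRIC)
# -> 0.83; compare against a pass threshold to decide whether to keep refining.
```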