agentic-eval
Iterative evaluation and refinement patterns for improving AI agent outputs through self-critique loops.
- Provides three core patterns: basic reflection (self-critique loops), evaluator-optimizer (separated generation and evaluation), and code-specific test-driven refinement
- Supports multiple evaluation strategies including outcome-based assessment, LLM-as-judge comparison, and rubric-based scoring with weighted dimensions
- Includes practical Python implementations with structured JSON output parsing, iteration limits, and convergence detection to prevent infinite loops
- Best suited for quality-critical tasks like code generation, reports, and analysis where clear evaluation criteria and success metrics exist
Agentic Evaluation Patterns
Patterns for self-improvement through iterative evaluation and refinement.
Overview
Evaluation patterns enable agents to assess and improve their own outputs, moving beyond single-shot generation to iterative refinement loops.
Generate → Evaluate → Critique → Refine → Output
↑ │
└──────────────────────────────┘
When to Use
- Quality-critical generation: Code, reports, analysis requiring high accuracy
- Tasks with clear evaluation criteria: Defined success metrics exist
- Content requiring specific standards: Style guides, compliance, formatting
More from github/awesome-copilot
git-commit
Execute git commit with conventional commit message analysis, intelligent staging, and message generation. Use when user asks to commit changes, create a git commit, or mentions "/commit". Supports: (1) Auto-detecting type and scope from changes, (2) Generating conventional commit messages from diff, (3) Interactive commit with optional type/scope/description overrides, (4) Intelligent file staging for logical grouping
30.2Kgh-cli
GitHub CLI (gh) comprehensive reference for repositories, issues, pull requests, Actions, projects, releases, gists, codespaces, organizations, extensions, and all GitHub operations from the command line.
21.2Kdocumentation-writer
Diátaxis Documentation Expert. An expert technical writer specializing in creating high-quality software documentation, guided by the principles and structure of the Diátaxis technical documentation authoring framework.
17.4Kprd
Generate high-quality Product Requirements Documents (PRDs) for software systems and AI-powered features. Includes executive summaries, user stories, technical specifications, and risk analysis.
17.4Kexcalidraw-diagram-generator
Generate Excalidraw diagrams from natural language descriptions. Use when asked to "create a diagram", "make a flowchart", "visualize a process", "draw a system architecture", "create a mind map", or "generate an Excalidraw file". Supports flowcharts, relationship diagrams, mind maps, and system architecture diagrams. Outputs .excalidraw JSON files that can be opened directly in Excalidraw.
16.4Krefactor
Surgical code refactoring to improve maintainability without changing behavior. Covers extracting functions, renaming variables, breaking down god functions, improving type safety, eliminating code smells, and applying design patterns. Less drastic than repo-rebuilder; use for gradual improvements.
16.1K