agent-evaluation


SKILL.md

Agent Evaluation

Use this skill when the task is deciding how an AI agent should be measured, not when it is simply building the feature itself.

Read references/grader-selection.md when you need help picking grader types, benchmark families, or score dimensions for a specific agent surface.

Read references/ops-and-calibration.md when you need harness design, transcript review, CI gates, sampling policy, saturation checks, or production monitoring guidance.
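For a rough idea of how a sampling policy and a CI gate fit together, here is a minimal sketch. It does not reproduce the contents of references/ops-and-calibration.md. It assumes, hypothetically, that each eval task is a callable that runs the agent once and returns True when its grader passes, that every task is rerun N_TRIALS times because agent runs are nondeterministic, and that the gate fails when the mean pass rate falls below PASS_THRESHOLD. All names and numbers here are illustrative.

```python
from typing import Callable, Dict

# Hypothetical policy values, chosen only for illustration.
N_TRIALS = 5          # sampling policy: repeated trials per task
PASS_THRESHOLD = 0.8  # minimum mean pass rate the gate accepts


def pass_rate(run_once: Callable[[], bool], trials: int = N_TRIALS) -> float:
    """Run one task `trials` times and return the fraction of passing runs."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    passes = sum(1 for _ in range(trials) if run_once())
    return passes / trials


def gate(tasks: Dict[str, Callable[[], bool]],
         threshold: float = PASS_THRESHOLD,
         trials: int = N_TRIALS) -> bool:
    """Return True if the mean per-task pass rate meets the threshold.

    A CI job would typically exit nonzero when this returns False.
    """
    if not tasks:
        raise ValueError("at least one task is required")
    rates = {name: pass_rate(fn, trials) for name, fn in tasks.items()}
    for name, rate in sorted(rates.items()):
        print(f"{name}: {rate:.2f}")
    mean = sum(rates.values()) / len(rates)
    print(f"mean pass rate: {mean:.2f} (threshold {threshold:.2f})")
    return mean >= threshold
```

Averaging per-task rates keeps one heavily sampled task from dominating the result. A stricter gate could instead require every task to clear its own threshold.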

When to use this skill


Repository: akillness/oh-my-gods


First Seen: Mar 11, 2026

Security Audits

Gen Agent Trust Hub: Pass

Socket: Pass

Snyk: Pass
