Agent Evaluation
Use this skill when the task is to decide how an AI agent should be measured, not to build the feature itself.
Read references/grader-selection.md when you need help picking grader types,
benchmark families, or score dimensions for a specific agent surface.
Read references/ops-and-calibration.md when you need harness design,
transcript review, CI gates, sampling policy, saturation checks, or production
monitoring guidance.
When to use this skill