agent-evaluation

Agent Evaluation (AI Agent Evals)

Based on Anthropic's "Demystifying evals for AI agents"

When to use this skill

Designing evaluation systems for AI agents
Building benchmarks for coding, conversational, or research agents
Creating graders (code-based, model-based, human)
Implementing production monitoring for AI systems
Setting up CI/CD pipelines with automated evals
Debugging agent performance issues
Measuring agent improvement over time
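
As an illustration of the code-based graders mentioned in the list above, here is a minimal sketch in Python. The names `Trial`, `grade_exact_match`, and `pass_rate` are hypothetical; they are not taken from this skill or from Anthropic's article. The sketch assumes an agent run produces a final text output that can be checked deterministically against a reference answer.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One agent run on one task (hypothetical structure, for illustration only)."""
    task_id: str
    output: str    # final answer produced by the agent
    expected: str  # reference answer for the task


def grade_exact_match(trial: Trial) -> bool:
    """Code-based grader: pass if the normalized output equals the reference."""
    return trial.output.strip().lower() == trial.expected.strip().lower()


def pass_rate(trials: list[Trial]) -> float:
    """Fraction of trials that pass the grader; 0.0 for an empty list."""
    if not trials:
        return 0.0
    return sum(grade_exact_match(t) for t in trials) / len(trials)


if __name__ == "__main__":
    # Made-up trials, used only to exercise the grader.
    trials = [
        Trial("add", " 4 ", "4"),
        Trial("capital", "paris", "Paris"),
        Trial("sort", "[3, 1, 2]", "[1, 2, 3]"),
    ]
    print(f"pass rate: {pass_rate(trials):.2f}")
```

Tracking a number like this across agent versions is one simple way to measure improvement over time. When outputs cannot be checked deterministically, a model-based or human grader would take the place of `grade_exact_match`.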

Core Concepts

Eval Evolution: Single-turn → Multi-turn → Agentic

Repository: jeo-tech-ai/oh-my-gods

First Seen: Mar 11, 2026

Security Audits

Gen Agent Trust Hub: Warn

Socket: Fail

Snyk: Pass