agent-evaluation
Agent Evaluation (AI Agent Evals)
Based on Anthropic's "Demystifying evals for AI agents"
When to use this skill
- Designing evaluation systems for AI agents
- Building benchmarks for coding, conversational, or research agents
- Creating graders (code-based, model-based, human)
- Implementing production monitoring for AI systems
- Setting up CI/CD pipelines with automated evals
- Debugging agent performance issues
- Measuring agent improvement over time
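Here is one illustration of a code-based grader for a coding agent. The sketch runs the agent's submitted code against fixed test cases and reports the fraction of trials that pass. It is a minimal sketch and does not represent Anthropic's method or any framework's API. `Trial`, `passes_tests`, `pass_rate`, and the `add` task are hypothetical, and the bare `exec` call stands in for what should be a properly sandboxed runner.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Trial:
    """One run of an agent on a task (hypothetical structure)."""
    task_id: str
    output: str  # source code the agent submitted


# A code-based grader is a deterministic function from a trial to pass/fail.
Grader = Callable[[Trial], bool]


def passes_tests(func_name: str, cases: List[Tuple[tuple, Any]]) -> Grader:
    """Build a grader that runs the submitted code and checks each test case."""
    def grade(trial: Trial) -> bool:
        namespace: Dict[str, Any] = {}
        try:
            # WARNING: exec runs untrusted code; use a real sandbox in practice.
            exec(trial.output, namespace)
            fn = namespace[func_name]
            return all(fn(*args) == expected for args, expected in cases)
        except Exception:
            # Syntax errors, missing functions, and crashes all count as failures.
            return False
    return grade


def pass_rate(trials: List[Trial], grader: Grader) -> float:
    """Fraction of trials the grader accepts (0.0 when there are no trials)."""
    if not trials:
        return 0.0
    return sum(grader(t) for t in trials) / len(trials)


add_cases = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
trials = [
    Trial("add", "def add(a, b):\n    return a + b"),  # correct
    Trial("add", "def add(a, b):\n    return a - b"),  # wrong result
    Trial("add", "def add(a, b) return a + b"),        # syntax error
    Trial("add", "def add(a, b):\n    return b + a"),  # correct
]
print(pass_rate(trials, passes_tests("add", add_cases)))  # 0.5
```

Because this grader is deterministic, you can track the same pass rate across agent versions or check it in a CI job against a threshold you choose. A model-based or human grader would replace `passes_tests` with a rubric judged by an LLM or a person.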
Core Concepts
Eval Evolution: Single-turn → Multi-turn → Agentic
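One way to picture this progression is to ask what gets run and what gets graded. A single-turn eval grades one response. A multi-turn eval grades a whole transcript. An agentic eval lets the agent act on an environment over several steps and grades the resulting state. The sketch below is hypothetical: the function names, the toy model, and the toy agent are illustrative assumptions, not taken from the article.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical interfaces: a "model" maps the conversation so far to its next reply;
# an agent "step" looks at the environment and returns changes, or None when done.
Model = Callable[[List[str]], str]
AgentStep = Callable[[Dict[str, str]], Optional[Dict[str, str]]]


def run_single_turn(model: Model, prompt: str) -> str:
    # One prompt, one response: the response is what gets graded.
    return model([prompt])


def run_multi_turn(model: Model, user_turns: List[str]) -> List[str]:
    # Replay scripted user turns; the full transcript is what gets graded.
    transcript: List[str] = []
    for turn in user_turns:
        transcript.append(turn)
        transcript.append(model(transcript))
    return transcript


def run_agentic(step: AgentStep, initial_env: Dict[str, str], max_steps: int = 10) -> Dict[str, str]:
    # The agent acts on a copy of the environment until it signals completion
    # (returns None) or hits max_steps; the final state is what gets graded.
    env = dict(initial_env)
    for _ in range(max_steps):
        changes = step(env)
        if changes is None:
            break
        env.update(changes)
    return env


# Toy stand-ins so the sketch runs end to end.
def toy_model(history: List[str]) -> str:
    return f"reply to {history[-1]!r}"


def toy_agent(env: Dict[str, str]) -> Optional[Dict[str, str]]:
    # Writes a file once, then reports that it is done.
    if "fix.patch" not in env:
        return {"fix.patch": "applied"}
    return None


print(run_single_turn(toy_model, "hi"))  # reply to 'hi'
print(run_multi_turn(toy_model, ["hi", "bye"]))
print(run_agentic(toy_agent, {}))  # {'fix.patch': 'applied'}
```

In the agentic case, a grader would check the returned state (for example, that `fix.patch` exists) rather than parse the agent's text.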