llm-evaluation

LLM Evaluation

Master comprehensive evaluation strategies for LLM applications, from automated metrics to human evaluation and A/B testing.

When to Use This Skill

Measuring LLM application performance systematically
Comparing different models or prompts
Detecting performance regressions before deployment
Validating improvements from prompt changes
Building confidence in production systems
Establishing baselines and tracking progress over time
Debugging unexpected model behavior

Core Evaluation Types

1. Automated Metrics

Fast, repeatable, scalable evaluation using computed scores.
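
As an illustration of what a computed score can look like, here is a minimal sketch of two common reference-based metrics: exact match and token-overlap F1. The function names (`exact_match`, `token_f1`, `evaluate`) and the lowercase, whitespace-split normalization are assumptions chosen for this example, not something the skill prescribes.

```python
from collections import Counter


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the trimmed, lowercased strings are identical, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between lowercased, whitespace-tokenized strings."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; only one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(pairs):
    """Average each metric over a list of (prediction, reference) pairs."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    n = len(pairs)
    return {
        "exact_match": sum(exact_match(p, r) for p, r in pairs) / n,
        "token_f1": sum(token_f1(p, r) for p, r in pairs) / n,
    }
```

Calling `evaluate` on the same fixed set of pairs after each prompt or model change gives a repeatable number to compare against a baseline. Surface-overlap metrics like these miss paraphrases, so they are typically a first filter, not a full quality signal.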

Repository

microck/ordinar…e-skills

First Seen

Jan 24, 2026

llm-evaluation — microck/ordinary-claude-skills