skills/smithery.ai/wshobson-llm-evaluation

wshobson-llm-evaluation

SKILL.md

LLM Evaluation

Master comprehensive evaluation strategies for LLM applications, from automated metrics to human evaluation and A/B testing.

When to Use This Skill

Measuring LLM application performance systematically
Comparing different models or prompts
Detecting performance regressions before deployment
Validating improvements from prompt changes
Building confidence in production systems
Establishing baselines and tracking progress over time
Debugging unexpected model behavior

Core Evaluation Types

1. Automated Metrics

Fast, repeatable, scalable evaluation using computed scores.

Installs

3

Source

smithery.ai/ski…aluation

First Seen

Feb 28, 2026