Originally from wshobson/agents

LLM Evaluation

Master comprehensive evaluation strategies for LLM applications, from automated metrics to human evaluation and A/B testing.

When to Use This Skill

  • Measuring LLM application performance systematically
  • Comparing different models or prompts
  • Detecting performance regressions before deployment
  • Validating improvements from prompt changes
  • Building confidence in production systems
  • Establishing baselines and tracking progress over time
  • Debugging unexpected model behavior
  • Evaluating RAG pipeline quality (retrieval + generation)
  • Measuring agentic task success rates
  • Testing structured output schema compliance
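As a starting point for several of the cases above (systematic performance measurement, regression detection, schema compliance), a minimal automated eval harness can be sketched as follows. This is an illustrative sketch, not part of the skill itself: the test cases, the `stub` model, and all function names are hypothetical placeholders for your own dataset and LLM call.

```python
import json

def exact_match(prediction: str, reference: str) -> bool:
    """Case-insensitive exact match after whitespace normalization."""
    return prediction.strip().lower() == reference.strip().lower()

def schema_compliant(output: str, required_keys: set[str]) -> bool:
    """Check that a structured output parses as JSON and contains the required keys."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and required_keys <= data.keys()

def evaluate(cases: list[dict], predict) -> float:
    """Run a model callable over test cases and return exact-match accuracy."""
    hits = sum(exact_match(predict(c["input"]), c["expected"]) for c in cases)
    return hits / len(cases)

# Toy harness: a lookup-table stub stands in for a real LLM call.
cases = [
    {"input": "2+2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]
stub = {"2+2": "4", "capital of France": "paris"}
accuracy = evaluate(cases, lambda q: stub[q])  # → 1.0 (matching is case-insensitive)
```

Tracking a score like this across prompt or model changes is what makes regressions visible before deployment: a drop below the established baseline fails the check.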

Core Evaluation Types
