LLM Evaluation & Testing

Comprehensive guide to evaluating and testing LLM applications, including prompt testing, output validation, hallucination detection, benchmark creation, A/B testing, and quality metrics.


Quick Reference

When to use this skill:

  • Testing LLM application outputs
  • Validating prompt quality and consistency
  • Detecting hallucinations and factual errors
  • Creating evaluation benchmarks
  • A/B testing prompts or models
  • Implementing continuous evaluation (CI/CD)
  • Measuring retrieval quality (for RAG)
  • Debugging unexpected LLM behavior
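As a concrete starting point for the "output validation" and "quality metrics" use cases above, here is a minimal sketch of a keyword-coverage check, one of the simplest LLM output-validation techniques. All names (`keyword_score`, `evaluate`, the case format) are illustrative, not part of any specific framework.

```python
def keyword_score(output: str, required: list[str]) -> float:
    """Fraction of required keywords present in the output (case-insensitive)."""
    if not required:
        return 1.0
    text = output.lower()
    hits = sum(1 for kw in required if kw.lower() in text)
    return hits / len(required)


def evaluate(cases: list[dict]) -> float:
    """Average keyword score across test cases.

    Each case is a dict: {"output": str, "required": [str, ...]}.
    """
    if not cases:
        return 0.0
    return sum(keyword_score(c["output"], c["required"]) for c in cases) / len(cases)


if __name__ == "__main__":
    cases = [
        {"output": "Paris is the capital of France.", "required": ["Paris", "France"]},
        {"output": "I don't know.", "required": ["Paris"]},
    ]
    # First case scores 1.0, second scores 0.0, so the mean is 0.50
    print(f"mean score: {evaluate(cases):.2f}")
```

In practice a keyword check like this is only a baseline; it would typically sit alongside semantic-similarity or model-graded checks in a larger evaluation suite.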
Installs: 13
First Seen: Mar 8, 2026