evaluating-llms-harness

SKILL.md

lm-evaluation-harness - LLM Benchmarking

lm-evaluation-harness evaluates LLMs across 60+ academic benchmarks using standardized prompts and metrics.

Installation:

pip install lm-eval

Evaluate any Hugging Face model:

lm_eval --model hf \
  --model_args pretrained=meta-llama/Llama-2-7b-hf \
  --tasks mmlu,gsm8k,hellaswag \
  --device cuda:0 \
  --batch_size 8
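When the same evaluation has to be repeated across several models or task sets, it can help to assemble the command programmatically. The following is a minimal sketch, assuming only the flags shown above; `build_lm_eval_command` and `run_evaluation` are hypothetical helper names, not part of lm-eval itself.

```python
import shlex
import subprocess


def build_lm_eval_command(pretrained, tasks, device="cuda:0", batch_size=8):
    """Return the lm_eval invocation shown above as an argument list.

    Hypothetical helper: only the flags already used in this document
    (--model, --model_args, --tasks, --device, --batch_size) are emitted.
    """
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={pretrained}",
        "--tasks", ",".join(tasks),
        "--device", device,
        "--batch_size", str(batch_size),
    ]


def run_evaluation(pretrained, tasks, **kwargs):
    """Run the evaluation as a subprocess.

    Assumes lm-eval is installed and the model is accessible from
    this machine; raises CalledProcessError on a non-zero exit code.
    """
    cmd = build_lm_eval_command(pretrained, tasks, **kwargs)
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: print the command instead of executing it.
    cmd = build_lm_eval_command(
        "meta-llama/Llama-2-7b-hf",
        ["mmlu", "gsm8k", "hellaswag"],
    )
    print(shlex.join(cmd))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when model names or task lists come from user input.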

First Seen: Mar 8, 2026