evaluating-llms-harness

Originally from ovachiever/droid-tings

lm-evaluation-harness - LLM Benchmarking

Quick start

lm-evaluation-harness evaluates LLMs across 60+ academic benchmarks using standardized prompts and metrics.
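
The exact set of tasks depends on the installed version, so it is worth checking what your install supports; the CLI can print the full list:

lm_eval --tasks list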

Installation:

pip install lm-eval
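
If you need tasks or fixes newer than the latest PyPI release, the project also supports an editable install from source; a typical sequence (assuming git and pip are available):

git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .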

Evaluate any HuggingFace model:

lm_eval --model hf \
  --model_args pretrained=meta-llama/Llama-2-7b-hf \
  --tasks mmlu,gsm8k,hellaswag \
  --device cuda:0 \
  --batch_size 8
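
For reproducible runs you will usually also pin the few-shot count and write results to disk; --limit caps the number of examples per task, which is handy for a quick smoke test before a full evaluation. A sketch along those lines (the output path is illustrative; flag names follow current lm-eval releases):

lm_eval --model hf \
  --model_args pretrained=meta-llama/Llama-2-7b-hf,dtype=bfloat16 \
  --tasks gsm8k \
  --num_fewshot 5 \
  --batch_size auto \
  --limit 100 \
  --output_path results/llama2-7b \
  --log_samples

Here --batch_size auto asks the harness to find the largest batch that fits in memory, and --log_samples saves per-example model outputs alongside the aggregate metrics so individual predictions can be inspected later.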