evaluating-llms-harness

lm-evaluation-harness - LLM Benchmarking

lm-evaluation-harness evaluates LLMs across 60+ academic benchmarks using standardized prompts and metrics.
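For multiple-choice tasks such as MMLU and HellaSwag, the "standardized metrics" come from log-likelihood scoring. The model scores each candidate answer as a continuation of the prompt. `acc` takes the highest-scoring choice, and `acc_norm` first divides each score by the continuation's length so longer answers are not penalized. The sketch below illustrates that idea under stated assumptions. The function names are hypothetical and not part of the lm-eval API, and the exact normalization unit (characters vs. bytes) may differ between harness versions. Generative tasks such as GSM8K are scored differently, by matching the generated answer.

```python
from __future__ import annotations

from typing import Sequence


def score_multiple_choice(
    loglikelihoods: Sequence[float], choices: Sequence[str], gold: int
) -> dict[str, float]:
    """Hypothetical helper: acc / acc_norm for one multiple-choice item.

    Assumes loglikelihoods[i] is the model's summed log-probability of
    choices[i] given the prompt (how the harness scores such tasks).
    """
    if len(loglikelihoods) != len(choices) or not choices:
        raise ValueError("need one log-likelihood per (non-empty) choice list")
    if any(len(c) == 0 for c in choices):
        raise ValueError("choices must be non-empty strings")
    idx = range(len(choices))
    # acc: the choice the model finds most likely wins (first one on ties)
    pred = max(idx, key=lambda i: loglikelihoods[i])
    # acc_norm: length-normalize so long answers are not penalized
    # (assumption: character length, as in recent lm-eval versions)
    normed = [ll / len(c) for ll, c in zip(loglikelihoods, choices)]
    pred_norm = max(idx, key=lambda i: normed[i])
    return {"acc": float(pred == gold), "acc_norm": float(pred_norm == gold)}


def mean_metrics(per_item: list[dict[str, float]]) -> dict[str, float]:
    """Hypothetical helper: average per-item scores into a task-level score."""
    keys = per_item[0].keys()
    return {k: sum(d[k] for d in per_item) / len(per_item) for k in keys}
```

With log-likelihoods `[-4.0, -6.8]` for the choices `" cat"` and `" the dog ran away"` (gold index 1), the raw argmax picks the short choice, so `acc` is 0.0. After dividing by length (-1.0 vs. -0.4), `acc_norm` picks the correct one and scores 1.0. This is why the two metrics can disagree.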

Installation:

pip install lm-eval

Evaluate a Hugging Face model (here Llama-2-7B) on MMLU, GSM8K, and HellaSwag:

lm_eval --model hf \
  --model_args pretrained=meta-llama/Llama-2-7b-hf \
  --tasks mmlu,gsm8k,hellaswag \
  --device cuda:0 \
  --batch_size 8
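When sweeping several checkpoints or task lists, it can help to generate the invocation above from a script instead of editing it by hand. This is a minimal sketch. `build_lm_eval_cmd` is a hypothetical helper, and it only emits the flags used in the command above (`--model`, `--model_args`, `--tasks`, `--device`, `--batch_size`). The result can be passed to `subprocess.run(cmd, check=True)` on a machine where `lm_eval` is installed.

```python
from __future__ import annotations

import shlex


def build_lm_eval_cmd(
    pretrained: str,
    tasks: list[str],
    device: str = "cuda:0",
    batch_size: int = 8,
) -> list[str]:
    """Hypothetical helper: build the lm_eval CLI call as an argv list."""
    if not tasks:
        raise ValueError("at least one task is required")
    return [
        "lm_eval",
        "--model", "hf",                           # Hugging Face backend
        "--model_args", f"pretrained={pretrained}",
        "--tasks", ",".join(tasks),                # comma-separated task names
        "--device", device,
        "--batch_size", str(batch_size),
    ]


# Print the shell-quoted command; swap in subprocess.run(cmd, check=True)
# to actually launch it.
for model in ["meta-llama/Llama-2-7b-hf"]:
    cmd = build_lm_eval_cmd(model, ["mmlu", "gsm8k", "hellaswag"])
    print(shlex.join(cmd))
```

Building an argv list rather than a single shell string avoids quoting problems when model names or arguments contain special characters.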

First seen: Mar 28, 2026

evaluating-llms-harness — firecrawl/ai-research-skills