hugging-face-evaluation

Originally from huggingface/skills

Overview

This skill provides tools to add structured evaluation results to Hugging Face model cards. It supports multiple methods for adding evaluation data:

  • Extracting existing evaluation tables from README content
  • Importing benchmark scores from Artificial Analysis
  • Running custom model evaluations with vLLM or accelerate backends (lighteval/inspect-ai)
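The first method above works on README content that already contains a markdown evaluation table. A minimal sketch of what such extraction could look like, using only the standard library (the table format and helper name are illustrative assumptions, not the skill's actual implementation):

```python
import re

def parse_eval_table(readme: str) -> dict[str, float]:
    """Extract metric/score pairs from a two-column markdown
    evaluation table like:

        | Benchmark | Score |
        |-----------|-------|
        | MMLU      | 68.2  |
    """
    results = {}
    for line in readme.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 2:
            continue
        name, score = cells
        # Skip the header and separator rows; keep numeric scores only.
        if re.fullmatch(r"[-: ]+", score) or not re.fullmatch(r"\d+(\.\d+)?", score):
            continue
        results[name] = float(score)
    return results

readme = """
## Evaluation

| Benchmark | Score |
|-----------|-------|
| MMLU      | 68.2  |
| GSM8K     | 74.5  |
"""
print(parse_eval_table(readme))  # {'MMLU': 68.2, 'GSM8K': 74.5}
```

Scores recovered this way can then be written into the card's model-index metadata rather than left as free-form markdown.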

When to Use

  • You need to add structured evaluation results to a Hugging Face model card.
  • You want to import benchmark data or run custom evaluations with vLLM, lighteval, or inspect-ai.
  • You are preparing leaderboard-compatible model-index metadata for a model release.

Integration with HF Ecosystem

  • Model Cards: Updates model-index metadata for leaderboard integration
  • Artificial Analysis: Direct API integration for benchmark imports
  • Papers with Code: Compatible with their model-index specification
  • Jobs: Run evaluations directly on Hugging Face Jobs with uv integration
  • vLLM: Efficient GPU inference for custom model evaluation
  • lighteval: Hugging Face's evaluation library with vLLM/accelerate backends
  • inspect-ai: UK AI Safety Institute's evaluation framework
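Whichever source the results come from, they end up as model-index metadata in the card's YAML front matter. A minimal sketch of that structure per the model-index specification (the model name and scores below are hypothetical):

```python
import json

# Shape of the "model-index" block embedded in a model card's
# YAML front matter; leaderboards read results from this structure.
# Model name and metric values are made up for illustration.
model_index = [
    {
        "name": "my-org/my-model",
        "results": [
            {
                "task": {"type": "text-generation"},
                "dataset": {"name": "MMLU", "type": "mmlu"},
                "metrics": [
                    {"type": "accuracy", "value": 68.2, "name": "MMLU (5-shot)"}
                ],
            }
        ],
    }
]

print(json.dumps({"model-index": model_index}, indent=2))
```

Each entry pairs a task and dataset with one or more metrics, so a single model can report results across many benchmarks in one card.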
Installs: 34 · GitHub Stars: 37.3K · First Seen: Mar 10, 2026