evaluation-harness

Evaluation Harness

Build systematic evaluation frameworks for LLM applications.

Golden Dataset Format

[
  {
    "id": "test_001",
    "category": "code_generation",
    "input": "Write a Python function to reverse a string",
    "expected_output": "def reverse_string(s: str) -> str:\n    return s[::-1]",
    "rubric": {
      "correctness": 1.0,
      "style": 0.8,
      "documentation": 0.5
    },
    "metadata": {
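As a rough illustration of how an entry in this format might be consumed, the sketch below validates the visible fields and combines per-criterion scores into a single number. It assumes the `rubric` values are weights, which the source does not state. The names `REQUIRED_KEYS`, `validate_entry`, and `weighted_score` are hypothetical helpers and not part of the skill. The `metadata` field is left out because its contents are not shown.

```python
from typing import Dict

# Fields visible in the golden dataset entry above; "metadata" is omitted
# because its contents are not shown in the source.
REQUIRED_KEYS = {"id", "category", "input", "expected_output", "rubric"}


def validate_entry(entry: dict) -> None:
    """Raise ValueError if an entry lacks a required field or has a bad rubric."""
    missing = REQUIRED_KEYS - entry.keys()
    if missing:
        raise ValueError(f"entry is missing fields: {sorted(missing)}")
    for name, weight in entry["rubric"].items():
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"rubric weight for {name!r} is outside [0, 1]")


def weighted_score(rubric: Dict[str, float], scores: Dict[str, float]) -> float:
    """Weighted average of per-criterion scores.

    Assumption: rubric values are weights. A criterion with no score counts as 0.
    """
    total_weight = sum(rubric.values())
    if total_weight <= 0:
        raise ValueError("rubric weights must sum to a positive number")
    weighted = sum(w * scores.get(name, 0.0) for name, w in rubric.items())
    return weighted / total_weight


# Same values as the sample entry, written as a Python dict.
entry = {
    "id": "test_001",
    "category": "code_generation",
    "input": "Write a Python function to reverse a string",
    "expected_output": "def reverse_string(s: str) -> str:\n    return s[::-1]",
    "rubric": {"correctness": 1.0, "style": 0.8, "documentation": 0.5},
}

validate_entry(entry)

# Hypothetical per-criterion scores from a grader, each in [0, 1].
grader_scores = {"correctness": 1.0, "style": 0.5, "documentation": 0.0}
final_score = weighted_score(entry["rubric"], grader_scores)
```

Normalizing by the sum of the weights keeps the result in [0, 1] regardless of how many criteria a rubric lists, so scores stay comparable across categories.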
First Seen
Jan 24, 2026