
# LLM Evaluation

Evaluate and validate LLM outputs for quality assurance using RAGAS and LLM-as-judge patterns.

## Quick Reference

### LLM-as-Judge Pattern

```python
import re

# Minimal sketch: `llm` is assumed to be an async chat client (a thin wrapper
# around your provider SDK) whose .chat() returns a message with a .content string.
async def evaluate_quality(input_text: str, output_text: str, dimension: str) -> float:
    """Score `output_text` on one quality dimension, normalized to 0.0-1.0."""
    response = await llm.chat([{
        "role": "user",
        "content": f"""Evaluate for {dimension}. Score 1-10.
Input: {input_text[:500]}
Output: {output_text[:1000]}
Respond with just the number."""
    }])
    # Parse defensively: take the first integer in the reply and clamp it to 0-10.
    match = re.search(r"\d+", response.content)
    score = int(match.group()) if match else 0
    return min(max(score, 0), 10) / 10
```
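In practice an output is usually scored on several dimensions and the results combined. The sketch below builds on the `evaluate_quality` helper above; the dimension names are illustrative rather than a fixed taxonomy, and `asyncio.gather` is used only to run the judge calls concurrently.

```python
import asyncio

# Illustrative usage: score one output on several dimensions concurrently.
# Assumes `evaluate_quality` (and its `llm` client) from the snippet above.
async def evaluate_output(input_text: str, output_text: str) -> dict[str, float]:
    dimensions = ["relevance", "faithfulness", "coherence"]
    scores = await asyncio.gather(
        *(evaluate_quality(input_text, output_text, d) for d in dimensions)
    )
    return dict(zip(dimensions, scores))
```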
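
### RAGAS Metrics

For RAG pipelines, RAGAS provides reference-free metrics such as faithfulness and answer relevancy. The snippet below is a minimal sketch of the classic `ragas.evaluate` API; the example row is purely illustrative, column and metric names vary between RAGAS versions, and an evaluator LLM backend must be configured (RAGAS defaults to OpenAI credentials).

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

# One illustrative row: the question, the generated answer, and the
# retrieved contexts the answer was based on.
data = Dataset.from_dict({
    "question": ["What uptime does the service SLA guarantee?"],
    "answer": ["The SLA guarantees 99.9% monthly uptime."],
    "contexts": [["Our SLA commits to 99.9% uptime, measured per calendar month."]],
})

# Requires an evaluator LLM to be configured (RAGAS defaults to OpenAI).
result = evaluate(data, metrics=[faithfulness, answer_relevancy])
print(result)
```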