Evaluation Methodology

This document is the authoritative reference for how PluginEval measures plugin and skill quality. It covers the three evaluation layers, all ten scoring dimensions, the composite formula, badge thresholds, anti-pattern flags, Elo ranking, and actionable improvement tips.

Related: Full rubric anchors


The Three Evaluation Layers

PluginEval stacks three complementary layers. Each layer produces a score between 0.0 and 1.0 for each applicable dimension, and later layers override or blend with earlier ones according to per-dimension blend weights.
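
As a rough illustration of how per-dimension blending might work, the sketch below combines the scores that each layer produced for a single dimension into one value. This is not PluginEval's actual implementation: the function name `blend_dimension`, the layer keys `layer1` to `layer3`, and the example weights are all hypothetical, and renormalizing the weights over the layers that actually ran is an assumption. A later layer "overrides" earlier ones when it holds all of the weight, and "blends" otherwise.

```python
from typing import Dict, Optional


def blend_dimension(
    scores: Dict[str, Optional[float]],
    weights: Dict[str, float],
) -> float:
    """Blend one dimension's per-layer scores into a single 0.0-1.0 value.

    Hypothetical sketch: `scores` maps a layer key to its score, or None if
    that layer did not run or does not apply to this dimension. `weights`
    maps the same keys to assumed blend weights for this dimension.
    """
    # Keep only the layers that actually produced a score.
    present = {layer: s for layer, s in scores.items() if s is not None}
    if not present:
        raise ValueError("no layer produced a score for this dimension")

    # Renormalize over the layers that ran (an assumption of this sketch).
    total_weight = sum(weights[layer] for layer in present)
    if total_weight <= 0:
        raise ValueError("blend weights for the available layers sum to zero")

    blended = sum(weights[layer] * s for layer, s in present.items()) / total_weight

    # Clamp to the documented 0.0-1.0 score range.
    return min(1.0, max(0.0, blended))


# Illustrative weights only; the real per-dimension weights are not shown here.
example_weights = {"layer1": 0.25, "layer2": 0.75, "layer3": 0.0}
example_scores = {"layer1": 0.4, "layer2": 0.8, "layer3": None}
blended_score = blend_dimension(example_scores, example_weights)
```

With these example numbers the later layer dominates, so the blended score sits closer to 0.8 than to 0.4. If only the static layer runs, its score passes through unchanged.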

Layer 1 — Static Analysis

Speed: < 2 seconds. No LLM calls. Deterministic.

Repository: wshobson/agents
First seen: Mar 27, 2026