eval-agent-md


eval-agent-md — Behavioral Compliance Testing

What This Does

  1. Reads a CLAUDE.md (or agent .md file)
  2. Auto-generates behavioral test scenarios for each rule it finds
  3. Optionally generates integration scenarios that test multiple rules interacting (--holistic)
  4. Runs each scenario via claude -p with LLM-as-judge scoring
  5. Reports a compliance score with per-rule (and integration) pass/fail breakdown
  6. Optionally runs an automated mutation loop to improve failing rules
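The reporting step (step 5) can be sketched as follows. This is a minimal illustration, not the skill's actual code: the `ScenarioResult` type, the `compliance_report` function, the rule names, and the definition of the score as the fraction of passing scenarios are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioResult:
    """Hypothetical record of one judged scenario."""
    rule: str      # identifier of the rule the scenario exercises
    passed: bool   # verdict returned by the LLM judge


def compliance_report(results):
    """Return (overall_score, per_rule) for a list of ScenarioResult.

    overall_score is assumed to be passed / total scenarios;
    per_rule maps each rule to a (passed, total) tuple.
    """
    counts = defaultdict(lambda: [0, 0])  # rule -> [passed, total]
    for result in results:
        counts[result.rule][1] += 1
        if result.passed:
            counts[result.rule][0] += 1
    total = len(results)
    passed = sum(1 for result in results if result.passed)
    score = passed / total if total else 0.0
    return score, {rule: (p, t) for rule, (p, t) in counts.items()}


# Hypothetical rule names and verdicts, for illustration only.
results = [
    ScenarioResult("no-force-push", True),
    ScenarioResult("no-force-push", False),
    ScenarioResult("run-tests-first", True),
    ScenarioResult("run-tests-first", True),
]
score, breakdown = compliance_report(results)
for rule, (p, t) in breakdown.items():
    print(f"{rule}: {p}/{t} passed")
print(f"compliance score: {score:.0%}")
```

Keeping the per-rule counts separate from the overall score makes it easy to see which rules are failing, which is the information a mutation loop would need.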

Workflow

Script Execution

Always run scripts with uv run --script; never use python, python3, or a bare script name. The scripts declare their own dependencies via inline # /// script metadata, which uv run --script resolves automatically, so no pip install is ever required. Invoking a script with python or python3 will fail with import errors because the dependencies are not installed in the system environment.

Progress Reporting

First seen: Mar 23, 2026