eval-agent-md
eval-agent-md — Behavioral Compliance Testing
What This Does
- Reads a CLAUDE.md (or agent .md file)
- Auto-generates behavioral test scenarios for each rule it finds
- Optionally generates integration scenarios that test multiple rules interacting (--holistic)
- Runs each scenario via claude -p with LLM-as-judge scoring
- Reports a compliance score with per-rule (and integration) pass/fail breakdown
- Optionally runs an automated mutation loop to improve failing rules
Workflow
Script Execution
Always run scripts with uv run --script. Never use python, python3, or a bare script name. The scripts declare their own dependencies via inline # /// script metadata, and uv run --script resolves those dependencies automatically, so no pip install is ever required. Invoking a script with python or python3 will fail with import errors because the dependencies are not installed in the system environment.
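To make the inline metadata concrete, here is a minimal sketch of what a # /// script block looks like and how a tool can locate it. The example script text, the pyyaml dependency, and the extract_script_metadata helper are hypothetical and not taken from this skill's scripts; the regular expression follows the reference implementation in PEP 723, which defines the format.

```python
import re
from typing import Optional

# Hypothetical script text. The dependency name is illustrative only and is
# not taken from the eval-agent-md scripts.
EXAMPLE_SCRIPT = '''\
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "pyyaml",
# ]
# ///
import yaml
'''

# Pattern from the PEP 723 reference implementation: an opening "# /// TYPE"
# line, one or more comment lines, then a closing "# ///" line.
METADATA_BLOCK = re.compile(
    r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
)


def extract_script_metadata(source: str) -> Optional[str]:
    """Return the TOML text of the first 'script' metadata block, or None."""
    for match in METADATA_BLOCK.finditer(source):
        if match.group("type") != "script":
            continue
        # Strip the leading "# " (or a bare "#") from every content line.
        return "".join(
            line[2:] if line.startswith("# ") else line[1:]
            for line in match.group("content").splitlines(keepends=True)
        )
    return None


if __name__ == "__main__":
    print(extract_script_metadata(EXAMPLE_SCRIPT))
```

Running a file like the example with plain python would execute import yaml against the system environment, which is where the import errors described above come from. The block itself is only a comment to the interpreter, so a runner such as uv has to read it and provision the listed packages before the script starts.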