agent-test
SKILL.md
Agent Test
Designs the measurements an AI agent or skill is judged by: evals, LLM-as-judge graders,
trajectory tests, held-out benchmarks, and activation evals. This is the agent-actor analog
of human test design. Provenance lives in skill.json; this file is runtime routing only.
Produces: a change-plan.md (DO); an audit-report.md, plus a findings-ledger and
workflow-state when tracked (REVIEW); or a design-doc.md, refactor-runbook.md, or
explanation.md (DESIGN).
Boundaries
Do NOT use this skill to operate or watch the loop these evals feed, including the eval/optimization loop, autonomy, and reliability (use agent-ops); to design the SDK/tool surface (use agent-dx); to write agent-native docs (use agent-docs); or to scaffold repo CI gates (use harden-repo-for-coding-agents).