instruction-eval

Installation
SKILL.md

Instruction Evaluation

Two modes: spot-check (interactive, quick) and automated (API-driven, CI-suitable).

Spot-Check Mode (Interactive)

Run probes directly in a Claude Code session. Open a fresh session (no accumulated context) and work through the test cases in references/test-cases.md.

For each probe:

  1. Ask the question exactly as written
  2. Check the response against the expected behavior
  3. Mark Pass / Partial / Fail
  4. Note the failure mode if Partial or Fail

What to look for:

  • Silent gaps — Claude gives a confident but wrong/incomplete answer because the removed content was never replaced by a skill or reference file
  • Broken routing — Claude doesn't invoke a skill that should cover the topic
  • Constraint drift — Claude complies with a request it should refuse
Installs
2
Repository
ionfury/homelab
GitHub Stars
24
First Seen
May 15, 2026
instruction-eval — ionfury/homelab