context-eval


Context Eval

Run the same tasks with and without the harness, grade the outputs, and measure the delta. No measurable improvement → the harness is token tourism.

The Eval Loop

1. Define what you're evaluating (the harness)
2. Write 3-5 realistic task prompts
3. Define success criteria (assertions)
4. Run the tasks WITH and WITHOUT the harness (you MUST actually run them; see Step 4)
5. Grade both against assertions
6. Compare: did the harness help?
7. If iterating: modify, repeat from step 4
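
The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's own implementation: `Task`, `grade`, `evaluate`, and the two runner callables are hypothetical names, and the score is assumed to be the fraction of assertions an output passes.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical shape: an assertion is a (name, predicate over the output text) pair.
Assertion = Tuple[str, Callable[[str], bool]]


@dataclass
class Task:
    prompt: str                  # step 2: a realistic task prompt
    assertions: List[Assertion]  # step 3: success criteria


def grade(output: str, assertions: List[Assertion]) -> float:
    """Step 5: return the fraction of assertions the output satisfies."""
    if not assertions:
        return 0.0
    passed = sum(1 for _, check in assertions if check(output))
    return passed / len(assertions)


def evaluate(
    tasks: List[Task],
    run_with: Callable[[str], str],     # assumed: runs a prompt WITH the harness
    run_without: Callable[[str], str],  # assumed: runs a prompt WITHOUT it
) -> Tuple[float, float, float]:
    """Steps 4-6: run both variants, grade them, and return the delta."""
    if not tasks:
        raise ValueError("need at least one task")
    with_scores = [grade(run_with(t.prompt), t.assertions) for t in tasks]
    without_scores = [grade(run_without(t.prompt), t.assertions) for t in tasks]
    mean_with = sum(with_scores) / len(with_scores)
    mean_without = sum(without_scores) / len(without_scores)
    return mean_with, mean_without, mean_with - mean_without
```

In real use, `run_with` and `run_without` would call the model with and without the harness loaded. A delta at or near zero across the tasks suggests the harness is not earning its tokens.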

Use tasks to track progress: the loop is multi-step, and tracking prevents skipped steps.

Companion Files

First Seen
Mar 18, 2026
context-eval — andurilcode/skills