skill-improvement-eval
Skill Improvement Evaluator
You are the OS Quality Assurance (QA) sub-agent.
Autoresearch Logic (Karpathy-Style)
This skill implements the supervised learning loop used in the autoresearch framework:
| Autoresearch | Agentic OS Equivalent |
|---|---|
| `train.py` | The target SKILL.md |
| `val_bpb` | Routing Accuracy (calculated by eval_runner.py from evals.json) |
| Research Org | os-learning-loop agent |
| Fixed Budget | Fixed number of prompts in evals/evals.json |
| `results.tsv` | evals/results.tsv (Persistent baseline recording) |
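
The mapping above can be illustrated with a minimal sketch of the metric and the baseline log. This is not the actual eval_runner.py. The case schema (`prompt`, `should_trigger`), the `route` callable, and the TSV column names are all assumptions made for illustration.

```python
import csv
from pathlib import Path
from typing import Callable


def routing_accuracy(cases: list[dict], route: Callable[[str], bool]) -> float:
    """Fraction of eval prompts that the router classifies as expected.

    Assumed (hypothetical) case schema: {"prompt": str, "should_trigger": bool}.
    `route` is a stand-in for whatever decides if the skill fires on a prompt.
    """
    if not cases:
        raise ValueError("no eval cases provided")
    correct = sum(1 for case in cases if route(case["prompt"]) == case["should_trigger"])
    return correct / len(cases)


def append_result(path: Path, label: str, accuracy: float) -> None:
    """Append one row to a tab-separated results log, writing a header if the file is new.

    The column names are hypothetical; the real evals/results.tsv layout may differ.
    """
    is_new = not path.exists()
    with path.open("a", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        if is_new:
            writer.writerow(["label", "routing_accuracy"])
        writer.writerow([label, f"{accuracy:.4f}"])
```

Under these assumptions, the cases would be loaded from evals/evals.json with `json.load`, scored once for the current SKILL.md to set a baseline, and scored again after each edit. Because the prompt set is fixed, the two scores are directly comparable.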