personal-benchmark

SKILL.md

Personal Benchmark — Interview & Author

You help the user build a private AI benchmark suite tuned to their actual work. Public benchmarks saturate; private benchmarks aimed at the user's real tasks don't. Inspired by Nate B. Jones' Dingo / Splash Brothers / Artemis II archetypes.

This skill is an interviewer + author + builder. You will:

  1. Run a structured 6-section interview (~45 min)
  2. Synthesize a work profile + 3–5 capability axes
  3. Author benchmark folders to disk
  4. Verify them with the user before stopping
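Step 3 does not fix an on-disk layout. As a minimal sketch, assume one folder per benchmark holding hypothetical prompt.md, rubric.md, and meta.json files. These file names and the Benchmark fields are illustrative only, not the skill's actual format. Authoring one benchmark, then checking completeness before step 4's review with the user, could look like this:

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path

# Hypothetical layout: <suite root>/<slug>/{prompt.md, rubric.md, meta.json}
REQUIRED_FILES = ("prompt.md", "rubric.md", "meta.json")


@dataclass
class Benchmark:
    slug: str            # folder name for this benchmark
    axis: str            # one of the 3-5 capability axes from the work profile
    failure_mode: str    # "judgment", "production discipline", or "long-horizon carry"
    prompt: str          # task text exactly as every runner will receive it
    rubric: list[str]    # criteria checked after the run
    traps: list[str] = field(default_factory=list)  # planted items the model should reject


def author_benchmark(root: Path, bench: Benchmark) -> Path:
    """Write one benchmark folder under root and return its path."""
    folder = root / bench.slug
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "prompt.md").write_text(bench.prompt + "\n", encoding="utf-8")
    rubric_lines = "".join(f"- {criterion}\n" for criterion in bench.rubric)
    (folder / "rubric.md").write_text(rubric_lines, encoding="utf-8")
    # Everything except the prompt and rubric goes into machine-readable metadata.
    meta = {k: v for k, v in asdict(bench).items() if k not in ("prompt", "rubric")}
    (folder / "meta.json").write_text(json.dumps(meta, indent=2) + "\n", encoding="utf-8")
    return folder


def missing_files(folder: Path) -> list[str]:
    """List required files absent from a benchmark folder (empty means complete)."""
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]
```

Keeping traps and failure_mode in meta.json, apart from the prompt, means the runner never shows them to the model. It also lets a later pass check that the suite covers all three failure modes.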

Operating principles

  • Specificity over scale. One concrete example beats ten abstractions. Push back on generic answers.
  • Saturation-resistant by construction. Every benchmark should plausibly fail at least one current frontier model.
  • Plant traps. Include items the model is supposed to reject (the Mickey Mouse / fake-payment pattern).
  • Real artifacts. A .pptx means a real PowerPoint file, not markdown wearing a .pptx extension. Making the output format itself part of the test reveals harness differences fast.
  • Two dimensions. Score model × harness, not just the model: the same prompt runs across many runners.
  • Three failure modes. Cover judgment, production discipline, AND long-horizon carry across the suite.
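Several of these principles reduce to mechanical checks. The sketch below is illustrative, and every function name in it is hypothetical. It checks whether a .pptx is a real Office Open XML package: it tests that the file is a zip containing the part names PowerPoint conventionally writes, which is a heuristic and not full validation. It scores a trap item as passed only if none of the planted items were accepted, and it aggregates pass rates per (model, harness) pair rather than per model:

```python
import zipfile
from collections import defaultdict
from pathlib import Path
from typing import Iterable


def is_real_pptx(path: Path) -> bool:
    """Heuristic: a real .pptx is a zip package with PowerPoint's conventional parts.

    Markdown or plain text renamed to .pptx fails the zip check immediately.
    """
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        names = set(zf.namelist())
    return "[Content_Types].xml" in names and "ppt/presentation.xml" in names


def trap_passed(accepted: set[str], traps: set[str]) -> bool:
    """A trap item passes only if the model accepted none of the planted items."""
    return accepted.isdisjoint(traps)


def score_grid(results: Iterable[tuple[str, str, bool]]) -> dict[tuple[str, str], float]:
    """Turn (model, harness, passed) rows into a pass rate per (model, harness) cell."""
    counts: dict[tuple[str, str], list[int]] = defaultdict(lambda: [0, 0])
    for model, harness, passed in results:
        cell = counts[(model, harness)]
        cell[0] += int(passed)  # passes
        cell[1] += 1            # attempts
    return {key: passes / attempts for key, (passes, attempts) in counts.items()}
```

Keying the grid on the pair means one model run through two different harnesses shows up as two separate cells. That is the second dimension the suite is meant to expose.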
First Seen
Apr 29, 2026
personal-benchmark — codefilabs/mybench