deterministic-metric-design
dot-skills Deterministic Metric Design Best Practices
Design metrics that are deterministic, computable, provable, and valid — measures an agent can trust and optimize against without gaming them. The 44 rules across 8 categories take a metric from a fuzzy construct to an adoptable, machine-checkable number: define the construct, confront computability limits with sound proxies, ground it in measurement theory, prove its properties, pin its determinism, validate it empirically, harden it against optimization pressure, and package it for adoption.
A running example threads through every category: a deterministic measure of behavior-preserving codebase-size reduction (shrinking code without changing how the app works). It is a demanding stress test because its ideal form is provably out of reach (Kolmogorov complexity is uncomputable, and program equivalence is undecidable by Rice's theorem), so the whole craft lies in building a deterministic, tractable proxy with a proven guarantee.
This is the measurement-design layer that the *-algorithms skills apply (Big-O, NDCG, cyclomatic complexity, MoJoFM) but never teach.
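To make the running example concrete, here is a minimal sketch of what a deterministic, tractable proxy for codebase-size reduction might look like. It is an illustration under stated assumptions, not the skill's prescribed metric: the names `canonical_size` and `size_reduction` are hypothetical, the size proxy (bytes of non-blank lines with trailing whitespace ignored) is one arbitrary choice among many, and the `behavior_preserved` flag is assumed to come from an external check such as a passing test suite.

```python
from fractions import Fraction
from typing import Optional


def canonical_size(files: dict[str, str]) -> int:
    """Hypothetical size proxy: UTF-8 bytes of non-blank lines.

    Trailing whitespace, blank lines, and line-ending style are ignored,
    so purely cosmetic edits cannot move the number.
    """
    total = 0
    for text in files.values():
        for line in text.splitlines():  # treats \n and \r\n alike
            stripped = line.rstrip()
            if stripped:
                total += len(stripped.encode("utf-8"))
    # An integer sum does not depend on the order files are visited in.
    return total


def size_reduction(
    before: dict[str, str],
    after: dict[str, str],
    behavior_preserved: bool,
) -> Optional[Fraction]:
    """Fraction of canonical size removed, or None when the score is undefined.

    behavior_preserved is supplied by the caller (assumption: for example,
    an unchanged test suite still passes). It is evidence, not a proof of
    program equivalence, which is undecidable in general.
    """
    if not behavior_preserved:
        return None
    base = canonical_size(before)
    if base == 0:
        return None
    # An exact rational avoids floating-point rounding differences.
    return Fraction(base - canonical_size(after), base)
```

Returning an exact `Fraction` and refusing to score when behavior is not preserved are design choices made for this sketch: the first keeps the result bit-for-bit reproducible, and the second prevents a size win from being credited to a change that broke the app.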
When to Apply
Use this skill when:
- Designing a new metric, score, or index — or reviewing someone's proposed metric for rigor
- Asked to "quantify", "measure", "score", or "rank" a property that has no agreed measure yet
- Building a deterministic optimization target an agent will push on (e.g., reduce code size without changing behavior)
- Auditing an existing metric that "feels off" — it suspiciously tracks LOC, jumps between runs, or gets gamed
- Turning a research idea or formula into something computable, reproducible, and adoptable
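One of the audit symptoms listed above, a metric that jumps between runs, can be probed with a simple repeat-run check. The sketch below is illustrative only: `unstable_inputs` is a hypothetical helper, and it assumes the metric returns a hashable value. Passing the check does not prove determinism; it can only surface inputs where non-determinism was observed.

```python
from typing import Callable, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def unstable_inputs(
    metric: Callable[[T], Hashable],
    inputs: Iterable[T],
    runs: int = 5,
) -> List[T]:
    """Return the inputs for which repeated evaluation gave differing results."""
    unstable = []
    for item in inputs:
        # Collect the distinct values seen across repeated evaluations.
        seen = {metric(item) for _ in range(runs)}
        if len(seen) > 1:
            unstable.append(item)
    return unstable
```

For example, `unstable_inputs(len, ["ab", "c"])` returns an empty list, because `len` gives the same answer every time it is called on the same string.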