Surfaces the upstream metaharness-darwin security bench command. The command corresponds to upstream's own ADR-155 (Darwin Shield) and is the closest reference implementation for ruflo's nightly self-learning security harness (#2417).

Why this matters for ruflo's ADR-155

ruflo's ADR-155 proposes three learning loops (per-dimension confidence, severity calibration, auto-fix bid). Loop A trains on accumulated (finding, dimension, human_outcome) tuples, but its gradient signal is only sound if the underlying detection mechanism converges on a known-good corpus. Darwin Shield evolves exactly that mechanism on a ground-truth set of 10 vulnerabilities and 9 decoys. Running this nightly gives us:

Empirical floor: if Darwin Shield's champion can't reach TPR=1/FPR=0 on the bench corpus, Loop A's reward signal is noise.
Drift detection: week-over-week champion fitness deltas surface when the security landscape (or our mutator policy) shifts.
Baseline diversity: the 4 baselines (B0–B3) give us 4 anchor points to weight per-dimension confidence against.
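
The empirical-floor and drift checks above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the upstream implementation: the `Sample` type, the function names, the sample ids, and the 0.05 drift threshold are all hypothetical, and fitness is assumed to be a single float recorded once per week.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One bench corpus entry (hypothetical shape, not the upstream schema)."""
    sample_id: str
    is_vulnerable: bool  # ground truth: True for a real vuln, False for a decoy


def rates(corpus, flagged):
    """Return (TPR, FPR) for a detector that flagged the given sample ids."""
    vulns = [s for s in corpus if s.is_vulnerable]
    decoys = [s for s in corpus if not s.is_vulnerable]
    true_pos = sum(s.sample_id in flagged for s in vulns)
    false_pos = sum(s.sample_id in flagged for s in decoys)
    return true_pos / len(vulns), false_pos / len(decoys)


def meets_floor(tpr, fpr):
    """Empirical floor: every vuln caught, no decoy flagged."""
    return tpr == 1.0 and fpr == 0.0


def fitness_deltas(history):
    """Week-over-week change in champion fitness, oldest entry first."""
    return [round(b - a, 6) for a, b in zip(history, history[1:])]


def drift_weeks(history, threshold=0.05):
    """Indices into history where fitness moved by more than the threshold.

    The 0.05 default is an assumed value chosen for illustration only.
    """
    deltas = fitness_deltas(history)
    return [i + 1 for i, d in enumerate(deltas) if abs(d) > threshold]


# A corpus with the same shape as the bench set: 10 vulns, 9 decoys.
corpus = [Sample(f"v{i}", True) for i in range(10)] + \
         [Sample(f"d{i}", False) for i in range(9)]

perfect = {f"v{i}" for i in range(10)}   # flags every vuln, no decoys
noisy = perfect | {"d0"}                 # also flags one decoy

print(meets_floor(*rates(corpus, perfect)))  # True
print(meets_floor(*rates(corpus, noisy)))    # False
print(drift_weeks([0.90, 0.91, 0.80]))       # [2]
```

With only 9 decoys, a single false positive already moves FPR to 1/9, which is why the floor is stated as an exact TPR=1/FPR=0 target rather than a tolerance band.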

harness-security-bench