Evidence-Driven Agent Rules

Capture observed agent failures, promote the recurring ones into rules, and score advanced (Level 4–5) agent-readiness. For repos with a feedback signal: eval suites, run-level telemetry, A/B baselines, or a skill catalog under test. No signal yet — no benchmark, no telemetry baseline, no "we watched the agent do X" stream? Use harden-repo-for-coding-agents alone; promotion here needs evidence to drive it.

Boundaries

Do NOT use for first-pass agent-readiness scaffolding when no feedback signal exists yet (use harden-repo-for-coding-agents), or to design or operate an AI product's eval and optimization loops (use agent-test / agent-ops).

Produces: capture writes the reflection-log scaffold (docs/reflection-log/README.md, _template.md, and a repo-root README.md §Agents pointer); promote proposes the smallest rule / hook / CI-gate diff plus the entries it cites; assess-l4l5 scores Level 4–5 maturity with a ceiling and prioritized gaps. Tracked assess-l4l5 runs also emit rules-from-coding-agent-failures-findings-ledger-<date>-<slug>.md and rules-from-coding-agent-failures-workflow-state-<date>-<slug>.json.