Evidence-Driven Agent Rules

Capture observed agent failures, promote the recurring ones into rules, and score advanced (Level 4–5) agent-readiness. For repos with a feedback signal: eval suites, run-level telemetry, A/B baselines, or a skill catalog under test. No signal yet — no benchmark, no telemetry baseline, no "we watched the agent do X" stream? Use agent-readiness alone; promotion here needs evidence to drive it.

Produces: capture writes the reflection-log scaffold (docs/reflection-log/README.md, _template.md, and a repo-root README.md §Agents pointer); promote proposes the smallest rule / hook / CI-gate diff plus the entries it cites; assess-l4l5 scores Level 4–5 maturity with a ceiling and prioritized gaps. Tracked assess-l4l5 runs also emit agent-rules-findings-ledger-<date>-<slug>.md and agent-rules-workflow-state-<date>-<slug>.json.

agent-rules

Evidence-Driven Agent Rules

Core principle