confide:audit — corpus-scale, stats-only PII audit

Measure how much PII lives across a whole folder of sessions without ever exposing any of it. The audit runs the layered LOCAL detector stack from shared/confide_core.py (regex → Natasha → local LLM) over each file and emits only aggregates. This mirrors the real_session_eval privacy contract: text is read only in-process, and only counts are emitted.
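
As a rough illustration of the counts-only contract, the sketch below folds a set of in-memory texts into per-category counts and a file-level rate. The detect function and its two regexes are hypothetical stand-ins for the layered stack in shared/confide_core.py, whose actual API is not shown here, and the report keys are likewise assumed.

```python
import re
from collections import Counter

# Hypothetical stand-in for the layered detector stack; the real one
# (regex -> Natasha -> local LLM) lives in shared/confide_core.py.
_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d\s()-]{8,}\d"),
}

def detect(text):
    """Return category labels only, never the matched values."""
    labels = []
    for label, pattern in _PATTERNS.items():
        labels.extend(label for _ in pattern.finditer(text))
    return labels

def aggregate(texts):
    """Fold a corpus into counts and a rate; no substring is retained."""
    counts = Counter()
    files_with_pii = 0
    total = 0
    for text in texts:
        labels = detect(text)
        counts.update(labels)
        files_with_pii += bool(labels)
        total += 1
    return {
        "files": total,
        "files_with_pii": files_with_pii,
        "pii_file_rate": files_with_pii / total if total else 0.0,
        "by_category": dict(counts),
    }
```

Because detect returns labels rather than match objects, the matched values go out of scope as soon as each file is processed, so nothing downstream can leak them by accident.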

Privacy invariants (do not violate)

Local-only. No cloud APIs. Raw transcript text never leaves the machine.
Stats-only output. The report (markdown + json + optional HTML) contains ONLY counts and rates — never a transcript substring, never a detected PII value.
No filenames. Per-file rows are keyed by anonymized ids own-00, own-01, … The original path/name is never written or printed. For an unreadable file, only its index and the exception class name are recorded.
Safe to surface. Because it is counts-only, the aggregate report can be shared with a cloud agent or pasted into a chat. The PII stays on the machine.