anon
confide:anon — local PII redaction
Redact personally identifying information from a transcript (or a whole folder) using the
layered local stack in shared/confide_core.py: regex (emails / URLs / phones / IDs / dates)
→ Natasha (RU named entities) → local LLM (quasi-identifiers). Spans are interval-merged and
replaced with placeholders. The result is a GREEN copy safe to review.
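
The merge-and-replace step can be pictured with a minimal sketch. This is not the code in shared/confide_core.py: the names `Span`, `merge_spans`, and `redact` are hypothetical, spans are assumed to be (start, end, type) tuples with an exclusive end, and the rule that a merged span keeps the type of its earliest-starting member is an assumption made for illustration. Bare [TYPE] placeholders are used for brevity.

```python
from typing import List, Tuple

# Hypothetical span shape: (start, end, type), end exclusive.
Span = Tuple[int, int, str]


def merge_spans(spans: List[Span]) -> List[Span]:
    """Merge overlapping spans reported by different detector layers."""
    merged: List[Span] = []
    # Sort by start; on ties put the longest span first so it supplies the type.
    for start, end, kind in sorted(spans, key=lambda s: (s[0], -s[1])):
        if merged and start < merged[-1][1]:
            # Overlap: extend the previous span, keep its type (assumption).
            prev_start, prev_end, prev_kind = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), prev_kind)
        else:
            merged.append((start, end, kind))
    return merged


def redact(text: str, spans: List[Span]) -> str:
    """Replace every merged span with a bare [TYPE] placeholder."""
    out: List[str] = []
    cursor = 0
    for start, end, kind in merge_spans(spans):
        out.append(text[cursor:start])  # untouched prose before the span
        out.append(f"[{kind}]")
        cursor = end
    out.append(text[cursor:])  # tail after the last span
    return "".join(out)
```

Merging before replacement matters because two layers can flag the same characters (a regex hit on an email and an entity hit on the name inside it); replacing each span separately would corrupt the offsets.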
By default anon emits a reversible map: unique reserved-sentinel placeholders (the same
EXACT value always becomes the same [CONFIDE_<TYPE>_<NNNN>], e.g. [CONFIDE_PERSON_0001],
[CONFIDE_EMAIL_0001], [CONFIDE_DATE_0002]) plus a sibling <name>.map.json (structured:
schema_version, doc_id, green_sha256, created, entries[]) mapping each placeholder to
its original. The CONFIDE_ sentinel is reserved: a real transcript essentially never
contains it, so collisions are very unlikely and rehydrate never touches ordinary prose like
"Person 1". This is exact-value matching, not entity coreference: inflected forms (e.g. RU
"Марина" vs "Марины") are SEPARATE placeholders (no lemmatized merge). That map is the
secret — the ONLY artifact with originals; it stays local and enables
confide:rehydrate to put real values back into a cloud analysis of the GREEN text
(round-trip: redact → analyze the green → rehydrate locally). Use --no-map for the legacy
non-reversible [TYPE] style (no map written).
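
The exact-value mapping and the round trip can be illustrated with a small sketch. It is an assumption-laden illustration, not the tool's implementation: `assign_placeholders` and `rehydrate` are hypothetical helpers, spans are assumed to be already merged and non-overlapping, type names are assumed to be uppercase letters only, and the map is kept as a plain dict rather than the structured <name>.map.json described above.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed sentinel shape: [CONFIDE_<TYPE>_<NNNN>] with an uppercase type.
SENTINEL = re.compile(r"\[CONFIDE_[A-Z]+_\d{4}\]")


def assign_placeholders(
    text: str, spans: List[Tuple[int, int, str]]
) -> Tuple[str, Dict[str, str]]:
    """Return (green_text, mapping) where mapping is placeholder -> original."""
    counters: Dict[str, int] = defaultdict(int)
    by_value: Dict[Tuple[str, str], str] = {}
    out: List[str] = []
    cursor = 0
    for start, end, kind in sorted(spans):
        value = text[start:end]
        key = (kind, value)  # exact string match, no lemmatization
        if key not in by_value:
            counters[kind] += 1
            by_value[key] = f"[CONFIDE_{kind}_{counters[kind]:04d}]"
        out.append(text[cursor:start])
        out.append(by_value[key])
        cursor = end
    out.append(text[cursor:])
    mapping = {placeholder: value for (_, value), placeholder in by_value.items()}
    return "".join(out), mapping


def rehydrate(text: str, mapping: Dict[str, str]) -> str:
    """Put originals back; unknown sentinels and ordinary prose are left alone."""
    return SENTINEL.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)
```

Because the key is the exact string, "Марина" and "Марины" receive different numbers, while a repeated "Марина" reuses the first one. Rehydration only matches the bracketed sentinel, so text such as "Person 1" in a cloud analysis passes through unchanged.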