# AI Model Privacy Audit

## Overview
AI model privacy auditing is the systematic assessment of whether trained ML models leak information about their training data. Models can memorize individual training records, enabling adversaries to extract personal data, determine dataset membership, reconstruct input features, or infer sensitive attributes. This skill implements a comprehensive model privacy audit methodology using established attack techniques and tools (ML Privacy Meter, ART, Foolbox) to quantify privacy leakage before deployment and periodically during operation. The audit results feed directly into the AI DPIA risk assessment and inform mitigation measure selection.
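Membership inference, one of the leakage channels mentioned above, can be illustrated with the classic loss-threshold attack: records the model fits unusually well (low loss) are predicted to be training members. This is a minimal self-contained sketch; the loss values are hypothetical stand-ins for per-record losses you would compute from the audited model, and production audits would use tooling such as ML Privacy Meter or ART instead.

```python
def loss_threshold_membership(losses, threshold):
    """Loss-threshold membership inference: predict 'member' when the
    model's loss on a record is below a threshold, since members are
    typically fit more tightly than unseen records."""
    return [loss < threshold for loss in losses]

def attack_advantage(member_losses, nonmember_losses, threshold):
    """Membership advantage = TPR - FPR of the threshold attack; values
    near 0 suggest little leakage, values near 1 suggest severe leakage."""
    tpr = sum(l < threshold for l in member_losses) / len(member_losses)
    fpr = sum(l < threshold for l in nonmember_losses) / len(nonmember_losses)
    return tpr - fpr

# Toy per-record losses (hypothetical numbers, not from a real model):
members = [0.05, 0.10, 0.08, 0.12]
nonmembers = [0.90, 0.75, 0.40, 1.10]
adv = attack_advantage(members, nonmembers, threshold=0.3)
print(f"membership advantage: {adv:.2f}")  # prints "membership advantage: 1.00"
```

In a real audit the advantage is estimated over held-out shadow data; an advantage that is statistically indistinguishable from zero is the target outcome.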
## Privacy Attack Taxonomy

### 1. Training Data Extraction

**Objective:** Extract verbatim or near-verbatim records from the model's training data.
| Attack Vector | Description | Target Models |
|---|---|---|
| Prompt-based extraction | Craft prompts that cause LLMs to regurgitate training data | Language models, generative models |
| Canary extraction | Insert known canary strings into training data and test if model reproduces them | Any model (testing methodology) |
| Gradient-based extraction | Use model gradients to reconstruct training inputs | Models with accessible gradients |
| Generative reconstruction | Use the model as an oracle to iteratively reconstruct training samples | GANs, VAEs, diffusion models |
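The canary methodology in the table is commonly quantified with an exposure metric: insert a randomly generated canary string into the training data, then rank the trained model's score for that canary against many random candidate strings. High exposure means the model ranks its own canary far above chance, i.e. it memorized it. A minimal sketch, assuming the scores are model log-likelihoods supplied by the auditor (the values below are made up):

```python
import math

def exposure(canary_score, candidate_scores):
    """Exposure = log2(total strings) - log2(rank of the canary), where
    rank 1 means the model scores the inserted canary above every random
    candidate. Exposure near log2(total) indicates memorization."""
    # Rank the canary among candidates by score (higher = more likely).
    rank = 1 + sum(s >= canary_score for s in candidate_scores)
    return math.log2(len(candidate_scores) + 1) - math.log2(rank)

# Toy scores standing in for model log-likelihoods (hypothetical values).
candidates = [-5.0, -4.2, -6.1, -3.9, -5.5, -4.8, -7.0]
memorized = exposure(canary_score=-1.0, candidate_scores=candidates)    # rank 1
unmemorized = exposure(canary_score=-6.5, candidate_scores=candidates)  # rank 7
print(f"memorized canary exposure:   {memorized:.2f}")
print(f"unmemorized canary exposure: {unmemorized:.2f}")
```

With 8 strings total, a fully memorized canary scores exposure 3.0 (rank 1) while an unmemorized one stays near 0; in practice the candidate set is large (e.g. millions of strings) so that meaningful exposure values are possible.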
## Related skills