# AI Model Privacy Audit

## Overview
AI model privacy auditing is the systematic assessment of whether trained ML models leak information about their training data. Models can memorize individual training records, enabling adversaries to extract personal data, determine dataset membership, reconstruct input features, or infer sensitive attributes. This skill implements a comprehensive model privacy audit methodology using established attack techniques and tools (ML Privacy Meter, ART, Foolbox) to quantify privacy leakage before deployment and periodically during operation. The audit results feed directly into the AI DPIA risk assessment and inform mitigation measure selection.
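Membership inference, one of the leakage channels mentioned above, can be illustrated with the classic loss-threshold attack: records the model fits unusually well (low loss) are predicted to be training members. This is a minimal self-contained sketch; the loss values are hypothetical stand-ins for per-record losses you would compute from the audited model, and production audits would use tooling such as ML Privacy Meter or ART instead.

```python
def loss_threshold_membership(losses, threshold):
    """Loss-threshold membership inference: predict 'member' when the
    model's loss on a record is below a threshold, since members are
    typically fit more tightly than unseen records."""
    return [loss < threshold for loss in losses]

def attack_advantage(member_losses, nonmember_losses, threshold):
    """Membership advantage = TPR - FPR of the threshold attack; values
    near 0 suggest little leakage, values near 1 suggest severe leakage."""
    tpr = sum(l < threshold for l in member_losses) / len(member_losses)
    fpr = sum(l < threshold for l in nonmember_losses) / len(nonmember_losses)
    return tpr - fpr

# Toy per-record losses (hypothetical numbers, not from a real model):
members = [0.05, 0.10, 0.08, 0.12]
nonmembers = [0.90, 0.75, 0.40, 1.10]
adv = attack_advantage(members, nonmembers, threshold=0.3)
print(f"membership advantage: {adv:.2f}")  # prints "membership advantage: 1.00"
```

In a real audit the advantage is estimated over held-out shadow data; an advantage that is statistically indistinguishable from zero is the target outcome.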
## Privacy Attack Taxonomy

### 1. Training Data Extraction

**Objective:** Extract verbatim or near-verbatim records from the model's training data.
| Attack Vector | Description | Target Models |
|---|---|---|
| Prompt-based extraction | Craft prompts that cause LLMs to regurgitate training data | Language models, generative models |
| Canary extraction | Insert known canary strings into training data and test if model reproduces them | Any model (testing methodology) |
| Gradient-based extraction | Use model gradients to reconstruct training inputs | Models with accessible gradients |
| Generative reconstruction | Use the model as an oracle to iteratively reconstruct training samples | GANs, VAEs, diffusion models |
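The canary methodology in the table is commonly quantified with an exposure metric: insert a randomly generated canary string into the training data, then rank the trained model's score for that canary against many random candidate strings. High exposure means the model ranks its own canary far above chance, i.e. it memorized it. A minimal sketch, assuming the scores are model log-likelihoods supplied by the auditor (the values below are made up):

```python
import math

def exposure(canary_score, candidate_scores):
    """Exposure = log2(total strings) - log2(rank of the canary), where
    rank 1 means the model scores the inserted canary above every random
    candidate. Exposure near log2(total) indicates memorization."""
    # Rank the canary among candidates by score (higher = more likely).
    rank = 1 + sum(s >= canary_score for s in candidate_scores)
    return math.log2(len(candidate_scores) + 1) - math.log2(rank)

# Toy scores standing in for model log-likelihoods (hypothetical values).
candidates = [-5.0, -4.2, -6.1, -3.9, -5.5, -4.8, -7.0]
memorized = exposure(canary_score=-1.0, candidate_scores=candidates)    # rank 1
unmemorized = exposure(canary_score=-6.5, candidate_scores=candidates)  # rank 7
print(f"memorized canary exposure:   {memorized:.2f}")
print(f"unmemorized canary exposure: {unmemorized:.2f}")
```

With 8 strings total, a fully memorized canary scores exposure 3.0 (rank 1) while an unmemorized one stays near 0; in practice the candidate set is large (e.g. millions of strings) so that meaningful exposure values are possible.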
## Related skills