AI Model Privacy Audit

Overview

AI model privacy auditing is the systematic assessment of whether trained ML models leak information about their training data. Models can memorize individual training records, enabling adversaries to extract personal data, determine dataset membership, reconstruct input features, or infer sensitive attributes. This skill implements a comprehensive model privacy audit methodology using established attack techniques and tools (ML Privacy Meter, ART, Foolbox) to quantify privacy leakage before deployment and periodically during operation. The audit results feed directly into the AI DPIA risk assessment and inform mitigation measure selection.
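To make "quantify privacy leakage" concrete, here is a minimal sketch of one of the simplest audit signals: a loss-threshold membership inference test (in the style of Yeom et al.), which exploits the fact that a model that has memorized its training data tends to assign lower loss to training members than to held-out non-members. This is plain Python with synthetic losses standing in for real per-example model losses; no specific audit library is assumed.

```python
import random

random.seed(0)

# Synthetic per-example losses: in a memorizing model, training members
# tend to score lower than held-out non-members.
member_losses = [random.gauss(0.3, 0.1) for _ in range(1000)]
nonmember_losses = [random.gauss(0.7, 0.2) for _ in range(1000)]

def membership_advantage(members, nonmembers, threshold):
    """TPR - FPR of the attack that predicts 'member' when loss < threshold."""
    tpr = sum(l < threshold for l in members) / len(members)
    fpr = sum(l < threshold for l in nonmembers) / len(nonmembers)
    return tpr - fpr

# Sweep thresholds and report the worst case: advantage near 0 suggests
# little leakage, advantage near 1 indicates severe memorization.
thresholds = [i / 100 for i in range(151)]
best = max(membership_advantage(member_losses, nonmember_losses, t)
           for t in thresholds)
print(f"worst-case membership advantage: {best:.2f}")
```

In a real audit, the losses would come from the model under test, and tools such as ML Privacy Meter or ART automate this measurement across attack variants; the worst-case advantage is the number that feeds into the DPIA risk assessment.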

Privacy Attack Taxonomy

1. Training Data Extraction

Objective: Extract verbatim or near-verbatim records from the model's training data.

| Attack Vector | Description | Target Models |
| --- | --- | --- |
| Prompt-based extraction | Craft prompts that cause LLMs to regurgitate training data | Language models, generative models |
| Canary extraction | Insert known canary strings into training data and test whether the model reproduces them | Any model (testing methodology) |
| Gradient-based extraction | Use model gradients to reconstruct training inputs | Models with accessible gradients |
| Generative reconstruction | Use the model as an oracle to iteratively reconstruct training samples | GANs, VAEs, diffusion models |
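The canary extraction methodology above is typically quantified with the exposure metric from Carlini et al.'s "Secret Sharer" work: rank the inserted canary's model-assigned score against a space of random candidates, and report exposure in bits. The sketch below uses a hypothetical `score()` function as a stand-in for the model's log-perplexity; in a real audit it would query the model under test.

```python
import math
import random

random.seed(1)

CANDIDATE_SPACE = 10_000  # |R|: size of the candidate-canary space

def score(sequence, memorized):
    # Hypothetical stand-in for model log-perplexity: a memorized string
    # scores much lower (i.e. is far more likely) than random candidates.
    base = random.uniform(5.0, 10.0)
    return base - 6.0 if sequence == memorized else base

canary = "canary-417-hq-2289"
candidates = [f"candidate-{i}" for i in range(CANDIDATE_SPACE - 1)]

# Rank the inserted canary among all candidates (rank 1 = most likely).
canary_score = score(canary, memorized=canary)
rank = 1 + sum(score(c, memorized=canary) < canary_score for c in candidates)

# Exposure = log2 |R| - log2 rank; high exposure means the model has
# memorized the canary and extraction risk is real.
exposure = math.log2(CANDIDATE_SPACE) - math.log2(rank)
print(f"canary rank {rank}/{CANDIDATE_SPACE}, exposure {exposure:.1f} bits")
```

An exposure close to log2 |R| (here about 13.3 bits) means the canary is ranked first and should be treated as extractable; exposure near zero means the model scores it no better than chance.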