ai-data-retention
AI Model Retention and Unlearning
Overview
The storage limitation principle in GDPR Art. 5(1)(e) requires that personal data be kept in identifiable form no longer than is necessary for the purposes for which it is processed. For AI systems this creates a distinct retention problem: the training data may no longer be needed once training is complete, yet the trained model still encodes information about that data. Machine unlearning, the process of removing the influence of specific data from a trained model, is an emerging field that addresses this gap between deleting training data and eliminating its influence from model parameters. This skill provides retention policies, deletion verification methods, and machine unlearning techniques for AI compliance.
AI Data Retention Categories
| Data Category | Description | Retention Consideration |
|---|---|---|
| Raw training data | Original personal data used for model training | Delete after training unless retraining justifies retention |
| Processed training data | Cleaned, augmented, feature-engineered data | Same as raw — delete when training purpose exhausted |
| Validation/test data | Data used for model evaluation | Retain for model audit and comparison; pseudonymise |
| Model weights/parameters | Trained model artefacts encoding training data information | Retain while model is deployed; delete on decommission |
| Inference logs | Inputs and outputs of model predictions | Retention based on purpose (audit, debugging, rights exercise) |
| Model metadata | Training configuration, hyperparameters, provenance | Retain for compliance documentation; low privacy risk |
| Embedding vectors | Dense representations derived from personal data | May contain personal data — apply retention policy |
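The categories in the table can be encoded as a small rule set so that a retention decision is reproducible and auditable. The following is a minimal sketch, not a definitive implementation: the names `Category`, `Artefact`, and `retention_action` are hypothetical, and the 90-day and 365-day windows are placeholder assumptions, since the table gives no durations. Actual periods should come from your own purpose analysis.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Category(Enum):
    RAW_TRAINING = "raw_training_data"
    PROCESSED_TRAINING = "processed_training_data"
    VALIDATION_TEST = "validation_test_data"
    MODEL_WEIGHTS = "model_weights"
    INFERENCE_LOGS = "inference_logs"
    MODEL_METADATA = "model_metadata"
    EMBEDDINGS = "embedding_vectors"


@dataclass(frozen=True)
class Artefact:
    """Hypothetical record describing one stored AI artefact."""
    name: str
    category: Category
    created: date
    training_complete: bool = False
    retraining_justified: bool = False
    model_deployed: bool = True
    pseudonymised: bool = False


# Assumed retention windows (placeholders, not taken from the source table).
LOG_RETENTION = timedelta(days=90)
EMBEDDING_RETENTION = timedelta(days=365)


def retention_action(a: Artefact, today: date) -> str:
    """Return 'retain', 'pseudonymise', or 'delete' for one artefact."""
    if a.category in (Category.RAW_TRAINING, Category.PROCESSED_TRAINING):
        # Delete once training is done, unless retraining justifies retention.
        if a.training_complete and not a.retraining_justified:
            return "delete"
        return "retain"
    if a.category is Category.VALIDATION_TEST:
        # Kept for audit and comparison, but only in pseudonymised form.
        return "retain" if a.pseudonymised else "pseudonymise"
    if a.category is Category.MODEL_WEIGHTS:
        # Retain while the model is deployed; delete on decommission.
        return "retain" if a.model_deployed else "delete"
    if a.category is Category.INFERENCE_LOGS:
        return "delete" if today - a.created > LOG_RETENTION else "retain"
    if a.category is Category.EMBEDDINGS:
        # Embeddings may contain personal data, so they expire as well.
        return "delete" if today - a.created > EMBEDDING_RETENTION else "retain"
    # Model metadata: retained for compliance documentation.
    return "retain"


if __name__ == "__main__":
    today = date(2025, 1, 1)
    inventory = [
        Artefact("crm_export", Category.RAW_TRAINING, date(2024, 1, 1),
                 training_complete=True),
        Artefact("holdout_set", Category.VALIDATION_TEST, date(2024, 1, 1)),
        Artefact("churn_model_v1", Category.MODEL_WEIGHTS, date(2024, 2, 1),
                 model_deployed=False),
    ]
    for item in inventory:
        print(item.name, "->", retention_action(item, today))
```

Keeping the rules in one pure function means the same logic can drive both a scheduled deletion job and the compliance record of why each artefact was kept or removed.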
Retention Policy Framework