ai-training-lawfulness
Lawful Basis for AI Training Data
Overview
Processing personal data for AI model training is a distinct processing operation that requires its own lawful basis under GDPR Art. 6(1). The EDPB Guidelines 04/2025 and the coordinated ChatGPT Taskforce findings establish that AI training raises unique lawful basis challenges: the scale of data collection, the difficulty of obtaining meaningful consent for open-ended AI training purposes, the tension between legitimate interest and data subject expectations, and the complexity of determining lawfulness for web-scraped and third-party datasets. This skill provides a comprehensive lawful basis assessment framework for AI training data processing, addressing each Art. 6(1) basis as applied to ML training contexts.
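Because the framework walks through each Art. 6(1) basis in turn, it can help to have the six bases in one place. A minimal sketch (the enum name and member names are illustrative, not from the GDPR text itself; the article references are):

```python
from enum import Enum

class LawfulBasis(Enum):
    """The six lawful bases for processing under GDPR Art. 6(1)."""
    CONSENT = "Art. 6(1)(a)"              # data subject has given consent
    CONTRACT = "Art. 6(1)(b)"             # necessary for a contract with the data subject
    LEGAL_OBLIGATION = "Art. 6(1)(c)"     # necessary to comply with a legal obligation
    VITAL_INTERESTS = "Art. 6(1)(d)"      # necessary to protect vital interests
    PUBLIC_TASK = "Art. 6(1)(e)"          # task in the public interest / official authority
    LEGITIMATE_INTEREST = "Art. 6(1)(f)"  # legitimate interests, subject to balancing
```

In practice, the bases most commonly invoked for AI training are consent and legitimate interest, which is why the challenges listed above center on those two.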
Fundamental Principles
AI Training as Personal Data Processing
The EDPB has confirmed that AI model training constitutes processing of personal data under Art. 4(2) GDPR when:
- Training datasets contain personal data (directly or indirectly identifiable natural persons)
- The model is trained on data that includes personal data, even if the intent is to learn general patterns
- The resulting model retains the capability to generate or reproduce personal data from training sets
- Personal data is used in any pipeline stage: collection, cleaning, annotation, augmentation, validation, testing
The controller cannot avoid GDPR obligations by claiming the model has "learned" rather than "stored" personal data. The processing occurs at the point of training, regardless of whether the model can later reproduce specific records.
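The conditions above are disjunctive: if any one of them holds, the training run is processing personal data. That logic can be sketched as follows (the class and flag names are hypothetical, chosen to mirror the bullet points, not drawn from any EDPB text):

```python
from dataclasses import dataclass

@dataclass
class TrainingRun:
    """Hypothetical flags mirroring the EDPB conditions listed above."""
    dataset_contains_personal_data: bool = False   # identifiable natural persons in the data
    trained_on_personal_data: bool = False         # even if only general patterns are sought
    model_can_reproduce_personal_data: bool = False
    personal_data_in_pipeline_stage: bool = False  # collection, cleaning, annotation, ...

def is_personal_data_processing(run: TrainingRun) -> bool:
    """Art. 4(2) processing applies if ANY condition holds.

    Note the absence of an 'intent' flag: claiming the model merely
    'learned' rather than 'stored' the data does not exempt the controller.
    """
    return (run.dataset_contains_personal_data
            or run.trained_on_personal_data
            or run.model_can_reproduce_personal_data
            or run.personal_data_in_pipeline_stage)
```

For example, a model trained only on fully synthetic data but annotated by reference to real user records would still trigger the pipeline-stage condition.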