10/17/2023

"KI-Flash": data privacy requirements for training an AI.

Having given an overview of the legal requirements for training an AI in our last AI Flash, we would like to continue providing you with legal insights at regular intervals. Since time is a scarce commodity in today's society, our "AI Flash" gets straight to the point and summarizes the legal challenges briefly and concisely:

Today's topic: data privacy requirements for training an AI.

Training an AI application, in particular, is in considerable tension with data protection requirements. Machine learning, i.e. a process by which an AI application independently "learns" to solve a specific problem by repeating a specific task, therefore raises a whole host of data protection questions: Where does the data come from? May it be processed at all? And what protective measures need to be taken?

From a technical point of view, machine learning first requires a suitable algorithm which, in contrast to "ordinary" programs, does not prescribe a fixed procedure for the system but instead enables the AI application to solve the problem independently. In addition, the AI system "improves" with increasing experience and data volume, which is a significant driver of the data protection issues.

Strict data protection requirements

It can be said at the outset that the training of an AI application "stands and falls" with the quality and quantity of the data used. Beyond sheer quantity, particular attention must be paid to ensuring that the data is correct and complete, so that unintended errors do not arise later. The data protection supervisory authorities therefore already impose numerous requirements on the collection and use of training data. In summary, the entire process of refining raw data into training data must be documented and assessed under data protection law. This covers the quantity and origin of the data as well as a description of the refinement process, in particular standardization, error correction, and the error testing procedure. In addition, any unintended commingling, modification or leakage of the data must be prevented at this stage already.
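The documentation duty described above can be illustrated with a minimal sketch. It assumes a hypothetical in-house log structure; the names `TrainingDataLog`, `RefinementStep`, `standardize` and `drop_incomplete`, as well as the sample records, are illustrative only and not prescribed by any supervisory authority. The sketch records the origin of the data and, for each refinement step, how many records went in and came out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class RefinementStep:
    """One documented step on the way from raw data to training data."""
    name: str
    description: str
    records_in: int
    records_out: int
    timestamp: str


@dataclass
class TrainingDataLog:
    """Hypothetical log: origin of the data plus every refinement step."""
    source: str
    legal_basis: str
    steps: list = field(default_factory=list)

    def apply(self, name, description, records, transform):
        """Run a refinement step and document its effect on the data."""
        result = transform(records)
        self.steps.append(RefinementStep(
            name=name,
            description=description,
            records_in=len(records),
            records_out=len(result),
            timestamp=datetime.now(timezone.utc).isoformat(),
        ))
        return result


def standardize(records):
    """Standardization: bring city names into one uniform spelling."""
    return [{**r, "city": r["city"].strip().title()} for r in records]


def drop_incomplete(records):
    """Error correction: remove records with a missing age value."""
    return [r for r in records if r.get("age") is not None]


# Illustrative raw data (invented for this example).
raw = [
    {"city": " berlin ", "age": 34},
    {"city": "HAMBURG", "age": None},
    {"city": "munich", "age": 51},
]

log = TrainingDataLog(source="hypothetical CRM export",
                      legal_basis="to be examined and documented")
data = log.apply("standardization", "uniform city spelling", raw, standardize)
data = log.apply("error correction", "drop records without age", data, drop_incomplete)

for step in log.steps:
    print(step.name, step.records_in, "->", step.records_out)
```

Such a log does not replace the legal assessment, but it keeps the quantity, origin and processing history of the data traceable from the start.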

What is special about training an AI is therefore that care must be taken at a very early stage to ensure that no errors "creep in" that can hardly be corrected later. Where the GDPR imposes formal requirements on the controller (in particular, carrying out a data protection impact assessment), these should likewise be addressed early on.

Use of personal data required?

It must also always be asked whether the purpose of the data processing could have been achieved by less intrusive means. For example, could anonymized or synthetic data have been used? At the very least, it must be explained why the data was not pseudonymized beforehand. In addition, it must be examined in detail whether data sets that are irrelevant to the AI application's decision-making were excluded when refining the raw data. If, on the other hand, personal data is (or must be) used to train the AI, strict requirements apply to the protective measures to be taken.
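As a rough illustration of pseudonymization and of excluding irrelevant data, the following sketch replaces a direct identifier with a keyed hash (HMAC-SHA256) and keeps only the fields assumed to be relevant for training. The key, the field names and the sample record are hypothetical. Note that pseudonymized data generally remains personal data under the GDPR, so this is one protective measure among several and not a way out of data protection law.

```python
import hashlib
import hmac

# Assumption: in practice the key is stored separately from the training data.
SECRET_KEY = b"replace-with-a-key-stored-separately"

# Assumption: only these fields are relevant for the AI application's task.
RELEVANT_FIELDS = {"age_band", "product"}


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256, hex-encoded)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def minimize(record: dict) -> dict:
    """Keep only relevant fields and swap the e-mail address for a pseudonym."""
    reduced = {k: v for k, v in record.items() if k in RELEVANT_FIELDS}
    reduced["pseudonym"] = pseudonymize(record["email"])
    return reduced


# Illustrative record (invented for this example).
record = {
    "email": "jane@example.com",
    "name": "Jane Doe",
    "age_band": "30-39",
    "product": "A",
}

training_row = minimize(record)
print(sorted(training_row))
```

The same input always yields the same pseudonym, so records belonging to one person can still be linked during training without exposing the e-mail address or name.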

Thorough examination is essential

In any case, the origin of the data and the legal basis for the data processing should be examined extremely thoroughly. For example, it is generally not permissible to take data from business customers or from the Internet and use it unchecked to train an AI. Although a legal basis under data protection law is not ruled out from the outset, it must be thoroughly examined and documented, as must compliance with the requirements for a so-called change of purpose.

Our next AI Flash will provide an outlook on the legal requirements of the European AI Regulation.