12/13/2024

KI-Flash: AI and data subjects' rights

After reporting on the so-called AI competence (AI literacy) under Article 4 of the European AI Act in our last AI Flash, we would like to continue providing you with legal insights at regular intervals. As time is a scarce commodity, our ‘AI Flash’ gets straight to the point and summarises the legal challenges briefly and concisely:

 

The issue: Are data subject rights under the GDPR (e.g. the right to erasure and the right to information) also to be observed in relation to AI models? For example, can an employee request information about their personal data because they have used an AI system at work? And if so, how far does this right extend?

 

Short answer: It depends, but there are good reasons not to assume that an AI model as such involves processing of personal data within the meaning of the GDPR, so that data subjects' rights do not have to be fulfilled in this respect. The use of the respective AI system itself, on the other hand, is subject to the full scope of the GDPR.

 

Today's topic: AI and data subjects' rights 

The use of AI is known to be associated with data protection challenges. A particularly relevant topic in this respect is the implementation of data subject rights in accordance with Art. 15 et seq. GDPR. This AI Flash is intended to highlight the issues that arise in this context and to outline possible solutions for companies. As the topic is the subject of much discussion (from both a legal and a technical perspective), this article can only provide an initial categorisation without being able to rely on an authoritative decision by a court or supervisory authority.

 

What are the difficulties in implementing data subjects' rights?

The following example will serve to illustrate the key issues in the context of implementing data subjects' rights:

A company wants to use a generative AI system to increase the productivity and efficiency of its daily work processes. For this purpose, a cloud-based AI service (‘AI-as-a-Service’) from a well-known provider is to be used, with the AI model operated on the provider's servers. The company uses the AI system ‘off the shelf’, which means that the development and training of the AI are the sole responsibility of the AI provider.

If data subject rights pursuant to Art. 15 et seq. GDPR are asserted when using the AI system described here (e.g. because an employee of the company invokes their right to erasure in accordance with Art. 17 (1) GDPR), various data protection questions arise:

  • Is it even (technically) possible for the company to implement data subject rights in relation to the AI model? As already shown, the company has no influence whatsoever on the development and training of the AI model used. 
  • Is there even a legal obligation to implement data subject rights in relation to the AI model? This can at least be questioned from several points of view. In particular, it is highly controversial whether an AI model in itself has a personal reference (or ‘contains’ personal data) and can therefore be a reference point for data subjects' rights. It is also debatable whether the company - as far as the AI model in the example case presented here is concerned - is actually to be regarded as the controller within the meaning of Art. 4 No. 7 GDPR.

 

Recap: Distinction between AI system and AI model

We have already drawn a very detailed distinction between the AI system and the AI model in previous AI Flashes. To create a better understanding of the issues raised here, we nevertheless briefly summarise it again below:

An AI system is the functional AI application, usually equipped with a user interface, while the AI model is the underlying (technical) centrepiece, i.e. the AI-supported functionality. The latter refers in particular to the actual algorithm and its weightings. Recital 97 of the European AI Act literally states with regard to certain AI models (specifically general-purpose AI models, ‘GPAIM’):

‘The notion of general-purpose AI models should be clearly defined and set apart from the notion of AI systems to enable legal certainty. The definition should be based on the key functional characteristics of a general-purpose AI model, in particular the generality and the capability to competently perform a wide range of distinct tasks. These models are typically trained on large amounts of data, through various methods, such as self-supervised, unsupervised or reinforcement learning. General-purpose AI models may be placed on the market in various ways, including through libraries, application programming interfaces (APIs), as direct download, or as physical copy. These models may be further modified or fine-tuned into new models. Although AI models are essential components of AI systems, they do not constitute AI systems on their own. AI models require the addition of further components, such as for example a user interface, to become AI systems. AI models are typically integrated into and form part of AI systems.’

An (IT) system therefore becomes an AI system precisely when an AI model is integrated. While the AI system - figuratively speaking - represents the AI ‘for use’, the AI model is not suitable for the end user without integration into a system.

To clarify: The issues addressed here relate exclusively to the AI model. As far as the mere use of an AI system is concerned (i.e. in particular the input and/or output data and their specific use), all data subject rights pursuant to Art. 15 et seq. GDPR must be implemented by the company using the AI system.

 

Does an AI model have a personal reference?

In order to approach the problems presented here, the fundamental question can first be raised as to whether data subject rights pursuant to Art. 15 et seq. GDPR can relate to an AI model at all. Since the GDPR only addresses the processing of personal data, the AI model in its ‘raw form’ would itself have to be regarded as containing or processing personal data. Whether and to what extent this is the case is assessed differently - including by the data protection supervisory authorities.

The Hamburg Commissioner for Data Protection and Freedom of Information takes a very clear position on this. His discussion paper ‘Large Language Models and Personal Data’ makes the following basic statements:

‘1. The mere storage of an LLM does not constitute processing within the meaning of Art. 4 No. 2 GDPR. This is because no personal data is stored in LLMs. Insofar as personal data is processed in an LLM-supported AI system, the processing operations must comply with the requirements of the GDPR. This applies in particular to the output of such an AI system.

2. Due to the lack of storage of personal data in the LLM, the rights of data subjects under the GDPR cannot apply to the model itself. However, claims for access, erasure or rectification can at least relate to the input and output of an AI system of the responsible provider or operator.

3. The training of LLMs with personal data must be carried out in compliance with data protection regulations. The rights of data subjects must also be observed. However, training that may violate data protection regulations does not affect the legality of the use of such a model in an AI system.’

In its discussion paper, the Hamburg Commissioner for Data Protection and Freedom of Information takes a very detailed look at the technical basis of LLMs and comes to the (at least justifiable) conclusion that an AI model does not in itself contain any personal data. The argument put forward is that no plain-text data is stored in an AI model in the first place, but only language fragments (so-called tokens), which are contextualised (mathematically) by vectorial references. At no point does the AI model contain a name such as that of the founder of SKW Schwarz, ‘Wolf Schwarz’; it contains only fragments such as ‘Wo’, ‘lf’, ‘Schwa’ and ‘rz’ together with the trained statistical weightings according to which the most probable sequence of these fragments in the output is ‘Wolf Schwarz’. Other combinations are theoretically possible (‘SchwaWorzlf’), but with a lower probability.

The paper also addresses the effects of potential privacy attacks (i.e. targeted attacks on the AI model), with particular reference to the ruling of the European Court of Justice of 19 October 2016 (C-582/14, para. 14), according to which data only qualifies as personal if identification by the controller or by third parties is neither prohibited by law nor possible only with a disproportionate amount of time, cost and manpower. In our example, it would require a disproportionate effort merely to deduce from the stored fragments and weightings that they were derived from the name Wolf Schwarz.
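To make the tokenisation argument more tangible, the following Python sketch illustrates it in purely schematic terms. The fragments and transition probabilities are entirely invented for this example; a real language model encodes comparable statistics in billions of numeric parameters rather than in a readable table.

```python
# Purely illustrative sketch (not how any real model is implemented):
# a model stores no name strings, only fragments ("tokens") and learned
# weights describing how likely one fragment is to follow another.

# Hypothetical subword fragments, following the example in the text
fragments = ["Wo", "lf", " Schwa", "rz"]

# Hypothetical learned weights: probability of the next fragment given the
# current one (a real model spreads this over billions of parameters).
next_fragment_probs = {
    "Wo":     {"lf": 0.92, " Schwa": 0.05, "rz": 0.03},
    "lf":     {" Schwa": 0.88, "rz": 0.10, "Wo": 0.02},
    " Schwa": {"rz": 0.95, "lf": 0.04, "Wo": 0.01},
}

def most_likely_continuation(start, steps=3):
    """Greedily pick the most probable next fragment at each step."""
    output = start
    current = start
    for _ in range(steps):
        candidates = next_fragment_probs.get(current)
        if not candidates:
            break
        current = max(candidates, key=candidates.get)
        output += current
    return output

print(most_likely_continuation("Wo"))  # most probable result: "Wolf Schwarz"
# Less probable orderings ("SchwaWorzlf") remain possible in principle,
# which is exactly the point made in the discussion paper.
```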

The (further) technical details of the statement would exceed the scope of this report, which is why interested readers are advised to read the discussion paper. However, if one follows the view of the Hamburg Commissioner for Data Protection and Freedom of Information, data subjects' rights would not have to be implemented (in this respect) due to the lack of personal references in AI models. Overall, this view can be seen as very business-friendly, as it avoids many problems in day-to-day practice from the outset.

In contrast, the Bavarian State Office for Data Protection Supervision (BayLDA), for example, is somewhat more cautious. The ‘AI & data protection’ topic page only contains the following statement on this issue:

‘Information on the AI model:

A number of fundamental technical and legal questions arise as to whether an AI model in itself constitutes personal data at all (see above) - if one were to take the view that this is not the case, then information would not have to be provided at all. If it is the case, the question arises as to how this can work technically in the individual case, as AI models do not store plain-text data as in a database, but rather store it (possibly encoded as so-called tokens) in the form of probability distributions, mathematically linked across several layers within the vast number of internal parameters - and it can often only be produced as AI output through specific inputs/prompts.

AI-as-a-Service is even more complex: while most AI models are deterministic in themselves, i.e. the same output is always generated for a given input, during operation they are supplied with random start values (seeds), possibly additional internal states (depending on previous inputs) and different hardware allocations (in the case of AI-as-a-Service), which sometimes makes it impossible to reproduce inputs/outputs.’

Although the BayLDA also recognises the relevant issues, it is somewhat reluctant to make a clear classification. In our opinion, this statement by the BayLDA at least allows for a differentiated examination of whether or not personal data is contained in an AI model. As this question cannot be assessed from a purely legal perspective, IT managers with the necessary expertise must always be consulted for an internal review.
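The point made in the quoted passage about determinism and random start values can be illustrated in a schematic way. The following Python sketch is purely illustrative and uses an invented next-word distribution rather than any real AI service; it shows that even with fixed (deterministic) model weights, the output varies because it is sampled, and only becomes reproducible if the start value (seed) is fixed.

```python
import random

# Invented next-word distribution produced by a fixed, deterministic model
distribution = {"Schwarz": 0.90, "Schwartz": 0.07, "SchwaWorzlf": 0.03}

def sample_output(seed=None):
    """Draw one continuation; the randomness lies in sampling, not in the weights."""
    rng = random.Random(seed)
    return rng.choices(list(distribution), weights=list(distribution.values()))[0]

print(sample_output())         # may differ from run to run
print(sample_output(seed=42))  # reproducible only because the start value is fixed
```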

As far as can be seen, the State Commissioner for Data Protection and Freedom of Information (LfDI) of Baden-Württemberg takes the (probably) strictest view. In its discussion paper ‘Legal bases for the use of artificial intelligence’ (version 2.0), it states, among other things:

‘Such a personal reference could arise, for example, from the fact that the AI model itself contains the personal data. However, indirect identifiability could also be likely outside of the direct storage[14] of personal data. For example, the Hamburg and Danish supervisory authorities argue that large language models (hereinafter: LLM) do not contain any personal data.[15] If this view is followed, the provider would nevertheless have to assess whether third parties can establish a personal reference when (publicly) providing an (LLM) AI model, taking into account the above points.[16] The AI model cannot be considered in isolation here; rather, the AI system or AI application must be considered as a whole. In particular, it must be checked whether third parties or, in this case, the users can receive personal data as output, e.g. with certain input prompts.[17] Real or fictitious[18] statements about persons, which are only made possible by the AI application, should only be attributed to the provider of the AI model if the input prompts can reasonably be expected from him. For example, a distinction should be made between the input prompts "Who is [name]?" and "What is [name], born on [date of birth] in [place of birth], residing in [city], accused of having committed on [date]?". As a result, it can be stated that third parties can establish a personal reference. Whether certain input prompts can be effectively prevented, for example through technical and organizational measures, must be examined on a case-by-case basis. In this context, so-called model attacks must also be taken into account.[19] As an example, so-called membership inference attacks attempt to find out which personal data were in the training data in order to derive characteristics of the natural persons. In so-called model inversion attacks, on the other hand, an attempt is made directly to obtain information about the training data from the model's learning results. If such attacks on AI systems are possible, the model itself could in turn be regarded as personal data.’

Even if the LfDI Baden-Württemberg does not make a blanket statement about the personal reference of an AI model, but rather takes an overall view of the AI system together with the AI model, the discussion paper makes clear that a very thorough examination is required in each individual case.

In addition, some representatives of the supervisory authorities compare the tokenisation and mathematical concatenation of plain-text data in trained AI models with the encryption of personal data: encrypted data is likewise not readable (nor can it be made readable with reasonable effort), yet it does not lose its personal reference through encryption alone. Just as encrypted data can be decrypted, training data can also be reproduced in the output through suitable prompting, especially with smaller AI models (see also the phenomenon of ‘overfitting’).

Insofar as the Data Protection Conference (DSK) states in a press release dated 2 September 2024, with reference to the training of AI, that ‘in the vast majority of cases, processing of personal data cannot be ruled out’, this does not amount to a clear statement on the legal qualification of the respective AI model itself. That AI models are usually trained with personal data and are also (technically) capable of generating personal data is not called into question here. In our opinion, however, these circumstances say nothing about whether an AI model in its ‘fully’ trained form has a personal reference.

Conclusion: Depending on the case and the AI model used, it is at least possible that data subjects' rights do not have to be implemented (in this respect) due to the lack of personal reference of the respective AI model. The relevant questions must be clarified together with the IT department and documented for accountability purposes (see Art. 5 (2) GDPR). As long as there are no binding statements from the European supervisory authorities or the ECJ, a clearly documented and properly considered decision in favour of one of the two views is all that can be required of companies.

 

To what extent is there responsibility for the AI model?

If one does not share the view outlined above and, in light of the existing legal uncertainties, wishes to assume that an AI model has a personal reference, then in our opinion the areas of responsibility must be clearly delineated.

The BayLDA also recognises this issue and comments as follows on the aforementioned ‘AI & data protection’ topic page:

‘Tip 1: Check the spheres of responsibility (see above). Maybe you as an AI operator are not the right addressee of a request for information on data from the AI model?’

In the above example of the cloud-based AI system, in our assessment the following areas of responsibility can be distinguished:

  • As far as the input into the AI system and the use of the resulting output are concerned, the company is undoubtedly responsible (also for the implementation of data subjects' rights). The same applies to any databases linked to the AI system. To this extent, it must be fully ensured that data subjects' rights pursuant to Art. 15 et seq. GDPR can be implemented.
  • In contrast, as far as the AI model and the (at least theoretically) possible processing of personal data are concerned, the company is not to be regarded as the controller in this respect. It must be taken into account that (at least active) processing of personal data can only ever be assumed in interaction with the respective prompt. Without an active influence on the AI model, there is no processing of information (including personal data) that is in any way controlled by the company. In very abstract terms, the situation can be compared to other information on the internet, which can theoretically be accessed at any time but requires active information processing (e.g. via a search engine). The fact that the AI model as such can be used at any time and retrieved from the provider's servers does not lead to processing of personal data for which the company is responsible. This must apply all the more because the respective AI model is generally not provided ‘exclusively’ for a specific company.

In addition, the training process and any options for influencing the AI model used and hosted on the provider's servers (e.g. by ‘fine-tuning’ the AI model) lie within the provider's sole sphere of influence. In this respect, the company only has the option of regulating the use of the AI system and preventing the use of undesirable results. In particular, this means that incorrect results from the AI system may not be used and - as far as possible - must be prevented by technical filters. If one also wanted to establish the company's responsibility for the AI model, this would lead to de facto unsolvable tasks for companies using AI.

The German Data Protection Conference (DSK) also takes a position on these issues - albeit very cautiously - and makes the following statement in its guidance document ‘AI and data protection’:

‘The suppression of unwanted outputs by means of downstream filters does not generally constitute erasure within the meaning of Art. 17 GDPR. This is because the data that leads to a certain output after a certain input could still be available to the AI model in a form with personal reference. However, filter technologies can contribute to avoiding certain outputs and thus serve the rights and freedoms of the persons affected by a particular output.’

Even if the DSK (here too) neither clarifies whether an AI model has a personal reference nor departs from the obligation to implement data subject rights, it does recognise the problem facing the company using an AI system. The primary obligation to influence the respective AI model - insofar as it is not developed and hosted by the company itself (as in the present case) - lies with the provider of the AI system, while the company using the AI system can only influence internal processes and should in particular consider the use of filter technologies for the output of the AI system.
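To illustrate what such a downstream filter might look like in the simplest conceivable case, here is a purely schematic Python sketch. The blocklist and the replacement text are invented for illustration, and real filter technologies offered by AI providers are considerably more sophisticated. The sketch merely demonstrates the DSK's point: the filter changes nothing inside the AI model, it only suppresses unwanted output before it reaches the user.

```python
# Schematic output filter: nothing inside the AI model is deleted (so this is
# no erasure within the meaning of Art. 17 GDPR); unwanted output is merely
# suppressed downstream, before it is shown to the user.

# Hypothetical blocklist maintained by the deploying company, e.g. names of
# data subjects who have objected to appearing in outputs.
BLOCKED_TERMS = {"wolf schwarz"}

def filter_output(model_output):
    """Suppress outputs containing blocked terms; pass everything else through."""
    lowered = model_output.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        return "[output suppressed by company policy]"
    return model_output

print(filter_output("The founder of SKW Schwarz is Wolf Schwarz."))  # suppressed
print(filter_output("The weather in Munich is sunny."))              # passed through
```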

Conclusion: According to this assessment, there are various arguments that can be used to deny a company's responsibility for an AI model that is used. However, as already mentioned, this categorisation is not unassailable, which is why further developments (in particular statements by the data protection supervisory authorities and decisions by the ECJ) should always be kept in mind.

However, it is important to note that the issues raised here must be assessed differently if a company develops and trains AI models itself. In that case, the company has direct influence on the AI model, and a corresponding responsibility of the company can therefore be assumed.

 

Practical advice

The issues outlined above are extremely complex from a legal and technical perspective. Generalised statements are therefore unsuitable for developing a meaningful strategy for AI compliance within the company. From a company's perspective, however, this article should at least be used as an impetus to deal with the issues and draw up a strategy for implementing data subject rights - especially when using AI. If you have any questions, please contact us directly - we will be happy to support you!
