GDPR Article 15 for AI stacks after CEF 2024 and CJEU C-203/22. The copy, the explanation, the sub-processor list, and the one-month clock.
A customer emails you. "Send me everything you have on me." This is different from "delete everything." You cannot solve it by clicking a delete button. You have to assemble the personal data from every surface it has touched, format it intelligibly for a non-technical reader, and explain what your AI feature did with it. The copy is the easy half. The explanation is the hard half.
The EDPB's 2024 Coordinated Enforcement Framework report on the right of access, adopted on 20 January 2025, surveyed 1,185 controllers across 30 European DPAs. The report named seven specific challenges, and the one that matters most for AI teams is the absence of documented internal procedures for handling access requests. That finding is what the 2025 CEF report on the right to erasure confirmed a year later for a different GDPR right. The pattern is consistent. Teams improvise, regulators notice, and the fix is a runbook written in advance.
This article walks the four obligations of Article 15 and the specific failure modes each one produces in AI stacks. If you have already read the right-to-erasure article, the surface inventory you built there is the same one you need here. The action is different, and the explanation is where the genuinely hard part sits.
The CEF 2024 action ran through 2024. Thirty DPAs participated and 1,185 controllers responded, ranging from SMEs to multinationals. The final report, adopted on 20 January 2025, identified seven challenges. The three that land hardest on AI stacks:
Lack of documented internal procedures. The same problem the 2025 CEF report flagged for erasure. If your team has never handled a DSAR and there is no runbook, the first request will arrive on a day when the engineer who knows the vector store is on leave, and the one-month clock will not care.
Insufficient awareness of EDPB Guidelines 01/2022. The EDPB's right-of-access guidelines are the operational reference. They cover identity verification, the limits of the "manifestly excessive" exemption, the intelligibility standard for the response, and the handling of third-party data inside the response. Teams that have not read them tend to fail at identity verification (asking for too much) and third-party redaction (not doing enough).
Compliance lower in smaller organisations. The CEF report noted higher compliance in larger organisations and those that handle a significant volume of requests. Small teams with infrequent DSARs are exactly the teams most likely to improvise on the first request, and AI features are often shipped by small teams. If your company is under 50 people and you ship an AI copilot over customer data, the CEF pattern predicts you are in the unprepared bucket.
Eleven supervisory authorities assessed compliance as "high" and one rated it "very high," but seven rated it only "average." The gap between the top-rated and the average-rated controllers in the CEF sample was mostly about having a runbook that covered the non-obvious surfaces.
On 10 April 2025 the EDPB also published AI Privacy Risks and Mitigations for Large Language Models, a commissioned report by Isabel Barbera. It is not an official EDPB position (it came through the support pool of experts), but it includes three worked LLM use cases (a customer service chatbot, a student learning assistant, and a travel planning tool) that map cleanly onto the DSAR surfaces most teams actually have to handle. Read the customer service chatbot case if you ship a support copilot.
Article 15(3) requires the controller to "provide a copy of the personal data undergoing processing." The EDPB's Guidelines 01/2022 interpret "copy" as "the personal data in an intelligible form," and intelligible means readable by the data subject, not by an engineer with access to the storage backend.
Walk the AI surface inventory from your erasure runbook. The action is retrieval, the format is intelligibility.
| Surface | What to return | Format |
|---|---|---|
| Application logs | Records keyed to the user | JSON or CSV with labelled fields |
| Provider request and response logs | Logs from each AI provider the data touched | Provider DSAR export (OpenAI: dsar@openai.com or the privacy portal; Anthropic and Google have equivalents) |
| Prompt and conversation history | Full history per user | Markdown or per-conversation file export |
| Retrieval source documents | Source documents that contain the person | Original file types |
| Vector embeddings | Reference table, not raw vectors | Table mapping vector IDs to source documents with timestamps |
| Fine-tune training data | Records that were included | Original format of the training file |
| Retrieval logs | Times the person's data was retrieved and the output context | Table |
| Cached responses | Responses keyed to the person | Original response format |
Two things about vectors. You do not need to return the raw floats. Embeddings are not intelligible to a human reader, and the Guidelines 01/2022 position on intelligibility is clear. Return the lineage table that maps each vector back to its source document, plus the source documents themselves. But you must account for them: saying "we have no embeddings of you" when your vector store has 3,000 chunks tagged to the account is a substantive failure, not a formatting choice. (Embeddings derived from personal data are personal data, and the EDPB's "negligible likelihood" anonymity test from Opinion 28/2024 is the bar you have to clear before you can treat them as out of scope.)
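Once the metadata is in place, the lineage table is a few lines of code. A minimal sketch, assuming your vector store lets you filter chunks by an account tag and keeps a source path and index timestamp in each chunk's metadata (the field names `id`, `source_path`, and `indexed_at` are hypothetical — adapt them to your store's schema):

```python
# Build the lineage table for a DSAR response: vector IDs mapped back to
# source documents, never raw floats. Field names are hypothetical.

def lineage_table(chunks: list[dict]) -> list[dict]:
    """One row per vector, intelligible to the data subject."""
    return [
        {
            "vector_id": c["id"],
            "source_document": c["metadata"]["source_path"],
            "indexed_on": c["metadata"]["indexed_at"],
        }
        for c in chunks
    ]

# Typical use: chunks = your_index.query(filter={"account_id": subject_id}),
# then attach lineage_table(chunks) plus the source documents themselves.
```

The point of the table is the "we must account for them" half of the obligation: it proves the embeddings exist and where they came from, without dumping unintelligible vectors on the reader.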
For provider logs, follow the provider's DSAR process and include the provider's response in what you send the customer. Track the request, attach the confirmation to your DSAR file.
Article 15(1)(h) is the AI-specific clause. If you make decisions under Article 22 (decisions based solely on automated processing that produce legal or similarly significant effects), you owe the data subject "meaningful information about the logic involved" plus the significance and envisaged consequences.
For years the question was how much detail "meaningful" meant. On 27 February 2025 the CJEU answered in Case C-203/22, Dun & Bradstreet Austria. A customer ("CK") was denied a phone contract after a D&B credit score. CK asked for the logic. The Court rejected two extremes.
You cannot satisfy the obligation by handing over a mathematical formula or a step-by-step algorithmic description. The Court held that complex formulas are not "meaningful information" for the data subject. You also cannot refuse to explain on trade-secret grounds. The Court held that Austrian legislation which categorically excluded access where trade secrets were at stake was contrary to the GDPR. Trade secrets can be balanced against the data subject's right in the form of the explanation, but they cannot trump it.
The standard the Court set is a "sufficiently concise and intelligible explanation" of "the actual procedures and principles used" in the decision, in a form that enables the data subject to exercise their Article 22(3) rights (the right to human intervention, the right to express a view, the right to contest).
For LLMs this lands harder than for credit scoring models. A credit model can be reduced to a factor list with weights. An LLM cannot, because a transformer's output is produced by billions of attention computations over the entire context. I think the practical answer is to describe what the model saw, what it produced, and what role the output played in the decision, rather than to attempt a white paper on transformer attention. The four things a regulator will want to see:
- The inputs the model used: what personal data went into the prompt and the retrieved context.
- The model's role in the decision: sole decision-maker, or one input among several with a human in the loop.
- The main factors that drove the specific output.
- The consequences of the decision for the data subject.
If you cannot describe these four things in plain language for one customer, you cannot describe them for a regulator. And the regulator is the one who gets to decide whether your explanation meets the C-203/22 standard.
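One way to keep the explanation reusable is to capture the four elements as a fixed structure your team fills in per decision type. A sketch, not a legal form — the field names are mine, not the Court's:

```python
from dataclasses import dataclass

@dataclass
class DecisionExplanation:
    """The four C-203/22 elements, drafted in plain language."""
    inputs_seen: str    # what personal data went into prompt and context
    model_role: str     # sole decision-maker, or one input among several
    main_factors: str   # what drove this specific output
    consequences: str   # effect of the decision on the data subject

    def as_text(self) -> str:
        return (
            f"What the model saw: {self.inputs_seen}\n"
            f"Its role in the decision: {self.model_role}\n"
            f"Main factors behind the output: {self.main_factors}\n"
            f"What this meant for you: {self.consequences}"
        )
```

If any field is hard to fill in for a given decision, that is the gap to close before the first Article 15(1)(h) request, not after.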
Article 15(1)(c) requires you to disclose the recipients or categories of recipients to whom the personal data have been or will be disclosed, in particular recipients in third countries. For an AI feature that means naming every AI sub-processor in your stack.
OpenAI, Anthropic, Google Vertex, Azure OpenAI, Pinecone, Weaviate, the observability tool that logs prompts for debugging. Every one of them is a recipient under Article 15(1)(c). Every one of them needs to be named in the DSAR response with the country they operate in, because cross-border transfers sit underneath the same clause. The right-to-erasure article walked the AI sub-processor cascade in more detail; for DSAR purposes the same list becomes a disclosure obligation rather than a contract review item.
Many DSAR responses skip this entirely. The engineer assembling the response thinks of "recipients" as "other humans we emailed the data to." The regulator reads Article 15(1)(c) as "every entity that processed the data under your instructions." Missing the sub-processor list is a common cause of complaints to the DPA, and the complaint is usually easy to substantiate: the complainant knows the company uses ChatGPT, the DSAR response did not mention OpenAI, the DPA now has a clean case.
The request is from one person, about one person, but the data may contain other people. Prompt history often includes screenshots of group chats. Support tickets may mention employees or other customers by name. Retrieval contexts may pull in documents that reference third parties. The DSAR response cannot expose any of that.
The EDPB Guidelines 01/2022 are explicit: the right of access for one data subject cannot become a privacy breach for another. You must redact third-party personal data from the response unless you have a basis to disclose it (for example, the third party is named in a contract the data subject is a party to and the disclosure is genuinely necessary for the data subject to understand the processing).
I am not convinced most DSAR runbooks handle this cleanly. The 2024 CEF report did not name third-party redaction as a standalone challenge, which suggests most teams are never explicitly trained on it, but the Guidelines position is clear and enforcement can follow. A DSAR that leaks one customer's data to another is two GDPR violations, not one.
The operational fix is to run every DSAR export through a redaction pass before sending. For structured data this is mechanical. For prompt history and support tickets it is harder because the third-party names are embedded in free text. A lightweight NER pipeline or a targeted LLM call ("redact names and email addresses that are not 'Customer Name'") handles most cases, and a human review pass catches the rest.
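A sketch of the mechanical part of that pass, assuming you maintain a list of known third-party names per export: the regex catches email addresses, the name list catches what you already know about, and the human review pass still catches the rest.

```python
import re

# Deliberately loose email pattern; over-redaction is the safe failure mode.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def redact_third_parties(text: str, requester_email: str,
                         third_party_names: list[str]) -> str:
    """Mask every email address except the requester's, plus any known
    third-party names. Not a substitute for NER or human review."""
    def mask(m: re.Match) -> str:
        return m.group(0) if m.group(0).lower() == requester_email.lower() \
            else "[REDACTED]"
    text = EMAIL.sub(mask, text)
    for name in third_party_names:
        text = re.sub(re.escape(name), "[REDACTED]", text, flags=re.IGNORECASE)
    return text
```

Run this before the LLM or NER pass, not instead of it: the regex is the cheap deterministic layer, and the model call handles the names nobody thought to list.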
Article 12(3) gives you one month from receipt of the request to respond. The period runs from the day the request lands, not from when your team gets around to logging it. An extension of up to two further months is permitted under Article 12(3) where requests are complex or numerous, but only if you tell the requester within the original month that you are extending, and why. You cannot extend silently.
Phased responses are allowed and often sensible. If the provider DSAR turnaround at OpenAI is three weeks and you can respond on everything else in one week, send the first response with what you have and name the date for the second response. A partial response within the deadline beats a complete response after it, as long as the requester knows what is still outstanding and when it will arrive.
What is not allowed is silence past day 30 with no extension notice. If you realise on day 25 that the response will not be ready, send the extension notice that same day. The notice itself is a short email that says you are invoking the complex-request extension under Article 12(3), what the new deadline is (at most two further months beyond the original one), and why. The extension is available where the request is genuinely complex. You just have to invoke it before the first month runs out.
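The calendar-month arithmetic is worth encoding once, because "one month" clamps at month end: a request received on 31 January is due by the end of February, not 3 March. A stdlib sketch of that arithmetic:

```python
import calendar
from datetime import date

def add_months(d: date, months: int) -> date:
    """Same calendar date n months on, clamped to month end
    (31 January + 1 month -> 28/29 February)."""
    y, m = divmod(d.month - 1 + months, 12)
    year, month = d.year + y, m + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))

def dsar_deadlines(received: date) -> dict:
    return {
        "respond_by": add_months(received, 1),
        # Article 12(3): extendable by up to two further months,
        # with notice sent inside the first month.
        "extended_respond_by": add_months(received, 3),
    }
```

Log both dates the day the request lands, and treat the notice-by date as the same as the respond-by date: the extension notice has to go out inside the original month.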
The response goes to a person, not a database. Use plain language. Group the data by surface. Explain what each surface is.
```text
Subject: Your data access request

Hi [name],

Here is the personal data we hold on you, how we process it, and the
other information GDPR Article 15 asks us to provide.

## Personal data we hold

### Account
[Profile fields, dates.]

### Support and product interactions
[Per-conversation export or attached file. Third-party names redacted.]

### AI assistant logs
[Times you interacted with the assistant, the inputs you provided,
the responses generated. Attached as a file. Third-party names redacted.]

### Documents in our search index that mention you
[List of source documents with dates. Originals attached where the
document is about you; where a document mentions you alongside other
people, we have extracted the passages that concern you and redacted
the rest.]

### Information held by our AI service providers
- [Provider name]: data held, retention, link to provider's own DSAR
  confirmation

## How we process your data
- Purposes: [list]
- Categories of personal data: [list]
- Recipients (Article 15(1)(c)): our AI sub-processors are
  - [Provider name, country]
  - [Provider name, country]
- Retention: [period or criteria]
- Source of the data where not collected from you: [if applicable]

## Automated decision-making (Article 15(1)(h))
[Either "We do not make automated decisions about you under
Article 22." OR a meaningful explanation per the C-203/22 standard:
inputs the model used, the model's role in the decision, the main
factors that drove the specific output, and the consequences for you.]

## Your other rights
You have the right to rectification, erasure, restriction of
processing, and objection. You can lodge a complaint with your
national data protection authority.

Best,
[name]
[contact]
```
If a surface needs more time (provider DSAR turnaround, redaction review), send a phased response and name the follow-up date. If the request is complex enough to need the full Article 12(3) extension, send the extension notice before day 30.
Verification overreach is itself a GDPR violation. Asking for a passport scan when a logged-in account confirmation would do is what Guidelines 01/2022 calls excessive, and the EDPB's position is that the controller should ask for the minimum needed to verify identity. Public-sector defaults are different from private-sector ones; for private-sector SaaS, the default should be in-product confirmation through an authenticated session, not document upload. Teams that ask for a passport on every DSAR are creating complaint risk, not avoiding it.
Open the erasure runbook you wrote after the right-to-erasure article. Add a DSAR runbook beside it. The surface inventory is shared. For every surface, write the retrieval method, the format, and the intelligibility check.
For the explanation section, write the boilerplate sentences that describe the model's role in each decision the system can make. These are reusable across requests. You only need to draft them once.
For the Article 15(1)(c) sub-processor list, maintain it as a single file you can paste into every response. Update it when the list changes. The sub-processor cascade article has a worked example of what the list should actually contain.
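Keeping that list as structured data and rendering it into the response block means the DSAR email and your records cannot drift apart. A sketch, with illustrative providers (your actual list is whatever your stack actually uses):

```python
# Sub-processor list kept as data, rendered into every Article 15(1)(c)
# response. The entries below are illustrative, not a recommendation.
SUBPROCESSORS = [
    {"name": "OpenAI", "country": "United States", "role": "LLM inference"},
    {"name": "Pinecone", "country": "United States", "role": "vector storage"},
]

def recipients_block(subprocessors: list[dict]) -> str:
    """Render the list into the recipients section of the response."""
    lines = ["Recipients (Article 15(1)(c)): our AI sub-processors are"]
    lines += [f"- {s['name']}, {s['country']} ({s['role']})"
              for s in subprocessors]
    return "\n".join(lines)
```

When a sub-processor is added or dropped, the one edit to this file updates every future response, which is the property a paste-from-memory list does not have.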
For the redaction pass, write the script or the prompt. The first time a DSAR includes a prompt history with third-party names is the wrong time to figure out how to redact at scale.
Article 15 is easier to underestimate than Article 17 because the default engineer framing is "it's just an export." The copy is the easy half. The explanation under Article 15(1)(h) is the hard half, and after C-203/22 the bar is a concise intelligible description of what the model saw, what it produced, and what role it played in the decision. The Article 15(1)(c) sub-processor list is the single most frequently skipped obligation and the one that makes DSAR complaints easy to substantiate. The one-month clock does not pause while you query your vector store. Build the runbook beside your erasure runbook, test it on a dry run, and the first real request is a procedure rather than a firefight.