GDPR Article 17 applied to AI stacks after the EDPB's February 2026 CEF report. Three deletability tiers, what unlearning cannot do yet, and a response template.
A customer emails you. "Please delete all my data." Your support team forwards it to engineering. Your engineer pulls up the record and pauses. The name and email appear in support tickets, in the vector store built for the support copilot, in model request logs at three different providers, in a fine-tune your team trained on a year of "anonymised" transcripts that turned out not to be so anonymised. Some of those are easy to delete. Some are impossible. Some your team has never opened.
Article 17 of the GDPR does not have an "AI is hard" exemption. This article walks the AI stack by the three deletability tiers the EDPB's 2025 Coordinated Enforcement Framework report implicitly tracks: what you can delete, what your provider controls, and what no one can provably delete in 2026. Each tier has a different operational answer. Each has a different way of going wrong.
The EDPB adopted the 2025 Coordinated Enforcement Framework report on the right to erasure on 18 February 2026. Thirty-two European DPAs participated: nine opened or continued formal investigations, and 23 ran fact-finding exercises. 764 controllers, from SMEs to multinationals, responded to questionnaires. The Board flagged Article 17 for coordinated enforcement because it is one of the most frequently exercised GDPR rights and the one DPAs receive the most complaints about.
The headline finding most coverage picked up is "17 DPAs raised concerns about controllers lacking documented procedures." That is the number the legal blogs quoted. The more consequential finding sits underneath it: the report names two persistent technical weaknesses that explain why the procedures are missing.
First, the absence of systematic internal data classification. Controllers do not know where personal data lives in their systems. Without that map, the deletion path cannot be written down in advance and every erasure request becomes a bespoke investigation.
Second, the lack of automated deletion labels in IT systems. Even where controllers know the data is there, the systems have no tag that tells a deletion pipeline "this row maps to this data subject, delete it." The labels have to be built before the deletion can be automated.
Both weaknesses are acute in AI stacks because the stack multiplies the surfaces. One feature can write personal data to the database, the application logs, the prompt history, the vector store, the retrieval sources, the provider's request logs, the fine-tune dataset, and the fine-tune artefact. If the classification is missing at any of those layers, the erasure pipeline has a hole.
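The classification the report asks for can start as something very small: a map from each surface to its tier and its deletion path. A minimal sketch, with illustrative surface names rather than a prescribed schema:

```python
# Minimal data-classification map: every surface that can hold personal
# data, keyed to its deletability tier and a deletion path. Surface names
# and paths here are illustrative, not a standard.
SURFACES = {
    "app_db":            {"tier": 1, "path": "DELETE rows by user_id"},
    "app_logs":          {"tier": 1, "path": "log pipeline purge by user_id"},
    "prompt_history":    {"tier": 1, "path": "session store delete"},
    "vector_store":      {"tier": 1, "path": "delete by id via lineage table"},
    "cached_responses":  {"tier": 1, "path": "cache purge by key"},
    "provider_logs":     {"tier": 2, "path": "provider DSAR / deletion request"},
    "provider_finetune": {"tier": 2, "path": "delete dataset object and job"},
    "base_weights":      {"tier": 3, "path": "out of scope; document reasoning"},
    "finetuned_weights": {"tier": 3, "path": "reasonable-steps file; retrain path"},
}

def erasure_plan(surfaces: dict) -> dict:
    """Group surfaces by tier so a per-request runbook can be generated."""
    plan = {1: [], 2: [], 3: []}
    for name, meta in surfaces.items():
        plan[meta["tier"]].append(name)
    return plan
```

Even this dictionary is enough to stop an erasure request from becoming a bespoke investigation: the runbook iterates the map instead of someone's memory.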
The CEF 2025 report also flagged the use of ineffective anonymisation as a substitute for deletion, the absence of clearly defined retention periods, and technical limitations preventing erasure in backup systems. Reed Smith's [key takeaways piece](https://www.reedsmith.com/our-insights/blogs/viewpoints/102mm9l/edpb-report-on-the-right-to-erasure-key-takeaways-from-the-2025-coordinated-enfo/) is the most usable legal summary of the findings. The DPC and the Maltese IDPC both welcomed the report on 20 February 2026, which is unusually fast for a coordinated response and a signal that national enforcement follow-through is likely.
I think the load-bearing finding is not the 17 DPAs but the systematic-classification gap. Without classification, nothing else is possible. Build the classification first, and the deletion path follows. Skip it, and every erasure request is a multi-day firefight that ends with a partial response and a complaint.
The first tier is the data on infrastructure you control. For each Tier 1 surface, the deletion path exists. It may take work to set up, but nothing in the technology is blocking you.
Application logs. Your own server logs of API calls, user actions, error traces. Query by user ID, delete the rows, write the deletion to an audit log. This is the easy case. If your logging stack does not support selective deletion, that is the gap to close, not a regulatory barrier.
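The log-deletion step is mechanical once the rows carry a user ID. A sketch using SQLite as a stand-in for your logging store; the table names are assumptions, and the audit insert is the part teams forget:

```python
import sqlite3
from datetime import datetime, timezone

def erase_user_logs(conn: sqlite3.Connection, user_id: str) -> int:
    """Delete a user's application-log rows and record the act in an
    append-only audit table. Table and column names are illustrative."""
    cur = conn.execute("DELETE FROM app_logs WHERE user_id = ?", (user_id,))
    deleted = cur.rowcount
    conn.execute(
        "INSERT INTO erasure_audit (user_id, surface, rows_deleted, at) "
        "VALUES (?, 'app_logs', ?, ?)",
        (user_id, deleted, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    return deleted
```

The audit row is what you show a DPA later; the deletion alone proves nothing once the data is gone.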
Prompt and conversation history. Stored chat sessions, copilot context, agent memory. The same approach as application logs. The gotcha is agent memory: some agent frameworks persist context across sessions in ways that are invisible to the product team. Audit the framework's storage layer.
Vector embeddings and retrieval indices. Pinecone, pgvector, Qdrant, Weaviate. The embeddings themselves are personal data if the source text is (we wrote the longer argument here), so they need to be deleted when the data subject is. The mechanical step is a vector delete by ID. The prerequisite step is a document-to-vector lineage table that maps each vector back to the source record it came from. Without that table, you cannot find the vectors to delete.
Build the document-to-vector lineage table before your first ingestion, not after your first erasure request. The lineage table costs one migration to add. Retrofitting it after the fact means reprocessing the entire vector store and reconciling against a deletion list that may already be incomplete. The CEF report's "automated deletion labels" weakness is exactly this: the label has to exist at ingestion for the automated deletion to work at erasure.
Retrieval source documents. The support tickets, contracts, knowledge-base articles that feed your RAG pipeline. These follow your standard data deletion path. The reminder is that deleting the source does not delete the vectors derived from it; the lineage table handles the cascade.
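The lineage table and the cascade it enables can be sketched in a few lines. SQLite stands in for your relational store, and `vector_store_delete` is a placeholder for your store's delete-by-id call (a Pinecone, Qdrant, or pgvector client method in practice):

```python
import sqlite3

# One row per vector: which document it came from, and which data
# subject that document belongs to. This is the "automated deletion
# label" the CEF report describes, written at ingestion time.
SCHEMA = """
CREATE TABLE doc_vector_lineage (
    vector_id   TEXT PRIMARY KEY,
    document_id TEXT NOT NULL,
    subject_id  TEXT NOT NULL
);
"""

def vectors_for_subject(conn: sqlite3.Connection, subject_id: str) -> list:
    rows = conn.execute(
        "SELECT vector_id FROM doc_vector_lineage WHERE subject_id = ?",
        (subject_id,),
    )
    return [r[0] for r in rows]

def cascade_erase(conn, subject_id, vector_store_delete) -> int:
    """Delete the subject's vectors from the store, then the lineage
    rows themselves. `vector_store_delete` is a stand-in for your
    vector store's delete-by-id API."""
    ids = vectors_for_subject(conn, subject_id)
    if ids:
        vector_store_delete(ids)
    conn.execute("DELETE FROM doc_vector_lineage WHERE subject_id = ?", (subject_id,))
    conn.commit()
    return len(ids)
```

The design choice that matters is the `subject_id` column: lineage keyed only to documents still leaves you joining through every source system at erasure time.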
Cached responses. Response caches for cost or latency reasons. Purge by cache key. If the cache key is tied to user identity or session, this is trivial. If the cache is keyed on prompt content, the keys themselves may include personal data and need to be recomputed or rotated.
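One way to avoid the content-keyed-cache trap is to namespace keys by user at write time, so a purge is a prefix scan rather than a rehash of every prompt. A sketch, assuming a simple key-value cache:

```python
import hashlib

def cache_key(user_id: str, prompt: str) -> str:
    """Namespace response-cache keys by user so an erasure request can
    purge every entry with a prefix match. The key format is illustrative."""
    digest = hashlib.sha256(prompt.encode()).hexdigest()
    return f"resp:{user_id}:{digest}"

def purge_user(cache: dict, user_id: str) -> int:
    """Drop every cached response belonging to one user."""
    prefix = f"resp:{user_id}:"
    doomed = [k for k in cache if k.startswith(prefix)]
    for k in doomed:
        del cache[k]
    return len(doomed)
```

The trade-off is a lower hit rate, since identical prompts from different users no longer share an entry; for personal-data-bearing prompts that is usually the correct trade.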
All of Tier 1 is deletable per person on request. Write the path down in a runbook before the request arrives.
The second tier is the data your provider holds on their infrastructure as a result of your API calls. You do not control it directly; you request deletion through the provider's process.
Provider request and response logs. OpenAI, Anthropic, Azure OpenAI, and Google Vertex all retain request and response logs for durations that depend on your tier, your abuse-monitoring configuration, and your contract. A common default is around 30 days for abuse monitoring, with shorter or zero retention available under some enterprise terms. OpenAI publishes a help article on personal data removal from ChatGPT and a separate DSAR process. Anthropic and Google have equivalent deletion workflows in their Trust Centers.
Fine-tune training data held by the provider. When you upload training data to a fine-tune job, the provider keeps a copy. Deleting your local copy does not delete the provider's copy. Submit a deletion request for the training dataset object as well as the fine-tune job itself.
The practical failure mode for Tier 2 is forgetting it exists. Teams remember the database and the vector store, run the deletion, and close the ticket. The provider logs and the fine-tune dataset sit quietly on someone else's infrastructure until an audit finds them. Every erasure runbook needs a named step that contacts each provider in your stack with the specific records or user identifiers to delete, and a place to record the provider's confirmation.
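That named step is easiest to keep honest as a structured record per provider, with a field for the confirmation that stays empty until it arrives. A minimal sketch; the field names are assumptions, not a standard:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProviderDeletionStep:
    """One runbook step per provider in the stack: what was submitted,
    when, and where the provider's confirmation lives once received."""
    provider: str                 # e.g. the model API vendor's name
    identifiers: list             # user identifiers submitted for deletion
    submitted_on: str             # ISO date the request went out
    expected_by: str              # provider's stated completion date
    confirmation_ref: Optional[str] = None  # ticket or email id, once confirmed

    @property
    def open(self) -> bool:
        """True until the provider's confirmation is recorded."""
        return self.confirmation_ref is None
```

An erasure ticket should not close while any step reports `open`; that single invariant is what keeps Tier 2 from being forgotten.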
The third tier is the one the article is really about. Base model weights, fine-tuned model weights, and anything the model has truly memorised during training.
Base model weights. The third-party model you call over an API. You do not control these, and the provider does not delete individual data contributions from them on request either. This is out of scope for your deletion path. Document the reasoning and move on.
Fine-tuned model weights. If you or your provider trained a custom model on data that included the requester, you have a harder problem. Deleting the training dataset does not undo the weight updates that the training caused. The only provably clean path is to retrain the model without the requester's data. That is expensive and slow.
Machine unlearning is the research direction trying to solve this without a full retrain. ICLR 2025 produced credible methods: SimNPO (Simplicity Negative Preference Optimization), RMU (Representation Misdirection Unlearning), and LoRA-based unlearning variants. These methods strike a better balance between removing the target knowledge and preserving model utility than earlier gradient-ascent approaches. They are genuine research progress.
They are not yet a compliance tool. A Carnegie Mellon analysis from April 2025 concluded that current unlearning benchmarks are weak measures of progress: they test whether the model can no longer recite the forgotten content, not whether the model no longer "knows" it in a way a regulator would accept. Subsequent work by Tuan-Anh Bui and others has shown that many "unlearned" models can be re-elicited through paraphrase attacks or relearning with small amounts of data. A more recent Systematisation of Knowledge paper on machine unlearning for LLMs from mid-2025 catalogues the same gap.
I am not convinced any current unlearning method will be accepted by a regulator as equivalent to deletion before 2027. The CMU benchmark paper and the SoK paper are the honest reads, and nothing in EDPB, CNIL, Garante, or ICO output since then has blessed an unlearning method as satisfying Article 17. RMU and SimNPO are where to watch if you are tracking the research. They are not where to point a regulator.
Anything the model has memorised. Large language models sometimes memorise training data verbatim. If your fine-tune or your provider's training pipeline memorised the requester, the model can re-emit their information on a prompt that triggers the memorised span. This is the failure mode the Garante cited in the OpenAI case: the model trained on personal data without a proper legal basis, and the output could surface that data. No amount of deletion at the API layer fixes a memorised span in the weights.
For Tier 3, the technical reality that you cannot delete is acceptable to regulators only if you took reasonable steps before the request and you wrote them down. "Reasonable steps" in 2026 looks like this: exclude identifying personal data from training sets where you can, and document what you excluded and how; write down why you are not relying on unlearning; keep the retraining path specified and costed so it is a real option rather than a hypothetical one; and run an evaluation, with the date and results recorded, that shows the model does not output identifying information about the data subject.
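The evaluation piece can start as a simple elicitation probe: prompt the fine-tuned model and scan outputs for the data subject's identifiers. A sketch, where `generate` is a placeholder for your model-call function; a real evaluation would also cover paraphrases and partial matches:

```python
def memorisation_probe(generate, subject_identifiers, prompts):
    """Run elicitation prompts against a fine-tuned model and flag any
    output containing the data subject's identifiers. `generate` is a
    placeholder for the actual model call; exact-substring matching is
    a floor, not a ceiling, for this check."""
    hits = []
    for prompt in prompts:
        output = generate(prompt)
        for ident in subject_identifiers:
            if ident.lower() in output.lower():
                hits.append((prompt, ident))
    return hits
```

An empty result does not prove the model holds nothing; it is the recorded, dated artefact that shows you looked, which is what the reasonable-steps file needs.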
This documentation is the artefact the CEF 2025 report was calling for without using the AI-specific framing. If your classification layer knows what surfaces contain personal data, and your automated deletion labels cover Tier 1, and your "reasonable steps" file covers Tier 3, the Tier 2 path to the providers is the last piece and it is the easiest.
GDPR Article 19 obliges you to communicate any erasure to each recipient you disclosed the personal data to, unless that proves impossible or involves disproportionate effort. For AI stacks the recipients are every sub-processor and every downstream system you piped the data into.
This is the step teams skip. The erasure runbook deletes the primary records, handles Tier 2 provider requests, and the team moves on. The CRM that received an enriched export of the support conversation still holds the data. The analytics pipeline that ingested the support transcripts still holds them. The Slack channel where the conversation was shared for on-call debugging still holds them. Article 19 says you notify all of those recipients.
Build the list of downstream recipients at the same time you build the data classification. If you do not know where the data went, you cannot satisfy Article 19, and the CEF report's "systematic classification" finding lands on you twice.
The customer-facing response matters as much as the deletion. Plain language, specific surfaces, no vendor jargon, a named follow-up date for anything you cannot close on day one.
Subject: Your data deletion request
Hi [name],
We received your request to delete your personal data under GDPR
Article 17. This is what we did, what we could not do, and what is
in progress.
Deleted:
- Account record and all associated profile data
- Support tickets and conversation history
- Application logs containing your user ID
- Vector store entries derived from your support tickets
- Cached responses keyed to your account
- Analytics records tied to your account
Deletion in progress at our AI providers:
- [Provider name]: deletion request submitted on [date], expected
completion [date]
- [Provider name]: deletion request submitted on [date], expected
completion [date]
Not deleted, with reasons:
- A fine-tuned model used by our support assistant. We did not
include identifying personal data when we trained this model, so
the model does not hold information about you specifically. We
verified this on [date] by running sample queries against the
model and reviewing the outputs. We are happy to share the
evaluation on request.
- Anonymised analytics aggregated across all users, which do not
identify you.
We also notified the following recipients of your erasure request,
as GDPR Article 19 requires:
- [Recipient 1]
- [Recipient 2]
You have the right to lodge a complaint with your national data
protection authority if you believe our response is incomplete.
Best,
[name]
[contact]
Phased responses are acceptable as long as you name the phases and the dates. Silence is not. If you used phased deletion, the one-month Article 12(3) deadline still applies to the initial response, not the completion.
The next request will arrive. It may come from a customer today, or from a DPA tomorrow, or from an employee who just left the company. The setup work is the same in all three cases.
Write the nine-surface inventory for your actual stack. Map each surface to Tier 1, Tier 2, or Tier 3. For Tier 1, write the deletion path and test it. For Tier 2, write the provider contact and the identifier format each provider accepts. For Tier 3, write the reasonable-steps file: what personal data you excluded from training, why you are not relying on unlearning, what the retraining path looks like, and the evaluation that shows the model does not output identifying information about any data subject.
Add the document-to-vector lineage table if you do not have one. Add the downstream recipients list for Article 19. Add the response template. Walk a junior engineer through it in under thirty minutes. If they cannot execute it without asking for help, the runbook is not ready.
Pseudonymisation is not anonymisation. The CEF 2025 report explicitly flagged ineffective anonymisation as one of the recurring failure modes, and it has been [settled law since Recital 26](https://gdpr-info.eu/recitals/no-26/) that pseudonymised data remains personal data under GDPR. Replacing names with tokens does not exempt the records from the erasure obligation. If your team told the customer "we anonymised your data, so Article 17 does not apply," you are in the failure mode the report named. The fix is to stop calling it anonymisation unless it meets the [EDPB Opinion 28/2024 negligible-likelihood test](/articles/are-vector-embeddings-personal-data) and to treat pseudonymised data as personal data for Article 17 purposes.
Article 17 does not have an AI exemption, and the EDPB's 2025 coordinated enforcement is the proof that the authorities are looking at the gap between what AI teams can actually delete and what they claim to delete. The defensible 2026 position is a three-tier map: delete Tier 1 on a defined path, request deletion at the providers for Tier 2, and hold the reasonable-steps file for Tier 3. Machine unlearning is watching, not compliance. Build the classification and the lineage table before the next request, not during it.