Third-party AI integration is not one shape. It is four. The four patterns, the data flows they create, and the check that fits each.
There is a habit, in privacy writing, of treating "third-party AI integration" as if it were a single thing. The phrase actually covers at least four distinct technical shapes, each with its own data flow, its own sub-processor question, and its own failure mode. Treating them as the same is the reason most teams miss the one that bites them.
This article walks the four patterns. Each comes with its own check.
The first pattern is the default flip inside an existing tool, and it is the one most teams underestimate. You did not enable an integration. You did not click anything. The vendor of a tool you already pay for added a model to its existing service, and the default for your tenant was "on".
The cleanest 2026 example is Microsoft 365 Copilot. On 8 December 2025, a new admin toggle for "AI providers operating as Microsoft subprocessors" appeared in the Microsoft 365 admin centre with Anthropic listed underneath. On 7 January 2026, that toggle activated. The legacy mechanism, under which tenant admins had to opt in to Anthropic under a separate Anthropic commercial agreement and DPA, was deprecated. From that date, Claude operates under the Microsoft Product Terms and the Microsoft Data Protection Addendum, not Anthropic's own.
The geographic logic of the rollout is the part most write-ups miss. For most commercial cloud tenants, Anthropic was on by default. For tenants inside the EU Data Boundary, the European Free Trade Association, and the United Kingdom, the toggle exists but defaults to off, because Anthropic models are currently excluded from the EU Data Boundary and applicable in-country processing commitments. Government clouds (GCC, GCC High, DoD) and other sovereign clouds get no toggle at all, because Anthropic has no FedRAMP certification yet.
The scope of the activation is broad: Microsoft 365 Copilot (web, desktop, mobile), Researcher, Copilot Studio, Power Platform, Agent Mode in Excel, and the Word, Excel, and PowerPoint agents. Microsoft Learn confirms full availability is expected by the end of March 2026.
For a US-headquartered company with EU subsidiaries, the implication is awkward. The parent tenant has Claude on by default. The subsidiary tenant has it off by default. If the two share documents through a single Copilot session, the question of which model processes the request is determined by the tenant routing the session, which is rarely the question the legal team asked when they reviewed M365 Copilot last summer.
The check that fits this pattern: subscribe to your largest SaaS vendors' admin-centre changelogs and trust portals, not the marketing blog. Microsoft's, Google Workspace's, Salesforce's, Atlassian's, Slack's. When a new sub-processor row appears, treat the appearance itself as the trigger for a DPA review, regardless of whether the toggle is on or off in your region today. Regions move.
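The "appearance is the trigger" rule is easy to automate. A minimal sketch, assuming you store a copy of each vendor's published sub-processor list at every review (the vendor names here are illustrative): diff today's list against the stored copy, and treat any new row as the DPA-review trigger.

```python
def diff_subprocessor_list(previous: list[str], current: list[str]) -> list[str]:
    """Rows on the vendor's current list that were absent at the last review."""
    return sorted(set(current) - set(previous))

# Illustrative snapshot from a previous review vs. today's published list.
last_review = ["Amazon Web Services, Inc.", "OpenAI, LLC"]
today_list = ["Amazon Web Services, Inc.", "OpenAI, LLC", "Anthropic, PBC"]

new_rows = diff_subprocessor_list(last_review, today_list)
if new_rows:
    print(f"New sub-processor rows, open a DPA review: {new_rows}")
```

The point of the sketch is the trigger condition, not the plumbing: the alert fires on the row's appearance, before anyone asks whether the toggle is on in your region.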
The second pattern is the OAuth-connected AI service, and it is the one that breaches. The integration is not a feature inside an existing tool; it is a separate AI product that you authorise to reach into a system of record on your behalf via an OAuth grant.
Between 8 and 18 August 2025, the UNC6395 threat cluster (also tracked as GRUB1) systematically queried more than 700 Salesforce environments by impersonating Salesloft's Drift chatbot integration. The attackers did not exploit a Salesforce vulnerability. They used valid OAuth and refresh tokens, credentials issued through a normal authorisation flow, to bypass MFA, move laterally through customer Salesforce instances, and export business records over a ten-day window. Cloudflare, Google, PagerDuty, Palo Alto Networks, Proofpoint, SpyCloud, Tanium, and Zscaler all confirmed exposure. On 20 August 2025, Salesloft and Salesforce revoked all Drift OAuth tokens; Salesforce removed the Drift application from the AppExchange the same week.
The attackers were not after the support tickets themselves. They were combing case text and contact records for embedded secrets: AWS access keys, Snowflake tokens, VPN credentials, plain-text passwords pasted into "please help me debug this" messages. The AI integration was not the prize. It was the lateral movement vector that gave them read access to a corpus full of secrets nobody had ever audited.
Two facts make the OAuth pattern uniquely fragile. First, the OAuth scopes a typical chatbot or summariser asks for are wide (read across all objects, sometimes write) because the vendor wants the integration to be useful out of the box. Second, the refresh tokens often outlast the user who authorised them. When the employee leaves, the human SSO session ends; the OAuth grant on a sub-processor's side keeps working until somebody manually revokes it.
The Verizon 2025 Data Breach Investigations Report found that third-party involvement in breaches doubled to 30 percent year over year, across more than 12,000 confirmed breaches in the dataset. The Salesloft incident is the cleanest single case in that statistic.
The check that fits this pattern: when an AI service requests OAuth scopes against a system of record (Salesforce, Microsoft 365, Google Workspace, GitHub, Jira, Slack), audit the granted scopes against what the integration actually needs, set the shortest token lifetime the vendor supports, schedule a quarterly OAuth-grant review against the SSO directory, and revoke any grant whose human authoriser has left. The AI integration register is the deliverable: every OAuth grant, the scopes it holds, the human who authorised it, the SSO status of that human, and the last time it was reviewed.
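The register's shape is worth pinning down. A minimal sketch in Python (field names and the 90-day threshold are assumptions, not any vendor's API): each grant carries its scopes, its human authoriser, and its last review date, and the audit flags grants whose authoriser has left SSO or whose quarterly review is overdue.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class OAuthGrant:
    app: str               # the AI service holding the grant
    scopes: list[str]      # scopes actually granted at install time
    authorised_by: str     # the human who clicked "allow"
    last_reviewed: date

def flag_grants(grants: list[OAuthGrant], active_users: set[str],
                today: date, review_days: int = 90) -> list[tuple[str, str]]:
    """Flag grants whose authoriser left SSO, or whose quarterly review is overdue."""
    findings = []
    for g in grants:
        if g.authorised_by not in active_users:
            findings.append((g.app, "revoke: authoriser no longer in the SSO directory"))
        elif (today - g.last_reviewed).days > review_days:
            findings.append((g.app, "review: past the quarterly cycle"))
    return findings
```

A grant authorised by a departed employee comes back flagged "revoke" even though its refresh token still works, which is exactly the gap the Drift incident exploited.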
The third pattern is the multi-provider cascade: the vendor's own tool sends your data to two, three, or four different external model providers, and the providers themselves run on hyperscaler infrastructure that adds another layer underneath.
Notion AI is a clean example. The Notion AI security and privacy page lists OpenAI, Anthropic, and Google (added in January 2026 with Notion 3.2) as the three model providers in rotation. Beneath them sits Turbopuffer for vector storage, plus any external tool the user has connected via MCP. The available models in early 2026 are Claude Opus 4.5, GPT-5.2, and Gemini 3, depending on which task the user invokes. That is at least four external parties processing your workspace data before any MCP connection extends the chain further.
The retention story varies by plan and is the part most teams skip when they enable Notion AI. On Notion's Enterprise plan, the LLM providers operate with zero data retention: nothing is stored on the provider side. On every non-Enterprise plan (Free, Plus, Business), the providers may retain customer data for up to 30 days before deletion. Embeddings via OpenAI carry no provider-side retention either. Notion contractually requires every sub-processor to agree not to train on customer data, which is stronger than the default for many SaaS-AI integrations.
The cascade gets more interesting when you trace each of the three model providers down a layer. OpenAI calls run on Microsoft Azure (and increasingly OpenAI's own GPU footprint). Anthropic calls run on Amazon Bedrock and, since the 23 October 2025 expansion, Google Cloud TPUs across three chip platforms. Google's Gemini calls run on Google Cloud's own TPU and GPU fleet. A single Notion AI prompt about a customer record can, depending on which model the system selects, traverse a different US-based hyperscaler each time it is run. The CLOUD Act exposure is determined not by where Notion stores the workspace but by which model the routing layer picked for that specific call.
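The routing-dependent exposure can be made concrete with a small lookup. This is an illustrative table, not Notion's actual routing logic; the provider-to-infrastructure pairs follow the description above, and the model identifiers are assumed strings.

```python
# Illustrative routing table; provider-to-infrastructure pairs follow the text,
# model identifiers are assumed strings, not Notion's real routing keys.
CASCADE = {
    "claude-opus-4.5": ("Anthropic", ["Amazon Bedrock", "Google Cloud TPUs"]),
    "gpt-5.2":         ("OpenAI",    ["Microsoft Azure"]),
    "gemini-3":        ("Google",    ["Google Cloud TPU/GPU fleet"]),
}

def parties_for_call(model: str, vendor: str = "Notion") -> list[str]:
    """Every external party a single prompt touches, given which model the router picked."""
    provider, infrastructure = CASCADE[model]
    return [vendor, provider, *infrastructure]
```

Run it for each model in the rotation and the point falls out: the same prompt yields a different party list, and therefore a different CLOUD Act surface, on each call.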
The legal anchor is GDPR Article 28(2). A processor cannot engage another processor for processing on the controller's behalf without prior authorisation. Where the controller has granted "general written authorisation", the processor must inform the controller of any intended changes, and the controller must have the opportunity to object. In the Notion AI case, the addition of Google as a third model provider in January 2026 was a sub-processor change under Article 28(2). The notification is the trigger. The 30-day default objection window is the clock.
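The objection clock is simple enough to encode. Note that the 30-day window is a contractual default taken from the DPA, not a number in Article 28 itself; a sketch:

```python
from datetime import date, timedelta

def objection_deadline(notified_on: date, window_days: int = 30) -> date:
    """Last day to object to a sub-processor change.

    The window length comes from your DPA, not from Article 28 itself;
    30 days is a common contractual default.
    """
    return notified_on + timedelta(days=window_days)

def can_still_object(notified_on: date, today: date, window_days: int = 30) -> bool:
    """True while the controller's objection right is still live."""
    return today <= objection_deadline(notified_on, window_days)
```

The useful output is not the date arithmetic but the discipline: the deadline is computed from the notification date, so a notification that lands in an unread distribution list silently burns the window.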
The check that fits this pattern: read the vendor's AI-specific sub-processor list (not the general one), confirm your DPA covers each model provider by name or by general authorisation language that flows down to AI sub-processors, verify what plan tier you are on (because retention defaults differ by tier), and configure your sub-processor change notification to land in a monitored inbox, not a distribution list nobody reads. For Notion specifically, the Enterprise plan is the one where the zero-retention promise actually holds.
The fourth pattern is AI inside the vendor's own trust boundary, and it gets the least attention because it is the safest. The vendor of a tool you already use builds an AI feature, runs it on the vendor's existing infrastructure, and never sends your data to an external model provider. The AI vendor surface is, structurally, just an extension of the vendor surface you already trusted.
Slack AI is the cleanest example in 2026. Slack runs closed-source large language models inside an "escrow VPC" on the Slack-controlled portion of AWS. The model providers themselves have no inbound access to that VPC and no path to inspect or retain prompts. Slack uses retrieval-augmented generation rather than fine-tuning: the LLM is supplied with only the content needed for a single task, and no Slack customer data is used to train any underlying model unless the customer affirmatively opts in. The architectural commitment is that data does not leave Slack's existing trust boundary, period.
This pattern is structurally important for two reasons. First, it dramatically reduces the sub-processor surface: your data is not flowing to OpenAI or Anthropic directly, only to the vendor whose DPA you already signed. Second, it means that when the vendor's underlying AWS region is the EU region, EU residency is preserved through the AI feature without any new transfer mechanism work.
But it does not eliminate the check. The pattern still creates new processing inside the vendor's infrastructure, and that processing has its own retention windows, its own logs, its own prompt/response storage, and its own potential to surface content the user technically can access but never would have found manually. The over-sharing problem is real even when the trust boundary is preserved: a Slack AI summary that surfaces every channel a user has read access to can, in a permissively configured workspace, expose documents the user never would have opened by hand. Microsoft itself has documented this dynamic for M365 Copilot, where the tooling surfaces files the user has access to under inherited permissions even when nobody intended that access to be discoverable.
The check that fits this pattern: confirm in the vendor's engineering or trust documentation that the AI feature actually stays inside the vendor's existing infrastructure (and is not silently routed to a third-party model provider on a subset of requests), tighten file and channel permissions before enabling the feature, and treat the AI feature's logging and prompt-storage retention as a new surface that needs its own retention setting in your policy. The trust boundary holds. The over-sharing problem does not.
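The pre-enablement over-sharing audit reduces to a set difference: which resources can each user read that they have never actually opened? A minimal sketch, assuming you can export per-user readable-resource and access-log sets from the workspace (the data structure here is an assumption):

```python
def over_shared(readable: set[str], opened: set[str]) -> set[str]:
    """Resources a user can read but has never opened: the pool an AI summary can surface."""
    return readable - opened

def rank_users(access: dict[str, tuple[set[str], set[str]]]) -> list[tuple[str, int]]:
    """Per-user never-opened-but-readable counts, worst first; tighten these before enabling AI."""
    counts = [(user, len(over_shared(r, o))) for user, (r, o) in access.items()]
    return sorted(counts, key=lambda item: -item[1])
```

The ranked list is the work queue: the users at the top are the ones whose inherited permissions the AI feature will turn into discoverable content on day one.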
Across all four patterns, the underlying legal anchor is the same. Your relationship with the SaaS vendor is a controller-processor relationship. Their relationship with the model provider is a processor-sub-processor relationship. Article 28(2) requires either specific or general authorisation for that downstream engagement, and gives you (the controller) the opportunity to object before a new sub-processor processes your data.
The four patterns differ in how visible the change is:
| Pattern | What triggers the data flow change | How visible is it |
|---|---|---|
| 1. Default flip inside an existing tool | Vendor activates a new sub-processor toggle by default | Email plus admin centre banner; easy to miss if nobody reads vendor changelogs |
| 2. OAuth-connected AI service | An employee installs the AI app and approves the OAuth scopes | Visible in SSO logs at install time, then invisible until something breaks |
| 3. Multi-provider cascade inside a tool | Vendor adds a new model provider to its provider rotation | Sub-processor list update; only visible if you subscribe to it |
| 4. AI inside the vendor's trust boundary | Vendor enables a new AI feature on existing infrastructure | Most visible: usually accompanied by a press release, no new sub-processor row |
Patterns 1 and 3 are the ones the EDPB Opinion 28/2024 framing on AI model anonymity matters for: when a vendor sends your workspace text or customer records to an external model, the resulting prompts and embeddings are rarely anonymous, and your controller obligations persist through the entire chain. Pattern 2 is the one that fails operationally before it fails legally; the breach happens before the audit catches the OAuth scope. Pattern 4 fails through over-sharing rather than through external transfer.
The DPA you signed with each vendor needs to do four things to cover all four patterns: identify the AI sub-processors by name or by category, specify the retention defaults by plan tier, commit to a meaningful (not 24-hour) objection window for sub-processor changes, and explicitly state that customer data is not used to train any model. Most pre-2024 SaaS DPAs do none of this. The vendor's updated DPA (almost every major vendor has published one by now) usually does, but the update may not flow back into your contract automatically. You may need to sign an amendment, and the practical answer is to bundle all four checks into the next contract renewal cycle rather than chase them individually.
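Those four requirements make a usable pass/fail check per vendor. A sketch (the field names and the 14-day "meaningful window" threshold are assumptions, not a legal standard):

```python
from dataclasses import dataclass

@dataclass
class DPACheck:
    names_ai_subprocessors: bool    # by name or by category
    states_retention_by_tier: bool  # retention defaults per plan tier
    objection_window_days: int      # 0 if the DPA offers no window at all
    no_training_commitment: bool    # customer data not used to train any model

def gaps(dpa: DPACheck, min_window_days: int = 14) -> list[str]:
    """Which of the four commitments this DPA is missing (14-day threshold is an assumption)."""
    missing = []
    if not dpa.names_ai_subprocessors:
        missing.append("AI sub-processors not identified")
    if not dpa.states_retention_by_tier:
        missing.append("retention defaults by tier not specified")
    if dpa.objection_window_days < min_window_days:
        missing.append("objection window too short to be meaningful")
    if not dpa.no_training_commitment:
        missing.append("no commitment against training on customer data")
    return missing
```

A pre-2024 DPA typically comes back with all four gaps; the renewal-cycle task is driving that list to empty, one amendment per vendor.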
The checklist is shorter than it looks, because three of the four checks are the same shape across all four patterns. The differences are in what evidence you collect.
Open the admin centre of your most-used SaaS tool right now. Scroll to the page where new AI features and sub-processors are listed. Note the date the most recent change appeared. If it is older than your last DPA review, you are caught up. If it is newer, the four checks above are the next conversation.
The pattern you find usually tells you the next step on its own. If the change is a default flip (Pattern 1), the work is the DPA amendment and the privacy notice update. If the change is a new OAuth grant (Pattern 2), the work is the scope audit and the token-rotation policy. If the change is a new model provider in the vendor's cascade (Pattern 3), the work is the sub-processor objection window. If the change is a new vendor-internal AI feature (Pattern 4), the work is the over-sharing audit and the retention setting.
The four-pattern question is the one that turns "third-party AI integration" from a vague risk into something you can actually check.