Third-party AI integration is not one shape. It is four. The four patterns, the data flows they create, and the check that fits each.
There is a habit, in privacy writing, of treating "third-party AI integration" as if it were a single thing. The phrase actually covers at least four distinct technical shapes, each with its own data flow, its own sub-processor question, and its own failure mode. Treating them as the same is the reason most teams miss the one that bites them.
This article walks the four patterns. Each comes with its own check.
The first pattern is the default flip inside an existing tool, and it is the one most teams underestimate. You did not enable an integration. You did not click anything. The vendor of a tool you already pay for added a model to its existing service, and the default for your tenant was "on".
The cleanest 2026 example is Microsoft 365 Copilot. On 8 December 2025, a new admin toggle for "AI providers operating as Microsoft subprocessors" appeared in the Microsoft 365 admin centre with Anthropic listed underneath. On 7 January 2026, that toggle activated. The legacy mechanism, under which tenant admins had to opt in to Anthropic under a separate Anthropic commercial agreement and DPA, was deprecated. From that date, Claude operates under the Microsoft Product Terms and the Microsoft Data Protection Addendum, not Anthropic's own.
The geographic logic of the rollout is the part most write-ups miss. For most commercial cloud tenants, Anthropic was on by default. For tenants inside the EU Data Boundary, the European Free Trade Association, and the United Kingdom, the toggle exists but defaults to off, because Anthropic models are currently excluded from the EU Data Boundary and applicable in-country processing commitments. Government clouds (GCC, GCC High, DoD) and other sovereign clouds get no toggle at all, because Anthropic has no FedRAMP certification yet.
The scope of the activation is broad: Microsoft 365 Copilot (web, desktop, mobile), Researcher, Copilot Studio, Power Platform, Agent Mode in Excel, and the Word, Excel, and PowerPoint agents. Microsoft Learn confirms full availability is expected by the end of March 2026.
For a US-headquartered company with EU subsidiaries, the implication is awkward. The parent tenant has Claude on by default. The subsidiary tenant has it off by default. If the two share documents through a single Copilot session, the question of which model processes the request is determined by the tenant routing the session, which is rarely the question the legal team asked when they reviewed M365 Copilot last summer.
The check that fits this pattern: subscribe to your largest SaaS vendors' admin-centre changelogs and trust portals, not the marketing blog. Microsoft's, Google Workspace's, Salesforce's, Atlassian's, Slack's. When a new sub-processor row appears, treat the appearance itself as the trigger for a DPA review, regardless of whether the toggle is on or off in your region today. Regions move.
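The "appearance is the trigger" rule is easy to automate. A minimal sketch, assuming you store a copy of each vendor's published sub-processor list at every review (the vendor names here are illustrative): diff today's list against the stored copy, and treat any new row as the DPA-review trigger.

```python
def diff_subprocessor_list(previous: list[str], current: list[str]) -> list[str]:
    """Rows on the vendor's current list that were absent at the last review."""
    return sorted(set(current) - set(previous))

# Illustrative snapshot from a previous review vs. today's published list.
last_review = ["Amazon Web Services, Inc.", "OpenAI, LLC"]
today_list = ["Amazon Web Services, Inc.", "OpenAI, LLC", "Anthropic, PBC"]

new_rows = diff_subprocessor_list(last_review, today_list)
if new_rows:
    print(f"New sub-processor rows, open a DPA review: {new_rows}")
```

The point of the sketch is the trigger condition, not the plumbing: the alert fires on the row's appearance, before anyone asks whether the toggle is on in your region.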
The second pattern is the OAuth-connected AI service, and it is the one that breaches. The integration is not a feature inside an existing tool; it is a separate AI product that you authorise to reach into a system of record on your behalf via an OAuth grant.
Between 8 and 18 August 2025, the UNC6395 threat cluster (also tracked as GRUB1) systematically queried more than 700 Salesforce environments by impersonating Salesloft's Drift chatbot integration. The attackers did not exploit a Salesforce vulnerability. They used valid OAuth and refresh tokens, credentials issued through a normal authorisation flow, to bypass MFA, move laterally through customer Salesforce instances, and export business records over a ten-day window. Cloudflare, Google, PagerDuty, Palo Alto Networks, Proofpoint, SpyCloud, Tanium, and Zscaler all confirmed exposure. On 20 August 2025, Salesloft and Salesforce revoked all Drift OAuth tokens; Salesforce removed the Drift application from the AppExchange the same week.
The attackers were not after the support tickets themselves. They were combing case text and contact records for embedded secrets: AWS access keys, Snowflake tokens, VPN credentials, plain-text passwords pasted into "please help me debug this" messages. The AI integration was not the prize. It was the lateral movement vector that gave them read access to a corpus full of secrets nobody had ever audited.
Two facts make the OAuth pattern uniquely fragile. First, the OAuth scopes a typical chatbot or summariser asks for are wide (read across all objects, sometimes write) because the vendor wants the integration to be useful out of the box. Second, the refresh tokens often outlast the user who authorised them. When the employee leaves, the human SSO session ends; the OAuth grant on a sub-processor's side keeps working until somebody manually revokes it.
The Verizon 2025 Data Breach Investigations Report found that third-party involvement in breaches doubled to 30 percent year over year, across more than 12,000 confirmed breaches in the dataset. The Salesloft incident is the cleanest single case in that statistic.
The check that fits this pattern: when an AI service requests OAuth scopes against a system of record (Salesforce, Microsoft 365, Google Workspace, GitHub, Jira, Slack), audit the granted scopes against what the integration actually needs, set the shortest token lifetime the vendor supports, schedule a quarterly OAuth-grant review against the SSO directory, and revoke any grant whose human authoriser has left. The AI integration register is the deliverable: every OAuth grant, the scopes it holds, the human who authorised it, the SSO status of that human, and the last time it was reviewed.
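The register's shape is worth pinning down. A minimal sketch in Python (field names and the 90-day threshold are assumptions, not any vendor's API): each grant carries its scopes, its human authoriser, and its last review date, and the audit flags grants whose authoriser has left SSO or whose quarterly review is overdue.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class OAuthGrant:
    app: str               # the AI service holding the grant
    scopes: list[str]      # scopes actually granted at install time
    authorised_by: str     # the human who clicked "allow"
    last_reviewed: date

def flag_grants(grants: list[OAuthGrant], active_users: set[str],
                today: date, review_days: int = 90) -> list[tuple[str, str]]:
    """Flag grants whose authoriser left SSO, or whose quarterly review is overdue."""
    findings = []
    for g in grants:
        if g.authorised_by not in active_users:
            findings.append((g.app, "revoke: authoriser no longer in the SSO directory"))
        elif (today - g.last_reviewed).days > review_days:
            findings.append((g.app, "review: past the quarterly cycle"))
    return findings
```

A grant authorised by a departed employee comes back flagged "revoke" even though its refresh token still works, which is exactly the gap the Drift incident exploited.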
The third pattern is the multi-provider cascade: the vendor's own tool sends your data to two, three, or four different external model providers, and the providers themselves run on hyperscaler infrastructure that adds another layer underneath.
Notion AI is a clean example. The Notion AI security and privacy page lists OpenAI, Anthropic, and Google (added in January 2026 with Notion 3.2) as the three model providers in rotation. Beneath them sits Turbopuffer for vector storage, plus any external tool the user has connected via MCP. The available models in early 2026 are Claude Opus 4.5, GPT-5.2, and Gemini 3, depending on which task the user invokes. That is at least four external parties processing your workspace data before any MCP connection extends the chain further.
The retention story varies by plan and is the part most teams skip when they enable Notion AI. On Notion's Enterprise plan, the LLM providers operate with zero data retention: nothing is stored on the provider side. On every non-Enterprise plan (Free, Plus, Business), the providers may retain customer data for up to 30 days before deletion. Embeddings via OpenAI carry no provider-side retention either. Notion contractually requires every sub-processor to agree not to train on customer data, which is stronger than the default for many SaaS-AI integrations.
The cascade gets more interesting when you trace each of the three model providers down a layer. OpenAI calls run on Microsoft Azure (and increasingly OpenAI's own GPU footprint). Anthropic calls run on Amazon Bedrock and, since the 23 October 2025 expansion, Google Cloud TPUs across three chip platforms. Google's Gemini calls run on Google Cloud's own TPU and GPU fleet. A single Notion AI prompt about a customer record can, depending on which model the system selects, traverse a different US-based hyperscaler each time it is run. The CLOUD Act exposure is determined not by where Notion stores the workspace but by which model the routing layer picked for that specific call.
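The routing-dependent exposure can be made concrete with a small lookup. This is an illustrative table, not Notion's actual routing logic; the provider-to-infrastructure pairs follow the description above, and the model identifiers are assumed strings.

```python
# Illustrative routing table; provider-to-infrastructure pairs follow the text,
# model identifiers are assumed strings, not Notion's real routing keys.
CASCADE = {
    "claude-opus-4.5": ("Anthropic", ["Amazon Bedrock", "Google Cloud TPUs"]),
    "gpt-5.2":         ("OpenAI",    ["Microsoft Azure"]),
    "gemini-3":        ("Google",    ["Google Cloud TPU/GPU fleet"]),
}

def parties_for_call(model: str, vendor: str = "Notion") -> list[str]:
    """Every external party a single prompt touches, given which model the router picked."""
    provider, infrastructure = CASCADE[model]
    return [vendor, provider, *infrastructure]
```

Run it for each model in the rotation and the point falls out: the same prompt yields a different party list, and therefore a different CLOUD Act surface, on each call.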
The legal anchor is GDPR Article 28(2). A processor cannot engage another processor for processing on the controller's behalf without prior authorisation. Where the controller has granted "general written authorisation", the processor must inform the controller of any intended changes, and the controller must have the opportunity to object. In the Notion AI case, the addition of Google as a third model provider in January 2026 was a sub-processor change under Article 28(2). The notification is the trigger. The 30-day default objection window is the clock.
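The objection clock is simple enough to encode. Note that the 30-day window is a contractual default taken from the DPA, not a number in Article 28 itself; a sketch:

```python
from datetime import date, timedelta

def objection_deadline(notified_on: date, window_days: int = 30) -> date:
    """Last day to object to a sub-processor change.

    The window length comes from your DPA, not from Article 28 itself;
    30 days is a common contractual default.
    """
    return notified_on + timedelta(days=window_days)

def can_still_object(notified_on: date, today: date, window_days: int = 30) -> bool:
    """True while the controller's objection right is still live."""
    return today <= objection_deadline(notified_on, window_days)
```

The useful output is not the date arithmetic but the discipline: the deadline is computed from the notification date, so a notification that lands in an unread distribution list silently burns the window.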
The check that fits this pattern: read the vendor's AI-specific sub-processor list (not the general one), confirm your DPA covers each model provider by name or by general authorisation language that flows down to AI sub-processors, verify what plan tier you are on (because retention defaults differ by tier), and configure your sub-processor change notification to land in a monitored inbox, not a distribution list nobody reads. For Notion specifically, the Enterprise plan is the one where the zero-retention promise actually holds.
The fourth pattern is AI inside the vendor's own trust boundary, and it gets the least attention because it is the safest. The vendor of a tool you already use builds an AI feature, runs it on the vendor's existing infrastructure, and never sends your data to an external model provider. The AI vendor surface is, structurally, just an extension of the vendor surface you already trusted.
Slack AI is the cleanest example in 2026. Slack runs closed-source large language models inside an "escrow VPC" on the Slack-controlled portion of AWS. The model providers themselves have no inbound access to that VPC and no path to inspect or retain prompts. Slack uses retrieval-augmented generation rather than fine-tuning: the LLM is supplied with only the content needed for a single task, and no Slack customer data is used to train any underlying model unless the customer affirmatively opts in. The architectural commitment is that data does not leave Slack's existing trust boundary, period.
This pattern is structurally important for two reasons. First, it dramatically reduces the sub-processor surface: your data is not flowing to OpenAI or Anthropic directly, only to the vendor whose DPA you already signed. Second, it means that when the vendor's underlying AWS region is the EU region, EU residency is preserved through the AI feature without any new transfer mechanism work.
But it does not eliminate the check. The pattern still creates new processing inside the vendor's infrastructure, and that processing has its own retention windows, its own logs, its own prompt/response storage, and its own potential to surface content the user technically can access but never would have found manually. The over-sharing problem is real even when the trust boundary is preserved: a Slack AI summary that surfaces every channel a user has read access to can, in a permissively configured workspace, expose documents the user never would have opened by hand. Microsoft itself has documented this dynamic for M365 Copilot, where the tooling surfaces files the user has access to under inherited permissions even when nobody intended that access to be discoverable.
The check that fits this pattern: confirm in the vendor's engineering or trust documentation that the AI feature actually stays inside the vendor's existing infrastructure (and is not silently routed to a third-party model provider on a subset of requests), tighten file and channel permissions before enabling the feature, and treat the AI feature's logging and prompt-storage retention as a new surface that needs its own retention setting in your policy. The trust boundary holds. The over-sharing problem does not.
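The pre-enablement over-sharing audit reduces to a set difference: which resources can each user read that they have never actually opened? A minimal sketch, assuming you can export per-user readable-resource and access-log sets from the workspace (the data structure here is an assumption):

```python
def over_shared(readable: set[str], opened: set[str]) -> set[str]:
    """Resources a user can read but has never opened: the pool an AI summary can surface."""
    return readable - opened

def rank_users(access: dict[str, tuple[set[str], set[str]]]) -> list[tuple[str, int]]:
    """Per-user never-opened-but-readable counts, worst first; tighten these before enabling AI."""
    counts = [(user, len(over_shared(r, o))) for user, (r, o) in access.items()]
    return sorted(counts, key=lambda item: -item[1])
```

The ranked list is the work queue: the users at the top are the ones whose inherited permissions the AI feature will turn into discoverable content on day one.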
Across all four patterns, the underlying legal anchor is the same. Your relationship with the SaaS vendor is a controller-processor relationship. Their relationship with the model provider is a processor-sub-processor relationship. Article 28(2) requires either specific or general authorisation for that downstream engagement, and gives you (the controller) the opportunity to object before a new sub-processor processes your data.
The four patterns differ in how visible the change is:
| Pattern | What triggers the data flow change | How visible is it |
|---|---|---|
| 1. Default flip inside an existing tool | Vendor activates a new sub-processor toggle by default | Email plus admin centre banner; easy to miss if nobody reads vendor changelogs |
| 2. OAuth-connected AI service | An employee installs the AI app and approves the OAuth scopes | Visible in SSO logs at install time, then invisible until something breaks |
| 3. Multi-provider cascade inside a tool | Vendor adds a new model provider to its provider rotation | Sub-processor list update; only visible if you subscribe to it |
| 4. AI inside the vendor's trust boundary | Vendor enables a new AI feature on existing infrastructure | Most visible: usually accompanied by a press release, no new sub-processor row |
Patterns 1 and 3 are the ones the EDPB Opinion 28/2024 framing on AI model anonymity matters for: when a vendor sends your workspace text or customer records to an external model, the resulting prompts and embeddings are rarely anonymous, and your controller obligations persist through the entire chain. Pattern 2 is the one that fails operationally before it fails legally; the breach happens before the audit catches the OAuth scope. Pattern 4 fails through over-sharing rather than through external transfer.
The DPA you signed with each vendor needs to do four things to cover all four patterns: identify the AI sub-processors by name or by category, specify the retention defaults by plan tier, commit to a meaningful (not 24-hour) objection window for sub-processor changes, and explicitly state that customer data is not used to train any model. Most pre-2024 SaaS DPAs do none of this. The vendor's updated DPA (almost every major vendor has published one by now) usually does, but the update may not flow back into your contract automatically. You may need to sign an amendment, and the practical answer is to bundle all four checks into the next contract renewal cycle rather than chase them individually.
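Those four requirements make a usable pass/fail check per vendor. A sketch (the field names and the 14-day "meaningful window" threshold are assumptions, not a legal standard):

```python
from dataclasses import dataclass

@dataclass
class DPACheck:
    names_ai_subprocessors: bool    # by name or by category
    states_retention_by_tier: bool  # retention defaults per plan tier
    objection_window_days: int      # 0 if the DPA offers no window at all
    no_training_commitment: bool    # customer data not used to train any model

def gaps(dpa: DPACheck, min_window_days: int = 14) -> list[str]:
    """Which of the four commitments this DPA is missing (14-day threshold is an assumption)."""
    missing = []
    if not dpa.names_ai_subprocessors:
        missing.append("AI sub-processors not identified")
    if not dpa.states_retention_by_tier:
        missing.append("retention defaults by tier not specified")
    if dpa.objection_window_days < min_window_days:
        missing.append("objection window too short to be meaningful")
    if not dpa.no_training_commitment:
        missing.append("no commitment against training on customer data")
    return missing
```

A pre-2024 DPA typically comes back with all four gaps; the renewal-cycle task is driving that list to empty, one amendment per vendor.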
The checklist is shorter than it looks, because three of the four checks are the same shape across all four patterns. The differences are in what evidence you collect.
Open the admin centre of your most-used SaaS tool right now. Scroll to the page where new AI features and sub-processors are listed. Note the date the most recent change appeared. If it is older than your last DPA review, you are caught up. If it is newer, the four checks above are the next conversation.
The pattern you find usually tells you the next step on its own. If the change is a default flip (Pattern 1), the work is the DPA amendment and the privacy notice update. If the change is a new OAuth grant (Pattern 2), the work is the scope audit and the token-rotation policy. If the change is a new model provider in the vendor's cascade (Pattern 3), the work is the sub-processor objection window. If the change is a new vendor-internal AI feature (Pattern 4), the work is the over-sharing audit and the retention setting.
The four-pattern question is the one that turns "third-party AI integration" from a vague risk into something you can actually check.