Generic AI vendor checklists fail because they treat every provider as one category. The right questions depend on which of four vendor archetypes you are evaluating.
The reason most AI vendor checklists fail in 2026 is that they treat every AI provider as one category. They are not. An OpenAI evaluation is not an Otter AI evaluation. A Hugging Face Inference Endpoint evaluation is not a Mistral evaluation. The questions that surface real risk depend on what kind of vendor you are looking at, and the generic checklist asks the right question for one archetype and the wrong question for the other three.
Three changes between November 2025 and January 2026 make the point. Mixpanel got breached on 8 November 2025 and OpenAI had to remove it from production within twenty days, forcing a sub-processor change for every API customer. Microsoft activated Anthropic models inside Microsoft 365 Copilot on 7 January 2026, defaulting them on for most commercial cloud and off for EU/EFTA/UK. OpenAI added in-region GPU inference for Enterprise, Edu and Healthcare on 16 January 2026, switching ZDR on for any API customer who had enabled EU residency. None of those changes would have been caught by a generic vendor due-diligence checklist. They would have been caught by reading the right sub-processor list with the right question in mind, and the right question is different for each archetype.
This piece is the archetype-by-archetype version. Four vendor archetypes, the questions that matter for each, and a 60-to-90-minute walk per evaluation. Built for a small team without procurement.
| Archetype | Examples | Load-bearing question |
|---|---|---|
| Hyperscaler API | OpenAI API, Anthropic API, Google Vertex, Azure OpenAI, AWS Bedrock | Which product line is your team actually using, and does the DPA reach it? |
| API-only LLM startup | Mistral, Cohere, Together AI, Fireworks, Groq, DeepInfra | How mature is the sub-processor list, and how stable is the company? |
| SaaS with embedded AI | Notion AI, Slack AI, Zoom AI Companion, Otter, Fireflies, Microsoft 365 Copilot, GitHub Copilot | Does your existing contract with the SaaS cover the AI uplift, and what changed when the AI feature shipped? |
| Open-source with hosting | Hugging Face Inference Endpoints, Replicate, BentoCloud, Modal, Anyscale, RunPod | Who owns the runtime, what is the model lineage, and where does the GPU live? |
The four are not perfectly disjoint. Azure OpenAI is a hyperscaler API at the inference layer, but it is also a product where Microsoft is the contracting party and the existing Azure Enterprise Agreement carries the contract surface. Hugging Face hosts open-source models and also acts as an inference vendor. The right move is to pick the dominant archetype for the evaluation and answer that archetype's questions first, then check whether the secondary archetype adds anything.
This is the easy archetype. OpenAI, Anthropic, Google, Microsoft, and AWS all publish DPAs, sub-processor lists, certifications, and trust pages. The information is available. The risk is not finding the answer; the risk is asking the wrong question.
The load-bearing question for a hyperscaler API is the product-line trap. Each provider sells multiple product lines under the same brand name, and the DPA covers some lines and not others. The OpenAI API DPA does not cover ChatGPT Plus on a personal account. The Anthropic API contract is not the same document as the consumer Claude.ai contract. Vertex AI's enterprise terms are different from the Gemini consumer app. (See the OpenAI DPA walk for the worked product-line example.)
The 30-minute walk for a hyperscaler API:
- Is the team calling `openai.OpenAI()` (the API), or is someone logged into chat.openai.com (the consumer product)? The two have different DPAs and different default training settings.

If the answers to these five take more than thirty minutes, the deployment has a hidden product-line gap. The most common gap is shadow ChatGPT Plus accounts inside a company that signed the API DPA. (See the shadow AI ladder for the operational fix.)
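The API-versus-consumer question can be partly answered from the codebase itself. A minimal sketch, assuming a Python repo and illustrative patterns (the consumer product leaves no code trace, so the `chat.openai.com` check only catches mentions in docs and notes; real shadow usage needs SSO or browser logs):

```python
import re
from pathlib import Path

# Illustrative patterns: API usage appears in code; the consumer
# product usually only appears as a URL in docs or onboarding notes.
API_CLIENT = re.compile(r"openai\.OpenAI\s*\(|from openai import OpenAI")
CONSUMER_HINT = re.compile(r"chat\.openai\.com|chatgpt\.com")

def inventory(repo: Path) -> dict[str, list[str]]:
    """Flag files that instantiate the API client or mention the consumer product."""
    hits: dict[str, list[str]] = {"api": [], "consumer": []}
    for path in repo.rglob("*"):
        if path.suffix not in {".py", ".md", ".txt"} or not path.is_file():
            continue
        text = path.read_text(errors="ignore")
        if API_CLIENT.search(text):
            hits["api"].append(path.name)
        if CONSUMER_HINT.search(text):
            hits["consumer"].append(path.name)
    return hits
```

A non-empty `consumer` list is the cheap early warning that the product-line gap exists before anyone reads a DPA.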
Mistral. Cohere. Together AI. Fireworks. Groq. DeepInfra. Anyscale. The API-only LLM startups are the second-most-common vendor type a small team will pick, usually for cost or for a specific model not available on a hyperscaler.
The information here is patchier. DPAs exist but may not be linked from the main site. Sub-processor lists exist but may not be public. SOC 2 reports exist but may be in the early stages of certification. The trust pages are often a single page rather than a full portal. And the company itself is smaller, which means the operational risks (acquisition, pivot, shutdown) are higher.
The load-bearing question for this archetype is sub-processor maturity. A two-year-old AI startup probably uses a hyperscaler for inference, a CDN for ingress, an analytics vendor, an error-tracking vendor, and possibly a third party for content moderation. Each one is a sub-processor. The list may not exist publicly, and the act of asking for it is itself a useful signal about the vendor's maturity.
The 60-minute walk for an API-only LLM startup:
The honest framing for this archetype: the questions are the same as for a hyperscaler, but the answers are harder to find and the answers themselves are softer. A small team should expect to spend more time per vendor and should weight operational stability heavily.
This is the archetype that catches the most teams off guard, because the AI feature is added on top of an existing contract, and the existing contract is rarely re-read when the AI lands.
Notion AI was added to Notion. Slack AI was added to Slack. Zoom AI Companion was added to Zoom. Otter and Fireflies bolted onto the meeting workflow. Microsoft 365 Copilot landed inside an existing Microsoft Enterprise Agreement. GitHub Copilot was added to an existing GitHub contract. In each case, a team that signed the original SaaS contract two or three years ago now has an AI processing layer that did not exist when the contract was signed.
The load-bearing question for SaaS-embedded-AI is the contract-coverage gap. Three things to check:
1. Is the AI uplift covered by the existing DPA, or by a separate addendum? Most large SaaS vendors handled this by issuing a new addendum or a contract amendment. Microsoft added an "AI Services" section to its DPA. Slack added an AI Use Notice. If there is no addendum and the existing DPA is the only document, the AI processing may be running on contract terms that did not contemplate it.

2. What is the data flow inside the SaaS for the AI feature? Notion AI sends document content to OpenAI as a sub-processor. Slack AI sends message content to a different model layer. Zoom AI Companion sends transcript text to Anthropic and OpenAI under Zoom's no-training pledge. Microsoft 365 Copilot sends prompts to a chain that has included Anthropic models since 7 January 2026 (for most commercial cloud, with EU/EFTA/UK off by default). The data flow is rarely visible from the user-facing UI; it lives in the trust page or the admin centre.

3. What is the opt-out granularity? Some SaaS-AI features can be turned off at the workspace level. Others can be turned off only at the user level. Some require an admin-centre toggle that defaults on (or off, depending on geography and tenancy). The Microsoft 365 Copilot Anthropic activation is the case in point: an admin had to find the toggle in the admin centre between 8 December 2025 (when it became visible) and 7 January 2026 (when it activated), and the default state depended on geography. (See the contract-cascade walkthrough for the longer Copilot read.)
The 60-minute walk for a SaaS-with-embedded-AI vendor:
The fourth archetype is open-source models hosted on third-party infrastructure. Hugging Face Inference Endpoints. Replicate. BentoCloud. Modal. RunPod. Anyscale. These vendors host open-source models (Llama, Mistral, Qwen, DeepSeek, Phi) on rented GPUs and expose an inference endpoint.
The load-bearing question here is who owns the runtime. The model is open-source, which means the lineage is verifiable (you can read the model card). The hosting layer is not. When you call a Llama 4 endpoint on Replicate, you are sending prompts to Replicate's infrastructure, which runs on a hyperscaler, which runs on a specific GPU in a specific data centre. Each of those layers is a sub-processor. None of them is the model itself.
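The layered runtime can be written down as data, which makes the inherited sub-processors fall out mechanically. A minimal sketch with a hypothetical stack (layer names and regions are placeholders, not a real vendor's architecture):

```python
from dataclasses import dataclass

@dataclass
class RuntimeLayer:
    name: str
    role: str              # "model", "inference vendor", "compute", "data centre"
    direct_contract: bool  # do you hold a contract with this layer yourself?

# Hypothetical stack for an open-weights model on a hosting vendor.
stack = [
    RuntimeLayer("Llama (open weights)", "model", False),
    RuntimeLayer("Hosting vendor", "inference vendor", True),
    RuntimeLayer("Underlying hyperscaler", "compute", False),
    RuntimeLayer("GPU region: us-east-1", "data centre", False),
]

def inherited_sub_processors(stack: list[RuntimeLayer]) -> list[str]:
    # Every layer that handles prompt data without a direct contract
    # with you is a sub-processor you inherit through the vendor.
    return [l.name for l in stack if l.role != "model" and not l.direct_contract]
```

The exercise is the point: the model itself never shows up in the output, because the model is not a processor; the layers around it are.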
The 60-minute walk for an open-source hosting vendor:
The honest framing for this archetype: the privacy contract is shallower than for a hyperscaler API, but the visibility into the model is deeper. Tradeoff, not a hierarchy.
If you have only ninety minutes total to evaluate a vendor and cannot tell which archetype it is, run this generic walk and then re-read the relevant archetype section:
That is the 90 minutes. The archetype-specific walk above slots into step 7.
Three things to do this week.
First, list every AI vendor your team is currently using and label each with one of the four archetypes. The labels are not perfectly disjoint, and that is fine. Pick the dominant archetype. The exercise alone surfaces vendors you forgot you had.
Second, for the vendor you most recently adopted without a formal evaluation, run the archetype-specific walk and write the one-row register entry. Backfill what should have been the pre-launch process.
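If the register lives in code rather than a spreadsheet, the one-row entry can be as small as a dataclass. A minimal sketch; the field names and the example values are placeholders, not a prescribed schema:

```python
from dataclasses import dataclass, asdict

@dataclass
class RegisterRow:
    vendor: str
    archetype: str          # one of the four archetypes
    product_line: str       # the exact product line the DPA covers
    dpa_in_place: bool
    subprocessor_list_url: str
    last_reviewed: str      # ISO date of the last quarterly read

# Hypothetical entry; values are illustrative, not real vendor data.
row = RegisterRow(
    vendor="Example inference startup",
    archetype="API-only LLM startup",
    product_line="Hosted inference API",
    dpa_in_place=True,
    subprocessor_list_url="https://example.com/legal/subprocessors",
    last_reviewed="2026-01-20",
)
```

The `product_line` field is the one that earns its place: it is the field that catches the hyperscaler product-line trap at register time.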
Third, calendar a quarterly read of the sub-processor list for every active vendor. The Mixpanel incident, the Microsoft Anthropic activation, and the OpenAI in-region GPU rollout all happened in a sixty-day window. The next sixty days will produce something similar. The objection window only protects you if someone is watching it.
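The quarterly read can be backstopped with a change alarm. A minimal sketch, assuming each vendor publishes its sub-processor list at a stable URL (the URLs below are placeholders): hash the page, compare against the last run, and flag vendors whose page changed. A hash change only says "go read the page"; a human still reads the diff and decides whether to use the objection window.

```python
import hashlib
import json
import urllib.request
from pathlib import Path

def page_hash(url: str) -> str:
    """Hash the raw page; crude, but enough to flag that something changed."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return hashlib.sha256(resp.read()).hexdigest()

def changed_vendors(vendors: dict[str, str], state_file: Path,
                    fetch=page_hash) -> list[str]:
    """Return vendors whose sub-processor page changed since the last run."""
    previous = json.loads(state_file.read_text()) if state_file.exists() else {}
    current = {name: fetch(url) for name, url in vendors.items()}
    state_file.write_text(json.dumps(current))
    return [n for n, h in current.items() if previous.get(n) not in (None, h)]
```

Run it from a scheduled job once a week; the quarterly calendar entry then becomes the fallback rather than the only line of defence.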
Vendor evaluation is not a one-off checklist. It is a quarterly habit, and it scales with the number of AI vendors in your stack rather than with the size of your team. The archetype frame is the way to make the habit small enough to actually run.
What changed for the three providers in 2025-2026: Anthropic's August 2025 consumer shift, the October 2025 Google TPU sub-processor expansion, the Court of Rome OpenAI annulment, and the Latombe DPF appeal pending at the CJEU.
A clause-by-clause read of OpenAI's DPA in April 2026: what changed in the last 12 months, what still trips deployers, and the operational decisions that follow each clause.
Three real 2025-2026 vendor term changes (Anthropic's August 2025 consumer pivot, OpenAI's Mixpanel sub-processor removal, and Microsoft's January 2026 Anthropic addition) and the four-step playbook for when the notification email arrives.