Generic AI vendor checklists fail because they treat every provider as one category. The right questions depend on which of four vendor archetypes you are evaluating.
The reason most AI vendor checklists fail in 2026 is that they treat every AI provider as one category. They are not. An OpenAI evaluation is not an Otter AI evaluation. A Hugging Face Inference Endpoint evaluation is not a Mistral evaluation. The questions that surface real risk depend on what kind of vendor you are looking at, and the generic checklist asks the right question for one archetype and the wrong question for the other three.
Three changes between November 2025 and January 2026 make the point. Mixpanel got breached on 8 November 2025 and OpenAI had to remove it from production within twenty days, forcing a sub-processor change for every API customer. Microsoft activated Anthropic models inside Microsoft 365 Copilot on 7 January 2026, defaulting them on for most commercial cloud and off for EU/EFTA/UK. OpenAI added in-region GPU inference for Enterprise, Edu and Healthcare on 16 January 2026, switching ZDR on for any API customer who had enabled EU residency. None of those changes would have been caught by a generic vendor due-diligence checklist. They would have been caught by reading the right sub-processor list with the right question in mind, and the right question is different for each archetype.
This piece is the archetype-by-archetype version. Four vendor archetypes, the questions that matter for each, and a 60-to-90-minute walk per evaluation. Built for a small team without procurement.
| Archetype | Examples | Load-bearing question |
|---|---|---|
| Hyperscaler API | OpenAI API, Anthropic API, Google Vertex, Azure OpenAI, AWS Bedrock | Which product line is your team actually using, and does the DPA reach it? |
| API-only LLM startup | Mistral, Cohere, Together AI, Fireworks, Groq, DeepInfra | How mature is the sub-processor list, and how stable is the company? |
| SaaS with embedded AI | Notion AI, Slack AI, Zoom AI Companion, Otter, Fireflies, Microsoft 365 Copilot, GitHub Copilot | Does your existing contract with the SaaS cover the AI uplift, and what changed when the AI feature shipped? |
| Open-source with hosting | Hugging Face Inference Endpoints, Replicate, BentoCloud, Modal, Anyscale, RunPod | Who owns the runtime, what is the model lineage, and where does the GPU live? |
The four are not perfectly disjoint. Azure OpenAI is a hyperscaler API at the inference layer, but it is also a product where Microsoft is the contracting party and the existing Azure Enterprise Agreement carries the contract surface. Hugging Face hosts open-source models and also acts as an inference vendor. The right move is to pick the dominant archetype for the evaluation and answer that archetype's questions first, then check whether the secondary archetype adds anything.
This is the easy archetype. OpenAI, Anthropic, Google, Microsoft, and AWS all publish DPAs, sub-processor lists, certifications, and trust pages. The information is available. The risk is not finding the answer; the risk is asking the wrong question.
The load-bearing question for a hyperscaler API is the product-line trap. Each provider sells multiple product lines under the same brand name, and the DPA covers some lines and not others. The OpenAI API DPA does not cover ChatGPT Plus on a personal account. The Anthropic API contract is not the same document as the consumer Claude.ai contract. Vertex AI's enterprise terms are different from the Gemini consumer app. (See the OpenAI DPA walk for the worked product-line example.)
The 30-minute walk for a hyperscaler API:
- Is the team calling `openai.OpenAI()` (the API), or is someone logged into chat.openai.com (the consumer product)? The two have different DPAs and different default training settings.

If the answers to these five take more than thirty minutes, the deployment has a hidden product-line gap. The most common gap is shadow ChatGPT Plus accounts inside a company that signed the API DPA. (See the shadow AI ladder for the operational fix.)
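The API-versus-consumer question can be partly answered from the codebase itself. A minimal sketch, assuming a Python repo and illustrative patterns (the consumer product leaves no code trace, so the `chat.openai.com` check only catches mentions in docs and notes; real shadow usage needs SSO or browser logs):

```python
import re
from pathlib import Path

# Illustrative patterns: API usage appears in code; the consumer
# product usually only appears as a URL in docs or onboarding notes.
API_CLIENT = re.compile(r"openai\.OpenAI\s*\(|from openai import OpenAI")
CONSUMER_HINT = re.compile(r"chat\.openai\.com|chatgpt\.com")

def inventory(repo: Path) -> dict[str, list[str]]:
    """Flag files that instantiate the API client or mention the consumer product."""
    hits: dict[str, list[str]] = {"api": [], "consumer": []}
    for path in repo.rglob("*"):
        if path.suffix not in {".py", ".md", ".txt"} or not path.is_file():
            continue
        text = path.read_text(errors="ignore")
        if API_CLIENT.search(text):
            hits["api"].append(path.name)
        if CONSUMER_HINT.search(text):
            hits["consumer"].append(path.name)
    return hits
```

A non-empty `consumer` list is the cheap early warning that the product-line gap exists before anyone reads a DPA.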
Mistral. Cohere. Together AI. Fireworks. Groq. DeepInfra. Anyscale. The API-only LLM startups are the second-most-common vendor type a small team will pick, usually for cost or for a specific model not available on a hyperscaler.
The information here is patchier. DPAs exist but may not be linked from the main site. Sub-processor lists exist but may not be public. SOC 2 reports exist but may be in the early stages of certification. The trust pages are often a single page rather than a full portal. And the company itself is smaller, which means the operational risks (acquisition, pivot, shutdown) are higher.
The load-bearing question for this archetype is sub-processor maturity. A two-year-old AI startup probably uses a hyperscaler for inference, a CDN for ingress, an analytics vendor, an error-tracking vendor, and possibly a third party for content moderation. Each one is a sub-processor. The list may not exist publicly, and the act of asking for it is itself a useful signal about the vendor's maturity.
The 60-minute walk for an API-only LLM startup:
The honest framing for this archetype: the questions are the same as for a hyperscaler, but the answers are harder to find and the answers themselves are softer. A small team should expect to spend more time per vendor and should weight operational stability heavily.
This is the archetype that catches the most teams off guard, because the AI feature is added on top of an existing contract, and the existing contract is rarely re-read when the AI lands.
Notion AI was added to Notion. Slack AI was added to Slack. Zoom AI Companion was added to Zoom. Otter and Fireflies bolted onto the meeting workflow. Microsoft 365 Copilot landed inside an existing Microsoft Enterprise Agreement. GitHub Copilot was added to an existing GitHub contract. In each case, a team that signed the original SaaS contract two or three years ago now has an AI processing layer that did not exist when the contract was signed.
The load-bearing question for SaaS-embedded-AI is the contract-coverage gap. Three things to check:
1. Is the AI uplift covered by the existing DPA, or by a separate addendum? Most large SaaS vendors handled this by issuing a new addendum or a contract amendment. Microsoft added an "AI Services" section to its DPA. Slack added an AI Use Notice. If there is no addendum and the existing DPA is the only document, the AI processing may be running on contract terms that did not contemplate it.

2. What is the data flow inside the SaaS for the AI feature? Notion AI sends document content to OpenAI as a sub-processor. Slack AI sends message content to a different model layer. Zoom AI Companion sends transcript text to Anthropic and OpenAI under Zoom's no-training pledge. Microsoft 365 Copilot sends prompts to a chain that has included Anthropic models since 7 January 2026 (for most commercial cloud, with EU/EFTA/UK off by default). The data flow is rarely visible from the user-facing UI; it lives in the trust page or the admin centre.

3. What is the opt-out granularity? Some SaaS-AI features can be turned off at the workspace level. Others can be turned off only at the user level. Some require an admin-centre toggle that defaults on (or off, depending on geography and tenancy). The Microsoft 365 Copilot Anthropic activation is the case in point: an admin had to find the toggle in the admin centre between 8 December 2025 (when it became visible) and 7 January 2026 (when it activated), and the default state depended on geography. (See the contract-cascade walkthrough for the longer Copilot read.)
The 60-minute walk for a SaaS-with-embedded-AI vendor:
The fourth archetype is open-source models hosted on third-party infrastructure. Hugging Face Inference Endpoints. Replicate. BentoCloud. Modal. RunPod. Anyscale. These vendors host open-source models (Llama, Mistral, Qwen, DeepSeek, Phi) on rented GPUs and expose an inference endpoint.
The load-bearing question here is who owns the runtime. The model is open-source, which means the lineage is verifiable (you can read the model card). The hosting layer is not. When you call a Llama 4 endpoint on Replicate, you are sending prompts to Replicate's infrastructure, which runs on a hyperscaler, which runs on a specific GPU in a specific data centre. Each of those layers is a sub-processor. None of them is the model itself.
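The layered runtime can be written down as data, which makes the inherited sub-processors fall out mechanically. A minimal sketch with a hypothetical stack (layer names and regions are placeholders, not a real vendor's architecture):

```python
from dataclasses import dataclass

@dataclass
class RuntimeLayer:
    name: str
    role: str              # "model", "inference vendor", "compute", "data centre"
    direct_contract: bool  # do you hold a contract with this layer yourself?

# Hypothetical stack for an open-weights model on a hosting vendor.
stack = [
    RuntimeLayer("Llama (open weights)", "model", False),
    RuntimeLayer("Hosting vendor", "inference vendor", True),
    RuntimeLayer("Underlying hyperscaler", "compute", False),
    RuntimeLayer("GPU region: us-east-1", "data centre", False),
]

def inherited_sub_processors(stack: list[RuntimeLayer]) -> list[str]:
    # Every layer that handles prompt data without a direct contract
    # with you is a sub-processor you inherit through the vendor.
    return [l.name for l in stack if l.role != "model" and not l.direct_contract]
```

The exercise is the point: the model itself never shows up in the output, because the model is not a processor; the layers around it are.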
The 60-minute walk for an open-source hosting vendor:
The honest framing for this archetype: the privacy contract is shallower than for a hyperscaler API, but the visibility into the model is deeper. Tradeoff, not a hierarchy.
If you have only ninety minutes total to evaluate a vendor and cannot tell which archetype it is, run this generic walk and then re-read the relevant archetype section:
That is the 90 minutes. The archetype-specific walk above slots into step 7.
Three things to do this week.
First, list every AI vendor your team is currently using and label each with one of the four archetypes. The labels are not perfectly disjoint, and that is fine. Pick the dominant archetype. The exercise alone surfaces vendors you forgot you had.
Second, for the vendor you most recently adopted without a formal evaluation, run the archetype-specific walk and write the one-row register entry. Backfill what should have been the pre-launch process.
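If the register lives in code rather than a spreadsheet, the one-row entry can be as small as a dataclass. A minimal sketch; the field names and the example values are placeholders, not a prescribed schema:

```python
from dataclasses import dataclass, asdict

@dataclass
class RegisterRow:
    vendor: str
    archetype: str          # one of the four archetypes
    product_line: str       # the exact product line the DPA covers
    dpa_in_place: bool
    subprocessor_list_url: str
    last_reviewed: str      # ISO date of the last quarterly read

# Hypothetical entry; values are illustrative, not real vendor data.
row = RegisterRow(
    vendor="Example inference startup",
    archetype="API-only LLM startup",
    product_line="Hosted inference API",
    dpa_in_place=True,
    subprocessor_list_url="https://example.com/legal/subprocessors",
    last_reviewed="2026-01-20",
)
```

The `product_line` field is the one that earns its place: it is the field that catches the hyperscaler product-line trap at register time.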
Third, calendar a quarterly read of the sub-processor list for every active vendor. The Mixpanel incident, the Microsoft Anthropic activation, and the OpenAI in-region GPU rollout all happened in a sixty-day window. The next sixty days will produce something similar. The objection window only protects you if someone is watching it.
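The quarterly read can be backstopped with a change alarm. A minimal sketch, assuming each vendor publishes its sub-processor list at a stable URL (the URLs below are placeholders): hash the page, compare against the last run, and flag vendors whose page changed. A hash change only says "go read the page"; a human still reads the diff and decides whether to use the objection window.

```python
import hashlib
import json
import urllib.request
from pathlib import Path

def page_hash(url: str) -> str:
    """Hash the raw page; crude, but enough to flag that something changed."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return hashlib.sha256(resp.read()).hexdigest()

def changed_vendors(vendors: dict[str, str], state_file: Path,
                    fetch=page_hash) -> list[str]:
    """Return vendors whose sub-processor page changed since the last run."""
    previous = json.loads(state_file.read_text()) if state_file.exists() else {}
    current = {name: fetch(url) for name, url in vendors.items()}
    state_file.write_text(json.dumps(current))
    return [n for n, h in current.items() if previous.get(n) not in (None, h)]
```

Run it from a scheduled job once a week; the quarterly calendar entry then becomes the fallback rather than the only line of defence.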
Vendor evaluation is not a one-off checklist. It is a quarterly habit, and it scales with the number of AI vendors in your stack rather than with the size of your team. The archetype frame is the way to make the habit small enough to actually run.
What changed for the three providers in 2025-2026: Anthropic's August 2025 consumer shift, the October 2025 Google TPU sub-processor expansion, the Court of Rome OpenAI annulment, and the Latombe DPF appeal pending at the CJEU.
A clause-by-clause read of OpenAI's DPA in April 2026: what changed in the last 12 months, what still trips deployers, and the operational decisions that follow each clause.
Three real 2025-2026 vendor term changes (Anthropic's August 2025 consumer pivot, OpenAI's Mixpanel sub-processor removal, and Microsoft's January 2026 Anthropic addition) and the four-step playbook for when the notification email arrives.