What to check before deploying open-weight models in 2026. The supply chain attacks, SafeTensors migration, Article 53 open-source exemption, and the GDPR blind spot.
Your team wants to self-host an AI model. Maybe for privacy, maybe for cost, maybe because the regulatory posture of your sector does not allow sending data to a third-party API. The obvious answer is an open-source model: Llama, Mistral, DeepSeek, Qwen. They are powerful, free for most commercial use, and running them on your own infrastructure means the data never leaves your network. That part is a real privacy win.
The part that comes with it is a supply chain. Every open-weight model is a binary you downloaded from somebody else's hub, and the attack surface of "somebody else's binary that your Python process loads with torch.load()" is not something a privacy argument resolves on its own. This article walks through what "open source" actually means in AI (which is different from what it means in software), the documented supply chain attacks, the specific mitigations that work, the Article 53 open-source exemption and where it ends, and the GDPR blind spot that the EDPB's Opinion 28/2024 created for deployers.
If you want the self-hosted vs cloud API decision framework, read the self-hosted vs cloud API tradeoff article. If you want the build vs buy framing, read the build vs buy article. What follows assumes you have already decided to self-host and need to know what to check before loading the next model.
Three levels of openness exist in AI, and the distinction is load-bearing for everything that follows.
Truly open source means weights, training data, training code, architecture, and documentation are all public. Full reproducibility is possible. It is rare, mostly academic, and none of the commercially interesting large models qualify.
Open weights means model parameters are released, the architecture is usually documented, and the training data and code are not. This is what Llama, Mistral, DeepSeek, and Qwen actually are. The Open Source Initiative does not recognise open-weights-only as meeting the formal Open Source Definition, and the label "open source" as applied to these models is a marketing convention rather than a legal or technical claim.
Proprietary with API access means nothing is released. You call an API and pay per token. OpenAI, Anthropic's hosted service, and most commercial offerings from Google.
Most models developers call "open source" are open-weight. That is fine as a deployment choice. It does mean you cannot audit the training data, you cannot verify whether personal data was included, you cannot assess training data quality or bias at the record level, and you cannot check for deliberate memorisation of specific records. You are deploying a signed artefact that happens to be free. The signing and the freeness are separate properties, and both of them matter.
The EU AI Act addresses the transparency gap partially. Even open-source GPAI providers have to publish a training data summary using the AI Office template (more on that below). It is a floor, not a ceiling, and it is structured for copyright and data protection enforcement rather than for the kind of record-level auditing that would actually resolve whether a specific person is in the weights.
This is the part most teams underestimate. A model file is code. Loading it can execute arbitrary operations.
The pickle problem is the root cause. PyTorch models stored in .pt or .bin format use Python's pickle serialisation, and pickle's __reduce__ method allows arbitrary code execution during deserialisation. When you run torch.load() on a malicious model, it can open a reverse shell, download additional payloads, exfiltrate data, or install persistence on the host. JFrog found approximately 100 models on Hugging Face carrying exactly this type of payload in their initial disclosure, and subsequent research has catalogued around 400 models with malicious code across multiple waves of scanning. The pattern is consistent: attackers upload typosquatted models that execute a reverse shell on load and wait for data scientists to download them.
Hugging Face deploys Picklescan to detect dangerous pickles at upload time. It works for the obvious cases. It does not work for the evasion cases. In February 2025, ReversingLabs disclosed the NullifAI technique, which uses 7z-compressed pickle files rather than the ZIP format PyTorch traditionally uses. The compression switch was enough to bypass Picklescan for the two models ReversingLabs identified, and both of them deployed a reverse shell calling home to a hardcoded IP address. Picklescan was updated after the disclosure, but the pattern (new compression formats, new serialisation tricks, new evasion layers) keeps producing new variants faster than the scanners can keep up with.
If your workflow includes torch.load(path) on a freshly-downloaded model, you are one malicious upload away from arbitrary code execution on your workstation or your inference server. Picklescan is a best-effort signal at the hub level, and it has documented bypasses. The only reliable defence is to stop loading pickles entirely and load SafeTensors instead.
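One cheap gate is to refuse anything that does not parse as a SafeTensors file before any deserialiser touches it. The sketch below relies only on the published SafeTensors layout (an 8-byte little-endian header length followed by a JSON header); the function name is illustrative, and a pickle checkpoint fails the check before any code can run.

```python
import json
import struct
from pathlib import Path

def assert_safetensors(path: str) -> dict:
    """Refuse anything that is not a plausible SafeTensors file.

    SafeTensors layout: 8 bytes (little-endian u64 header length), then a
    JSON header describing the tensors. Pickle-based checkpoints start
    with a ZIP magic (b'PK') or a pickle opcode (b'\\x80'), so they fail
    here before any deserialiser runs.
    """
    p = Path(path)
    if p.suffix != ".safetensors":
        raise ValueError(f"refusing non-SafeTensors file: {p.name}")
    with p.open("rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return header  # tensor names, dtypes, shapes -- no code executed
```

Calling this before the real loader costs one file read of a few kilobytes and turns "hope the scanner caught it" into a hard failure.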
Palo Alto Unit 42 documented a related attack: namespace hijacking. When a model author deletes their Hugging Face account, the namespace can be re-registered by an attacker, and a model that was legitimate yesterday can be replaced by a poisoned version today under the same URL. The mitigation is to pin specific model versions by commit hash rather than by tag or branch, and never auto-update to a newer version without re-verifying integrity.
And then there is the GGUF template issue. In July 2025, researchers found that GGUF model files bundle executable chat templates that run on every inference call. The issue affected roughly 1.5 million files on Hugging Face. The chat templates are not arbitrary code execution in the pickle sense, but they are programmable logic embedded in the model file that runs during normal use, and the templating engines have their own attack surface. If you serve GGUF models (commonly used with llama.cpp and derivatives), you need to audit the template on download, not assume it is safe.
The research finding that should shift every open-weight deployment decision is Anthropic's Sleeper Agents paper. Anthropic trained models with deliberate backdoor triggers (write secure code when the year is 2023, insert exploits when the year is 2024) and then ran those models through the standard safety training pipeline: supervised fine-tuning, reinforcement learning from human feedback, and adversarial training. The backdoors survived all three.
In the largest models, adversarial training made the problem worse. The model learned to better hide the trigger rather than to remove the malicious behaviour. The safety pipeline taught the model to look safe to evaluators while preserving the ability to switch behaviour on the trigger. Standard benchmark evaluations did not distinguish the backdoored models from the clean ones.
Mithril Security's PoisonGPT is the practical demonstration of the same pattern in the wild. The team modified EleutherAI's GPT-J-6B to produce targeted misinformation on specific topics, and the poisoned model scored within 0.1% of the original on standard benchmarks. They uploaded it to Hugging Face under a typosquatted repository name and noted that standard evaluation could not distinguish it from the original. If a model you downloaded passed the benchmarks, that does not tell you whether it has a trigger.
The practical consequence for deployers is that you cannot treat "this model passed the benchmarks" as evidence of absence. You have to treat provenance as load-bearing: who published this model, which account, when, and what was the git history. If you are downloading a Llama fine-tune from an unfamiliar Hugging Face account, the benchmarks will not save you. The only thing that helps is the chain of trust back to a signed release from an account you have a reason to trust.
The mitigation stack is not exotic. It is four things, and if you do all four you close most of the documented supply chain attack surface.
Prefer SafeTensors. The SafeTensors format stores only tensor data and metadata. It cannot execute arbitrary code during loading because the format does not carry executable payloads. Most model hubs now publish SafeTensors versions of popular models alongside the pickle versions, and the conversion from .pt or .bin to .safetensors is usually a single Python script or a one-command library call. I think SafeTensors conversion is the single most important fix in the entire open-weight AI security space. It eliminates the arbitrary code execution attack surface that accounts for most known supply chain attacks, and the migration is an afternoon for a production workload.
If your inference stack is still loading .pt or .bin files, the migration to SafeTensors is three steps: install safetensors, run safetensors.torch.save_file() on the loaded state dict, and switch the loader to safetensors.torch.load_file(). Most Hugging Face model cards now list the SafeTensors version. If the specific checkpoint you need is only published as a pickle, convert it yourself in an isolated sandbox (a short-lived container with no network access) rather than on your production inference host.
Verify checksums and signatures. Before loading any model, compare the SHA-256 hash against the value published by the model author. Hugging Face publishes file hashes in the model card metadata. If the hash does not match, do not load. Where available, verify the signature: Google's Sigstore Model Transparency project signs models with the uploader's identity and records signing events in an append-only transparency log. The signature verification catches tampering in transit and catches account takeovers where the attacker cannot produce the uploader's signing key. Where a model has a Sigstore signature, verify it.
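The hash check is a few lines of stdlib Python; the point is to raise before anything downstream gets a chance to load the file. The function name is illustrative.

```python
import hashlib
from pathlib import Path

def verify_sha256(path: str, expected_hex: str,
                  chunk_size: int = 1 << 20) -> None:
    """Compare a file's SHA-256 digest against the published value.

    Streams the file in 1 MiB chunks so multi-gigabyte checkpoints
    do not need to fit in memory. Raises on mismatch.
    """
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    actual = digest.hexdigest()
    if actual != expected_hex.lower():
        raise ValueError(f"hash mismatch for {path}: got {actual}")
```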
Pin specific versions. The Palo Alto namespace hijacking research means that pinning by tag or branch is not enough. Pin by commit hash. transformers and huggingface_hub both support hash-level pinning through the revision parameter. The pinned hash is the one integrity claim that survives account takeover, because the hash would change if the content changed.
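A simple way to enforce the rule is to refuse any revision that is not a full 40-character commit hash before handing it to huggingface_hub's snapshot_download (which accepts commit hashes through its revision parameter). The wrapper name is illustrative; the import is deferred so the guard itself has no dependencies.

```python
import re

FULL_COMMIT = re.compile(r"^[0-9a-f]{40}$")

def pinned_download(repo_id: str, revision: str) -> str:
    """Download a model snapshot pinned to a full commit hash.

    Tags and branches are mutable and survive namespace takeover;
    a full commit hash does not, so refuse anything shorter.
    """
    if not FULL_COMMIT.match(revision):
        raise ValueError(
            f"revision must be a full commit hash, got {revision!r}")
    from huggingface_hub import snapshot_download  # deferred import
    return snapshot_download(repo_id=repo_id, revision=revision)
```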
Inventory everything. When a vulnerability is disclosed (like the GGUF template issue or the next pickle evasion), you need to be able to answer "which hosts are running which model, pulled from which source, when, in which format" within minutes rather than days. A lightweight inventory (spreadsheet, wiki page, config file in the deployment repo) is enough. What is not enough is "we'll figure it out when it happens."
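The inventory really can be this small. A sketch with illustrative field names and values, plus the one query you need on disclosure day:

```python
# One record per deployed checkpoint -- enough to answer "which hosts are
# running which model, from which source, in which format" in minutes.
# Every value here is illustrative.
INVENTORY = [
    {
        "host": "inference-01",
        "model": "example-org/example-model",     # hypothetical repo id
        "commit": "<full 40-char commit hash>",
        "format": "safetensors",
        "source": "huggingface.co",
        "deployed": "2026-01-12",
    },
]

def affected(fmt: str) -> list:
    """Hosts running a checkpoint in the disclosed format."""
    return [r["host"] for r in INVENTORY if r["format"] == fmt]
```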
Licensing in open-weight AI is less settled than in software and varies significantly across the popular families.
| Model | Licence | Commercial use | Key restriction |
|---|---|---|---|
| DeepSeek V3 / R1 | MIT | Fully permissive | None |
| Qwen 3 / 3.5 | Apache 2.0 | Fully permissive | None |
| Mistral (small models) | Apache 2.0 | Fully permissive | Larger models need paid licence |
| Llama 4 | Llama Community License | Conditional | More than 700 million MAU requires separate Meta licence; "Built with Llama" attribution required |
Llama's licence is the one that trips teams up. It reads like an open-source licence but it is not formally open source under the Open Source Initiative definition, and the 700-million-monthly-active-users threshold triggers a separate licence that Meta grants at its sole discretion. For most small teams the threshold is irrelevant, but if you are building a platform with any chance of scaling past that line, read the licence before committing to Llama as the default backbone. The "Built with Llama" attribution requirement is also often overlooked and applies regardless of scale.
DeepSeek and Qwen under MIT and Apache 2.0 are the cleanest choices from a licensing perspective. Mistral's smaller open-weight models use Apache 2.0 but the larger commercial models do not, so read each checkpoint's licence rather than assuming the vendor has a single policy.
The AI Act gives open-source GPAI models partial relief from the provider obligations. Under Article 53, open-source models are exempt from drawing up and maintaining technical documentation and from providing documentation to downstream providers who integrate the model. That is the exemption.
What is not exempt is the training data summary. Article 53(1)(d) requires every GPAI provider, including open-source ones, to publish a summary of the training data using the AI Office template. The Commission published the Explanatory Notice and Template for the Public Summary of Training Content on 24 July 2025. The obligation took effect on 2 August 2025 for new models; models placed on the market before that date have until 2 August 2027 to publish the summary. Starting 2 August 2026, the AI Office can verify compliance and issue corrective measures. Non-compliance under Article 101 attracts fines up to EUR 15 million or 3% of global annual turnover, whichever is higher.
To qualify for the Article 53 open-source exemption at all, three conditions must be met together. First, the model must be released under a genuinely free and open-source licence (Apache 2.0 and MIT qualify; research-only or no-commercial-use licences do not). Second, the parameters, architecture, and usage information must be publicly available. Third, there can be no monetisation tied to the model itself: charging for access, bundling with paid services, or collecting user data as an access condition disqualifies the model from the exemption.
The systemic-risk override is the part most readings of Article 53 miss. If an open-source model exceeds 10²⁵ FLOPs of training compute, the AI Act presumes it to be a GPAI model with systemic risk, and all the exemptions disappear. Full compliance is required: adversarial testing, systemic risk assessment, incident reporting, cybersecurity measures. Llama 3 70B and larger frontier models sit near or above this threshold depending on how the compute is measured. The next generation of open-weight frontier models will almost certainly cross it, and the exemption framing that worked for smaller open-weight models will not work for them.
There is also a fine-tuning threshold that matters for deployers. If your modifications use compute exceeding one-third of the original model's training compute, you become a "provider" under the AI Act with corresponding obligations. Standard fine-tuning on a few thousand examples is well below the threshold and leaves you as a deployer. Training a substantial adapter or continuing pre-training can push you across the line without it being obvious.
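To see why standard fine-tuning stays comfortably under the line, a back-of-the-envelope check using the common 6 × parameters × tokens rule of thumb for training compute. This is an approximation, not the AI Office's measurement method, and the numbers are illustrative.

```python
def train_flops(params: float, tokens: float) -> float:
    """Rough training-compute estimate via the common 6 * N * D rule of
    thumb -- an approximation, not a regulatory measurement method."""
    return 6.0 * params * tokens

# Illustrative: a 70B-parameter base model trained on ~15T tokens
base = train_flops(70e9, 15e12)        # ~6.3e24 FLOPs

# Fine-tuning the same model on ~5B tokens of your own data
fine_tune = train_flops(70e9, 5e9)     # ~2.1e21 FLOPs

# The AI Act line: more than one third of the original training compute
crosses_provider_line = fine_tune > base / 3   # False -- still a deployer
```

Continued pre-training over trillions of tokens is the scenario that can flip that boolean without anyone noticing, which is the point the paragraph above makes.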
I am not convinced the AI Office training data summary template will give downstream deployers what they actually need. The template is structured for copyright and data protection enforcement at the provider level (what sources, what categories, what rights basis), not for the "is a specific data subject in the weights" question that the EDPB's Opinion 28/2024 raised for deployers. The template is a floor, and the floor is lower than the question deployers actually have to answer.
When you deploy an open-weight model, you do not know what personal data is in the training set. If the model can reproduce personal information from training data (and research has shown they can), you may be processing personal data without knowing it, and the processing is your responsibility as the deployer. The EDPB's Opinion 28/2024 is explicit on this: AI models cannot be automatically considered anonymous. Anonymity is a negligible-likelihood test that has to be established for the specific model and the specific deployment context, and the burden sits with the controller who operates the model.
The practical consequence is that deploying an open-weight model does not discharge your GDPR obligations even though the data stays on your infrastructure. You still owe:
An Article 35 DPIA if the deployment context is high-risk (which most production AI features are). The DPIA has to cover what the model might output about data subjects, as well as what you put into the prompt.
An Article 13-14 privacy notice that names the model, the purposes, the legal basis, and the retention. "We use an AI model" is not enough. If the model has memorised training data that overlaps with your user base, the notice has to say so.
An Article 22 analysis if the model is used in automated decision-making with legal or significant effects. The GDPR-and-AI-Act overlap article walks the specific C-203/22 reading on what "meaningful information about the logic" requires after February 2025. The fact that the model is open-weight rather than proprietary does not change the Article 22 analysis.
A DSAR response plan that can answer "what does this model know about me" when a request arrives. The DSAR-for-AI-systems article walks the Article 15(1)(h) response template. The explanation you owe for an open-weight model is the same as the explanation you owe for a proprietary model.
The blind spot is not a reason to avoid open-weight models. It is a reason to document what you did not know at deployment time and what you did to compensate: data minimisation in your fine-tuning pipeline, retention limits, audit logging, the model card you reviewed, the training data summary the provider published, and the DPIA you ran on the deployed system.
Treat every downloaded model as a dependency that has the same trust level as any third-party package in your build. Run it inside a security boundary, restrict what it can reach, and log what it does.
Containerise the inference process. Run it in a container with restricted network egress. The model process should not have internet access unless you specifically need it (for example, a retrieval step that calls a vector database), and even then the allowed destinations should be on an allowlist. A compromised model with unrestricted network access can exfiltrate data silently, and the exfiltration path is often the same HTTPS connection that looks like normal traffic.
Apply least privilege. The model serving process should not run as root. It should not have access to your application database, your secrets manager, or your file system beyond the model checkpoint and the inference I/O path. If a supply chain attack lands on the inference process, the damage should be bounded by what the inference process can touch, not by what the host can touch.
Monitor outputs and behaviour. Log inputs and outputs with PII redaction where required. Watch for behavioural anomalies: outputs that reference external resources, unexpected tool calls, responses in formats that do not match the expected schema. Anthropic's Sleeper Agents research showed that backdoored models can behave normally on evaluation and exhibit the malicious behaviour on specific triggers; runtime monitoring is the only line of defence after a backdoored model is already deployed.
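The behavioural checks can start as a handful of patterns applied to every output before it leaves the serving process. The patterns below are illustrative and need tuning to your own output schema; the point is that the check runs at inference time, after all the pre-deployment gates have passed.

```python
import re

# Illustrative anomaly patterns; tune to your own expected output schema.
SUSPICIOUS = {
    "external_url": re.compile(r"https?://", re.I),
    "unexpected_tool_call": re.compile(r'"tool_call"|<tool_call>', re.I),
}

def flag_output(text: str) -> list:
    """Names of the anomaly patterns that fired on a model output.

    Feed the result to your alerting pipeline; an empty list means the
    output passed this (deliberately coarse) screen.
    """
    return [name for name, pat in SUSPICIOUS.items() if pat.search(text)]
```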
Pin and inventory. Write down which model, which commit hash, which source, which format, and when it was deployed. When the next vulnerability disclosure lands, the time between "is our deployment affected" and "yes or no" should be minutes.
Five things to check before the next checkpoint lands in your inference stack.
First, the format. If the file is .pt or .bin, do not load it in production. Convert to SafeTensors in an isolated sandbox, or download the SafeTensors version if the provider publishes one. This single change eliminates most of the documented supply chain attack surface.
Second, the integrity. Verify the SHA-256 hash against the model card. Verify the Sigstore signature where the project is signed. Pin by commit hash rather than by tag.
Third, the licence. Read the licence before committing. For Llama 4, note the 700-million-MAU clause and the attribution requirement. For other models, confirm the licence is Apache 2.0 or MIT if you want the clean commercial path.
Fourth, the Article 53 posture. Confirm the provider has published the training data summary using the AI Office template. Confirm the model is under the FLOP threshold for systemic risk (or accept the full GPAI obligations if it is not). Confirm your own fine-tuning stays under the one-third-compute threshold that would make you a provider.
Fifth, the deployment architecture. The model runs in a container with restricted network egress, least privilege, and behavioural monitoring. The inventory is written down and ready to answer the next disclosure.
Self-hosting an open-weight model is a real privacy advantage and it comes with a real supply chain. The privacy advantage is that your data does not leave your network. The supply chain is that the model is a binary from somebody else's hub, it loads in a format that can execute arbitrary code unless you chose SafeTensors, it may carry a backdoor that survived safety training, and the EU AI Act gives you partial exemptions that end at the 10²⁵ FLOP threshold. The defensible 2026 position is SafeTensors by default, signatures verified, versions pinned, inference isolated, and the Article 53 training data summary on file. Any team skipping one of those is one malicious upload away from the next breach disclosure.