A practical, surface-by-surface audit recipe for finding personal data flowing to AI services. Covers prompt templates, observability defaults, embedding pipelines, and the limits of audit-by-grep in agent mode.
On 31 March 2026, Anthropic published @anthropic-ai/claude-code v2.1.88 to npm. Inside the package was a 59.8 MB JavaScript source map that pointed at the entire unobfuscated TypeScript source of Claude Code — 512,000 lines across 1,906 files, including slash commands, built-in tools, 44 hidden feature flags, and an always-on background agent called KAIROS. Security researcher Chaofan Shou identified the exposure within hours. The source was mirrored to GitHub. One copy was forked 41,500+ times before Anthropic's DMCA notices started landing.
The root cause is a build-pipeline default plus one missing line. Bun generates source maps by default on publish. *.map was not in .npmignore, and files was not configured in package.json to allowlist only the distribution artifacts. Anthropic's own statement framed it as "a release packaging issue caused by human error, not a security breach." That framing is narrowly correct — no customer data or credentials shipped — and simultaneously the load-bearing point: a codebase-audit step that any grep -n "source-map" package.json would have surfaced was not in the release checklist at the team that ships the AI coding agent.
If the team shipping Claude Code can miss it, your team can miss it. That is the whole case for doing this walk.
The OWASP Top 10 for LLM Applications was republished in late 2024 and "Sensitive Information Disclosure" moved from #6 to #2, the largest single jump in the list. LLM08:2025 Vector and Embedding Weaknesses is a new entry for the RAG stack specifically — it did not exist in the 2023 list and it is now the formal OWASP entry for Surface 3 below. Prompt injection stays at #1.
The reason is not that LLM security got worse. It is that the surfaces around the model multiplied. A model call in 2023 was usually one line in one file. A model call in 2026 is an SDK that pulls from a prompt template, sits inside an LLM-observability tracer, ships through an error-reporting middleware, hits a vector store, and (increasingly) lives behind an agent that can decide for itself which files to read and which tools to call.
Most teams audit the line that says client.chat.completions.create. They miss the five surfaces around it. That is what this guide is for.
The frame is straightforward. There are five places personal data leaks into an AI provider, an observability vendor, or a regulator's evidence pile. You walk them one at a time. For each surface, the recipe is: grep for the touchpoints, read the code that builds the data, run the code with a synthetic record, and write down what landed where. The whole pass takes an afternoon for a small application and one day for a larger one. You will find things. That is the point.
(For the legal frame on whether this matters, the EDPB Opinion 28/2024 published 17 December 2024 is the document to read alongside this guide. The Court of Rome's annulment of Garante Decision 755 against OpenAI on 18 March 2026 narrowed the procedural argument, not the substantive one. The "personal data flowing to an AI service" question is still very much open, and very much enforceable.)
Start with the obvious and trace inward.
Find every connection to an AI service. The grep recipe has not changed:
# Python SDK imports
grep -rn "import openai\|from openai\|import anthropic\|from anthropic" .
grep -rn "import google.generativeai\|from vertexai\|from cohere" .
# JS / TS package usage
grep -rn "openai\|anthropic\|@google-cloud/aiplatform\|@anthropic-ai/sdk" package.json
grep -rn "new OpenAI\|new Anthropic\|ChatOpenAI\|ChatAnthropic" src/
# Direct REST calls (the ones that bypass SDKs)
grep -rn "api.openai.com\|api.anthropic.com\|generativelanguage.googleapis.com" .
# Wrapper frameworks
grep -rn "from langchain\|from llama_index\|from semantic_kernel\|from haystack" .
That is the audit surface. Now the actual leak.
For each touchpoint, find the function that builds the request and read it. The interesting code is almost never the line that calls the API. It is the template ten functions away that drops the customer's name into the system prompt for context. Like this:
# This API call looks fine
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ],
)
# What's in system_prompt?
system_prompt = f"""You are a support agent for {company_name}.
The customer's name is {customer.name}.
Their account ID is {customer.id}.
Their last 10 orders: {customer.order_history}
Their account balance: {customer.balance}
"""
# Personal data, financial data, account history. In every single request.
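The minimized counterpart keeps identifiers server-side and hands the model only what the task needs. A sketch — which fields are genuinely necessary depends on your task, and the helper names here are illustrative:

```python
# Send the task, not the customer record. Account IDs, balances and full
# order history stay server-side; the model only sees what it must answer.
def build_system_prompt(company_name):
    return (
        f"You are a support agent for {company_name}. "
        "Answer using only the order details included in the user message."
    )

def build_user_message(question, relevant_order):
    # one order, selected server-side — not the full history
    return f"Customer question: {question}\nRelevant order: {relevant_order}"
```

The selection logic (which order is relevant) runs in your code, where you can audit it, rather than being delegated to the model by dumping the whole record into the prompt.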
The questions to answer for each touchpoint:
- Which fields from your data model end up in the prompt, and through which template?
- Is each field necessary for the task, or was it added "for context"?
- What are the provider's retention and training terms for this API tier?
- When was the template last reviewed, and who approved the fields it now carries?
The single most common pattern is the system prompt that quietly grew. It started as "You are a helpful assistant" in week one and absorbed customer fields one PR at a time. Nobody decided to send personal data to an LLM. It happened by drift.
This is the surface most teams underestimate, and it is almost always the biggest source of leakage in the audit.
The fix to remember before the search: stop logging full LLM payloads in production. Log metadata (model, token count, latency, user ID hash, status code). If you genuinely need full prompts for debugging, route them to a separate, short-retention, tightly-access-controlled log stream that lives behind a feature flag.
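Metadata-only logging is simpler than it sounds. A minimal sketch, assuming a helper of your own (the field names are illustrative, not a standard):

```python
import hashlib
import logging

logger = logging.getLogger("llm")

def log_llm_call(*, model, user_id, prompt_tokens, completion_tokens,
                 latency_ms, status):
    """Log everything about the call except its content."""
    record = {
        "model": model,
        # hash the user ID so the log stream itself is not a PII store
        "user": hashlib.sha256(user_id.encode()).hexdigest()[:16],
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "latency_ms": latency_ms,
        "status": status,
    }
    logger.info("llm_call", extra={"llm": record})
    return record
```

Note what is absent: no prompt, no completion, no raw user ID. Everything you need for cost tracking, latency dashboards and per-user debugging survives.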
Now the search:
# Direct logging of prompts and responses
grep -rn "log.*prompt\|log.*completion\|log.*response\|log.*messages" src/
grep -rn "logger.*openai\|logger.*anthropic\|logger.*llm\|logger.*gpt" src/
grep -rn "console.log.*ai\|console.log.*chat\|console.log.*gpt" src/
# Error handling around AI calls
grep -rn "catch.*openai\|except.*openai\|catch.*anthropic\|except.*anthropic" src/
grep -rn "sentry.*capture\|bugsnag.*notify\|rollbar.*error" src/
Direct logging is the easy half. The harder half is what the observability stack does for you without asking.
If send_default_pii=True is set in sentry_sdk.init() and the OpenAI integration is loaded, the integration starts shipping LLM inputs and outputs to Sentry. To opt back out without losing the rest of the PII context, you have to add OpenAIIntegration(include_prompts=False) explicitly. Datadog's LLM Observability traces inputs and outputs of every instrumented LLM call by default, with Sensitive Data Scanner as the redaction layer that has to be enabled separately. The default for both products is "capture more" once you enable AI features. Audit your sentry_sdk.init() call and your Datadog LLM Observability configuration before anything else.
The Sentry default in particular catches teams by surprise. The send_default_pii=True flag is often set deliberately for non-AI debugging (it lets you see request headers and IPs in error events). The moment that flag is on and the OpenAI integration is loaded, every failed LLM request ships its prompt to Sentry's servers. Sentry then becomes a processor of the personal data in those prompts, with whatever retention window your Sentry plan has. Most teams have not added Sentry to their processor register.
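Concretely, the safe combination looks like this. A configuration sketch against the Sentry Python SDK — check your SDK version, as integration modules have moved across releases, and the DSN is a placeholder:

```python
import sentry_sdk
from sentry_sdk.integrations.openai import OpenAIIntegration

sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder DSN
    # kept on deliberately: request headers and IPs in error events
    send_default_pii=True,
    integrations=[
        # ...but keep LLM inputs and outputs out of those events
        OpenAIIntegration(include_prompts=False),
    ],
)
```

The point of the explicit integration entry is that the two flags are independent: send_default_pii controls the general PII context, include_prompts controls whether LLM payloads ride along with it.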
The same logic applies up the stack. Bugsnag's on_error hooks, Datadog APM's request body capture (off by default but commonly on), New Relic's transaction tracing, and any retry middleware that logs the failing request body are all surfaces where prompts can land. The fix is the same shape every time: scrub before send, scrub server-side, or do not capture.
A quick reality check on retention. If your observability tool keeps logs for 90 days, the personal data that leaked into those logs is in that vendor's storage for 90 days. That is also the window in which a regulator's Article 15 access request can pull the data back out. A leak you patched in week one is still discoverable in week ten.
If the application does Retrieval-Augmented Generation, there is an embedding pipeline. The pipeline is a major data flow that does not look like one because nobody calls it "an AI API call" in conversation.
What to check:
- What source text gets chunked and embedded, and does it contain personal data?
- Which provider runs the embedding model? That call is an AI API call, even if nobody calls it one.
- Where do the vectors and their source-text metadata land, and what retention applies there?
- Is there a path from a deleted source record to every vector derived from it?
The legal question is whether vector embeddings derived from personal data are themselves personal data under GDPR. The honest answer in April 2026 is "almost always yes, for any embedding model in widespread use." Reconstruction attacks like vec2text and ALGEN have shown that recovery from embedding to text is increasingly tractable, especially for short inputs. If you want the long version, the vector embeddings article in this set walks through it. The short version: treat embeddings as personal data unless you can articulate why a specific embedding model and a specific input length push them out of scope. Most cannot.
The other thing the vector store needs is a delete path. Right of erasure requests under Article 17 GDPR cannot stop at "we deleted the row in Postgres." If a copy of that row's text was embedded and indexed, the vector also has to go. Build the delete path before the first request comes in, not after. The right-to-erasure article in this set covers the unlearning vs hard-delete distinction for the harder cases.
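The shape of that delete path matters more than the store you run. A sketch with in-memory stand-ins for the database and the vector index — the client classes here are placeholders, not a real SDK:

```python
class ErasureHandler:
    """Erasure has two legs: the source row and every vector derived from it."""

    def __init__(self, db, vector_index):
        self.db = db                      # stand-in for your primary store
        self.vector_index = vector_index  # stand-in for your vector store client

    def erase(self, record_id):
        # 1. delete the source row
        self.db.delete(record_id)
        # 2. delete every chunk embedded from that row — this only works if
        #    vectors were written with the source record ID as metadata
        self.vector_index.delete_by_source(record_id)

# in-memory stand-ins, just to show the contract
class FakeDB:
    def __init__(self):
        self.rows = {}
    def delete(self, record_id):
        self.rows.pop(record_id, None)

class FakeIndex:
    def __init__(self):
        self.vectors = []  # (source_id, vector) pairs
    def delete_by_source(self, record_id):
        self.vectors = [(s, v) for s, v in self.vectors if s != record_id]
```

The design decision to get right up front is step 2's precondition: write the source record ID into the vector metadata at ingestion time, because retrofitting it onto an existing index means re-embedding everything.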
The forgotten surfaces. Three places personal data lands that nobody puts on the audit list:
Caches. If you cache LLM responses for performance (Redis, Memcached, a CDN, an application-layer cache), every cached response contains whatever personal data was in that response. Cache TTL becomes a personal-data retention window. Cache eviction becomes a deletion control. Access to the cache becomes access to personal data. If you use a managed cache like Upstash Redis or Cloudflare Workers KV, that vendor is now a processor.
Retry queues and dead-letter queues. When a model call fails and your retry middleware captures the request body for later replay, the request body contains the prompt. If retries land in SQS, RabbitMQ, or a Kafka dead-letter topic, the prompt is now in that queue's retention window. The worst case here is a dead-letter queue with a 14-day retention that nobody checks, slowly filling with prompts containing customer data, six months past the original incident.
Idempotency stores. If you use idempotency keys for LLM calls (some vendors and frameworks do this for cost-deduplication), the idempotency record may contain a hash of the prompt. Some implementations store the full prompt for debugging. Find out which kind yours is.
The grep for these is harder because the surface is heterogeneous. Look for cache.set, redis.set, sqs.send_message, kafka.produce, and any wrapper your team has written around them, then walk back the data being stored.
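For the cache case, the point in code: if you cache LLM responses, the TTL is a retention decision, so make it explicit and short. A toy dict-backed stand-in for a Redis-style cache (redis.setex has the same key/value/TTL shape):

```python
import time

class TTLCache:
    """Toy stand-in for a Redis-style cache. The TTL is a retention window."""

    def __init__(self):
        self._store = {}

    def set(self, key, value, ttl_seconds):
        # choose ttl_seconds as a retention policy, not a performance knob
        self._store[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() >= expires:
            del self._store[key]  # eviction doubles as a deletion control
            return None
        return value
```

With a managed cache the same idea applies through the vendor's TTL parameter, plus one extra line in the processor register for the vendor itself.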
This is the surface that breaks the recipe.
Audit-by-grep works when you can stand in front of the API call and read the data going in. It does not work when the LLM itself decides which files to open, which functions to call, and which tools to invoke. In an agent setup (Cursor's agent mode, GitHub Copilot's coding agent, Claude Code, Cline, and the broader MCP-server ecosystem), the data flow is decided at runtime by the model. There is no static call site to audit. There is a tool list, a system prompt, and a model that picks a sequence of tools at inference time.
The honest read: I do not think audit-by-grep is going to be a complete answer for any team that adopts agent mode at depth. The MCP server audit data we have so far is grim. Published audits of public MCP servers found OAuth flow flaws in 43%, command injection in 43%, unrestricted network access in 33%, and file access outside the intended scope in 22% of implementations. Those are not edge cases. They are the modal MCP server. The OWASP MCP Top 10 v0.1 Beta codifies the same risks formally: MCP01 (Token Mismanagement), MCP05 (Command Injection), MCP07 (Insufficient Authentication), MCP08 (Lack of Audit and Telemetry), MCP09 (Shadow MCP Servers). The Endor numbers are the empirical reality behind the framework.
The mitigating moves are threefold. First, treat the agent's tool list as a sub-processor cascade and document each tool's data access. Second, centralise agent session logs into a reviewable sink (SIEM, S3 with retention, anything you can run a search against) — MCP08 is the OWASP Top 10 entry for teams that skip this step. Third, write an agent register that tracks which agents run with which tool access against which data, alongside the DPIA. The shadow AI article in this set covers the register pattern. The MCP security article covers the OWASP MCP Top 10 and what to actually configure.
Finding leaks is one thing. Preventing the next set is harder. The guardrails that work in practice:
One AI client module. Wrap every LLM call in a single internal module. All prompts go through it. That gives you one place to scrub PII, one place to log metadata not content, one place to enforce model selection, one place to add a feature flag for verbose logging in dev. New AI integrations that bypass the module trip a CI check.
class AIClient:
    def complete(self, messages, *, metadata):
        sanitized = self._strip_pii(messages) if self.strip_pii else messages
        self._log_metadata(metadata)  # token count, model, latency, user hash
        response = self._client.chat.completions.create(
            model=self.model,
            messages=sanitized,
        )
        return response
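The _strip_pii hook in the client is yours to implement. A naive regex pass is better than nothing, though it will miss names, addresses and account numbers — this is a sketch, not a substitute for a proper PII detection library:

```python
import re

# deliberately simple patterns: emails and US-style phone numbers only
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")

def strip_pii(messages):
    """Replace obvious PII in message contents with placeholder tokens."""
    def scrub(text):
        text = EMAIL.sub("[EMAIL]", text)
        return PHONE.sub("[PHONE]", text)
    return [{**m, "content": scrub(m["content"])} for m in messages]
```

Even this crude version pays for itself: it catches the most common accidental paste (an email address in a support ticket) and gives you one place to swap in a real detector later.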
A code-review checklist line. Add one question to the PR template: "Does this PR send personal data to an AI service or change what is sent? If yes, is the prompt template documented?" Five seconds per PR and it surfaces the drift problem at the right moment.
A CI check for new AI imports. A grep run in CI that flags new AI SDK imports outside the centralised client module. Not a hard fail, just a "human, please look at this" review hook.
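One way to express that CI hook, assuming the centralised module lives at src/ai_client/ — adjust the path and the pattern list to your repo:

```shell
#!/usr/bin/env sh
# Flag AI SDK imports that bypass the centralised client module.
# Exits 1 so the pipeline can surface it as a review hook.
hits=$(grep -rln "import openai\|from openai import\|import anthropic\|from anthropic import" src/ \
  | grep -v "^src/ai_client/" || true)

if [ -n "$hits" ]; then
  printf 'AI SDK import outside src/ai_client/ -- please review:\n%s\n' "$hits"
  exit 1
fi
```

Wire it as a non-blocking check (or a required review label) rather than a hard failure, so a legitimate new integration gets a conversation instead of a fight with CI.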
A leakage-surface map kept in the repo. A docs/ai-data-flows.md file that lists, for each AI surface, what data flows through it, what processor it touches, what scrubbing applies, and the date of the last audit. Update on every elevation pass. This is the artefact regulators ask for and the artefact your future self will thank you for.
Synthetic test data in development. Staging and dev environments should never use real personal data with real API keys. Use a realistic-but-fake dataset, separate API keys per environment, and a different set of observability scrub rules for staging. The data residency article covers the cross-environment angle in more depth.
Block out half a day this sprint for surfaces 1 and 2 (the prompt template and the observability stack). Those are where the largest leaks live and the audit is mostly grep, read, and configuration changes you can make immediately. Surfaces 3 through 5 (embedding pipeline, cache layer, agent mode) need a follow-up with the people who own those systems, so put them on the next sprint board with a rough scope.
Then do the boring step that most teams skip: write down what you found and what you fixed. A page in the engineering handbook is fine. The reason is that this audit is not a one-time exercise. Models change, SDKs add features, observability vendors flip defaults, and a quarterly re-run will catch the next batch.