Article 50 of the AI Act applies on 2 August 2026. C2PA for images, audio, and video, SynthID-Text and the paraphrase gap, the Code of Practice second draft, and a Python starter.
Article 50 of the EU AI Act applies on 2 August 2026. The rule is straightforward on paper: if your system generates synthetic images, audio, video, or text, mark the outputs in a machine-readable format, and where the outputs are deepfakes or AI-generated text published on matters of public interest, disclose it to the user. The rule is harder in practice because the three technical standards implementing it are at very different maturity levels. C2PA for images, audio, and video is shipping in production. Text watermarking is a moving target. The Code of Practice that spells out what "machine-readable" means is still on its second draft as of April 2026, with finalisation expected in early June, giving providers roughly two months of runway.
This article walks through the obligation structure, the three technical layers that implement it, the specific gap around text, and the operational steps you can take now before the Code of Practice locks. If you ship an AI feature that generates any content, the work fits on a runbook, and the runbook is cheaper than the fine.
Article 50 is the transparency backbone of the AI Act. It splits responsibilities across three sub-articles, and the split matters because a single system can trigger any combination of them.
Article 50(1) applies to providers of systems that interact directly with people. If your chatbot talks to a user, you must tell the user they are interacting with AI. The exception is where the AI nature is obvious from context. A voice menu saying "press 1 for billing" does not need a disclosure. An LLM-backed support agent that is deliberately designed to feel human-like does. The obligation sits with the provider of the system, not with the deployer who embeds it.
Article 50(2) applies to providers of generative AI systems. If your system generates synthetic images, audio, video, or text, you must mark the outputs in a machine-readable format as artificially generated or manipulated. The statutory language is that solutions must be "effective, interoperable, robust and reliable." The Code of Practice is the document that specifies what that phrase means in practice.
Article 50(4) applies to deployers. If you generate or publish deepfakes (content that resembles real people, places, or events), you must disclose that the content is artificially generated. If you publish AI-generated text on matters of public interest, you must also disclose it. The editorial review exception is narrow: if a human with editorial responsibility reviewed the content, disclosure is not required, but the review has to be genuine editorial judgement rather than a rubber stamp. An editor who approves 400 AI-drafted pieces in a day is not meaningfully reviewing any of them.
The artistic, creative, and satirical exemption is limited and often misread. Article 50(4) allows lighter transparency for content that is "evidently" artistic or satirical, but lighter does not mean none. You still have to disclose the AI origin, just in a way that does not interfere with the work's enjoyment. Satire is not a carve-out for skipping disclosure entirely.
If you are a SaaS platform that both ships a generative AI feature and publishes AI-generated text to your users, you are a provider and a deployer at the same time, and you owe the Article 50(2) and Article 50(4) stack simultaneously.
For images, audio, and video, the Coalition for Content Provenance and Authenticity (C2PA) is the working standard and has been shipping in production across the major providers since 2024. OpenAI embeds C2PA content credentials in DALL-E and ChatGPT image outputs. Google embeds them in Gemini outputs. Meta ships them across Facebook, Instagram, and Threads. Adobe Firefly writes them by default. Midjourney adopted them. If your pipeline consumes images from any of these providers and re-serves them, the content credentials are likely already in the file unless your storage layer has stripped them.
C2PA works by embedding a cryptographically signed manifest inside the content file. The manifest records who created the content, with which system, when, whether AI was involved, and what edits happened. The signature is verifiable against a certificate chain, and any tampering invalidates the signature. The digitalSourceType field in the manifest's c2pa.actions assertion is the specific field that flags trained-algorithmic-media origin.
Here is the minimal Python signing flow:
import json
from c2pa import Builder, Context, Signer, C2paSignerInfo, C2paSigningAlg

manifest = json.dumps({
    "claim_generator": "your_app/1.0",
    "assertions": [{
        "label": "c2pa.actions",
        "data": {
            "actions": [{
                "action": "c2pa.created",
                "digitalSourceType":
                    "http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia"
            }]
        }
    }]
})

with open("cert.pem", "rb") as cert, open("key.pem", "rb") as key:
    signer_info = C2paSignerInfo(
        alg=C2paSigningAlg.PS256,
        sign_cert=cert.read(),
        private_key=key.read(),
        ta_url=b"http://timestamp.digicert.com"
    )

with Context() as ctx:
    with Signer.from_info(signer_info) as signer:
        with Builder(manifest, ctx) as builder:
            builder.sign_file("input.jpg", "output.jpg", signer)
Verification is a few lines:
from c2pa import Reader, Context

with Context() as ctx:
    with Reader("output.jpg", context=ctx) as reader:
        print(reader.json())
The c2pa-python library is MIT-licensed and production-stable. For Rust use c2pa-rs. A c2patool command-line binary exists for the one-off case. C2PA supports JPEG, PNG, TIFF, MP4, WAV, and PDF formats.
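Once you have the manifest store JSON from the Reader, flagging AI-generated inbound content is a short walk over the assertions. The JSON shape below is an assumption based on typical c2pa-python output (a "manifests" map of manifests, each with an "assertions" list); check it against your library version before relying on it.

```python
import json

AI_SOURCE = "http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia"

def is_ai_generated(manifest_store: dict) -> bool:
    """Report whether any c2pa.actions assertion in a parsed manifest store
    (e.g. json.loads(reader.json())) declares a trained-algorithmic-media source.

    Sketch only: the exact JSON layout varies between c2pa library versions.
    """
    manifests = manifest_store.get("manifests", {})
    for manifest in manifests.values():
        for assertion in manifest.get("assertions", []):
            if not assertion.get("label", "").startswith("c2pa.actions"):
                continue
            for action in assertion.get("data", {}).get("actions", []):
                if action.get("digitalSourceType") == AI_SOURCE:
                    return True
    return False
```

A missing manifest returns False, which is the correct default: absence of credentials is not evidence either way, and is exactly why the fallback layers below exist.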
In production, do not store the signing certificate and private key as files on your inference workers. Use a key management service (AWS KMS, Azure Key Vault, Google Cloud KMS) and sign through the KMS. The C2PA libraries accept callback-based signing, so the private key never leaves the KMS boundary. The cost of getting this right on day one is one hour. The cost of rotating a leaked signing certificate is the full trust relationship with every piece of content you have already shipped.
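The KMS pattern looks roughly like this: a callback receives the bytes to sign and returns the raw signature, so the private key never leaves the KMS. This is a sketch under assumptions: `make_kms_sign_callback` is an illustrative name, the injected client is assumed to expose a boto3-style `sign()` method (e.g. `boto3.client("kms")`), and the exact callback-signer constructor in c2pa-python varies by release, so check your version's documentation for how to register it.

```python
def make_kms_sign_callback(kms_client, key_id: str):
    """Build a data -> signature callable that signs via a KMS.

    kms_client: a boto3-style KMS client (illustrative; any object with a
    compatible sign() method works). key_id: the KMS key identifier.
    """
    def sign(data: bytes) -> bytes:
        resp = kms_client.sign(
            KeyId=key_id,
            Message=data,
            MessageType="RAW",
            # RSASSA_PSS_SHA_256 corresponds to the PS256 algorithm used above
            SigningAlgorithm="RSASSA_PSS_SHA_256",
        )
        return resp["Signature"]
    return sign
```

Pass the resulting callable to your library's callback-based signer constructor in place of the raw private key bytes.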
C2PA has two real limitations that matter for Article 50. First, it is provenance, not authentication. C2PA records what the creator says happened; it does not verify, for example, that the underlying image was actually generated by the model the manifest claims. Second, the manifest is metadata, and metadata can be stripped. If a platform re-encodes the image or strips EXIF, the C2PA manifest goes with it unless the platform explicitly preserves content credentials.
Both limitations are why the Code of Practice requires multiple redundant techniques rather than C2PA alone. The second layer is imperceptible watermarking built into the pixel (or audio sample) data itself, and the fallback layer is fingerprinting or logging where neither metadata nor watermark survives.
For text, the situation is different and less settled. An earlier version of this article described text watermarking as having no dominant standard, which is no longer accurate. Google's SynthID-Text is the first production-deployed text watermarking system. It ships in Gemini, and the implementation is open-sourced in Google's Responsible GenAI Toolkit. The underlying scheme is token-level: SynthID modulates the distribution over next-token candidates at sampling time in a way that biases selection toward a pseudorandom "tournament winner" set without visibly degrading output quality. A watermark detector with access to the same key can score an arbitrary piece of text for whether it was likely generated by a SynthID-enabled model.
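To make the tournament idea concrete, here is a toy sketch, not Google's implementation: a keyed hash assigns each candidate token a pseudorandom bit given the recent context, sampling prefers tournament winners, and detection scores the mean bit over a text. All function names and parameters here are illustrative.

```python
import hashlib
import random

def g_bit(key: str, context: tuple, token: str) -> int:
    # pseudorandom 0/1 derived from the watermark key, recent context,
    # and candidate token -- stands in for SynthID's g-values
    h = hashlib.sha256(f"{key}|{context}|{token}".encode()).digest()
    return h[0] & 1

def watermarked_choice(candidates, key: str, context: tuple, rng, m: int = 4):
    # toy one-layer tournament: draw m candidates from the model's
    # distribution (uniform here) and keep one with the highest g-bit
    pool = [rng.choice(candidates) for _ in range(m)]
    return max(pool, key=lambda t: g_bit(key, context, t))

def detect_score(tokens, key: str, window: int = 2) -> float:
    # mean g-bit over the text: ~0.5 for unwatermarked text,
    # markedly higher for watermarked text generated with the same key
    bits = [
        g_bit(key, tuple(tokens[max(0, i - window):i]), tokens[i])
        for i in range(window, len(tokens))
    ]
    return sum(bits) / len(bits)

# demo: generate 300 watermarked tokens from a toy vocabulary, then score
vocab = [f"tok{i}" for i in range(50)]
rng = random.Random(0)
watermarked = []
for _ in range(300):
    watermarked.append(
        watermarked_choice(vocab, "secret-key", tuple(watermarked[-2:]), rng))
unmarked = [rng.choice(vocab) for _ in range(300)]
```

The toy also shows why paraphrasing defeats the scheme: the detector's g-bits depend on the exact tokens and their local context, so rewriting the same meaning in different tokens pushes the score back toward the unwatermarked baseline.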
The Nature paper ("Scalable watermarking for identifying large language model outputs", October 2024) established the core method and showed it works at production scale with minimal latency overhead and no impact on model training. In that sense text watermarking exists and is production-ready.
In another sense it is not. Follow-up research in 2025 and early 2026 has shown that SynthID-Text is vulnerable to meaning-preserving attacks. Paraphrasing the output, copy-pasting through an intermediate system that re-samples tokens, and back-translation through a second language all significantly degrade the watermark's detectability. The arXiv theoretical analysis and empirical validation paper (2603.03410) documents this, and a robustness assessment paper (2508.20228) quantifies the degradation.
Article 50(2) requires marking to be "effective, interoperable, robust and reliable." A watermark that a paraphrase through a second LLM removes is not robust against the adversarial case the regulation is partly designed to catch. If your use case involves publishing AI-generated text that might be copied, quoted, or paraphrased before it reaches the reader (which is most web content), SynthID-Text is a useful signal but not a compliance tool on its own. You need a second layer: API-level provenance logging tied to the specific text your system generated, and a visible Article 50(4) disclosure where the text is published on a matter of public interest.
I am not convinced any current text watermarking method will satisfy the Article 50(2) robustness standard for adversarial modification before the Code of Practice is revised. The honest 2026 position is that SynthID-Text is the best available first layer, it is the one to deploy if you have the infrastructure, and it needs to be paired with provenance logging and a visible disclosure wherever the publication is on a matter of public interest. The Code of Practice second draft does not yet bless a specific text watermarking method, and the provision nearest to it talks about "multiple techniques" for text without naming one.
If you are running a first-party LLM (your own fine-tune or self-hosted model), the SynthID-Text scheme is open-sourced and can be integrated into your sampling loop with moderate effort. If you are calling OpenAI, Anthropic, or Google APIs, the watermark is the provider's responsibility, and their Article 50(2) position is the one you inherit by using their API. Ask the provider in writing for their watermarking plan and document the answer in your Article 50 compliance file.
The Code of Practice on the marking and labelling of AI-generated content is the document that spells out what the Article 50 statutory language means in practice. The Commission published the second draft on 5 March 2026, and feedback closed 30 March. Finalisation is expected in early June 2026, ahead of the 2 August 2026 application date. That gives providers roughly two months between the final text and the deadline.
The second draft is organised in two sections. Section 1 targets providers of generative AI systems and covers marking and detection. Section 2 targets deployers and covers labelling of deepfakes and text publications on matters of public interest. Compared to the first draft, the second draft is more streamlined, gives providers more flexibility on which specific techniques to deploy, and reduces the documentation burden. The core requirement of multiple redundant marking techniques stays: at minimum, providers should implement digitally signed metadata (C2PA), imperceptible watermarking, and fingerprinting or logging as a fallback.
The specific "layers" language matters because it tells you what redundancy the Commission expects. If your only marking is C2PA metadata, a platform that strips metadata defeats your entire compliance posture in one hop. The second draft expects providers to plan for at least one metadata-resistant signal (the imperceptible watermark) and one fallback when even the watermark fails (fingerprinting or operator-side logs keyed to the specific output). The three-layer architecture is defence in depth.
The draft also proposes a common EU icon for visible AI-content labelling with modality-specific display rules: a fixed clearly-visible icon for images, a persistent non-intrusive icon plus opening disclaimer for real-time video, an opening disclaimer plus end credits for non-real-time video, an audible disclaimer for audio, and a fixed marking at first exposure for text on matters of public interest. The exact icon is still being finalised, but the display rules are stable enough that you can wire the UX for them now.
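Those display rules can be wired as a small configuration now, with the icon asset swapped in later. The rule strings below paraphrase the second draft, and the modality keys and function name are illustrative.

```python
# modality -> visible-labelling rule, paraphrased from the Code of Practice
# second draft; the common EU icon itself is still being finalised
DISCLOSURE_RULES = {
    "image": "fixed clearly-visible icon",
    "video_realtime": "persistent non-intrusive icon + opening disclaimer",
    "video": "opening disclaimer + end credits",
    "audio": "audible disclaimer",
    "text_public_interest": "fixed marking at first exposure",
}

def disclosure_rule(modality: str) -> str:
    # fail closed: an unmapped modality should block publishing,
    # not silently ship unlabelled AI content
    if modality not in DISCLOSURE_RULES:
        raise ValueError(f"no Article 50 display rule wired for {modality!r}")
    return DISCLOSURE_RULES[modality]
```

Failing closed on unknown modalities is deliberate: a new content type added to your pipeline without a labelling decision is a compliance gap, not a default.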
Visible labelling is primarily the deployer's job under Article 50(4). The deployer is the entity that puts the AI content in front of the end user, and the disclosure goes there.
For deepfakes, the disclosure has to be clear and prominent. The deployer must have an internal process for identifying and classifying deepfake content. Automated detection alone is not enough in the Code of Practice's framing; the draft recommends both automated detection and human oversight for classification, especially where the consequence of misclassification is significant.
For text on matters of public interest, the editorial review exception is the clause most teams will try to rely on. The statute exempts content that a human with editorial responsibility has reviewed. The exception is genuine but narrow. The human reviewer has to actually be able to change the content, has to have the time to reach an independent conclusion, and has to carry editorial responsibility in the organisational sense. A moderator who scrolls through 1,000 AI-drafted posts per day and clicks "approve" is not meaningfully reviewing anything, and a regulator reading the editorial review exception would not accept that as human responsibility. If you run a news aggregator or a content platform with AI-assisted drafting, the Article 50(4) disclosure is the default and the editorial exception is the exception.
The "matter of public interest" phrase is broader than it sounds. The Code of Practice framing extends it to any content intended to inform the public, including political reporting, health information, consumer advice, financial guidance, and analysis of current events. If your platform publishes content about any of those topics and you use AI to draft it, assume Article 50(4) applies unless you have a specific and documented editorial review process.
Metadata removal is the main attack against provenance-based marking, and it happens routinely: image hosts re-encode uploads to optimise size, messaging apps strip EXIF to reduce file size and protect user privacy, social platforms re-compress media. The Code of Practice treats this as expected, not adversarial, which is why the second draft insists on multiple techniques rather than a single metadata layer.
Three things you can do to improve your posture against stripping.
First, embed the imperceptible watermark at the pixel or sample level when you generate the content. For images this is a second layer beyond the C2PA manifest. For audio and video, the watermark rides in signal modulations that survive format conversion. The Python libraries for pixel-level watermarking are less mature than C2PA, but the major providers have their own implementations.
Second, log provenance operator-side. If you generated the content, you know you generated it. Keep a hash of the output plus a record of the generation event (prompt, model, timestamp, operator) in your own database. If a regulator or a journalist asks "did your system generate this?", you can answer from your logs even if the file has been stripped of every external marking. The Code of Practice second draft describes this as the fingerprinting and logging fallback.
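A minimal sketch of that fallback, assuming sqlite3 and a SHA-256 key over the exact output bytes; the table and field names are illustrative, not from the Code of Practice.

```python
import hashlib
import sqlite3

def open_provenance_db(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS generations (
        sha256 TEXT PRIMARY KEY,
        model TEXT,
        prompt TEXT,
        operator TEXT,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP)""")
    return db

def log_generation(db, output: bytes, model: str, prompt: str, operator: str) -> str:
    # key the record to the exact bytes you shipped; any re-encode changes the hash
    digest = hashlib.sha256(output).hexdigest()
    db.execute(
        "INSERT OR IGNORE INTO generations (sha256, model, prompt, operator) "
        "VALUES (?, ?, ?, ?)",
        (digest, model, prompt, operator))
    db.commit()
    return digest

def did_we_generate(db, content: bytes) -> bool:
    # answers "did your system generate this?" from your own records,
    # even when every external marking has been stripped
    digest = hashlib.sha256(content).hexdigest()
    return db.execute(
        "SELECT 1 FROM generations WHERE sha256 = ?", (digest,)).fetchone() is not None
```

Note the limitation, which is the same one the Code of Practice accepts: an exact-bytes hash survives metadata stripping but not re-encoding, which is why the draft pairs logging with perceptual fingerprinting rather than replacing it.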
Third, wire your UX to display C2PA content credentials when they are present in inbound content. If you run a platform that accepts uploads, preserving provenance is a deployer obligation in the Code of Practice framing. Stripping or obscuring credentials on upload is called out as unacceptable.
Article 99 of the AI Act sets the penalty framework. For Article 50 transparency breaches, the maximum fine is EUR 15 million or 3% of global annual turnover, whichever is higher. For SMEs, the lower of the two thresholds applies. Penalties are not aggregated with GDPR fines for the same factual violation (Article 99(7) forces the authority to consider overlap), but different infringements from the same system can each attract their own fine.
The deadline is 2 August 2026. The Digital Omnibus proposal that is attempting to slip Annex III high-risk deadlines from August 2026 to December 2027 does not touch Article 50. Article 50 obligations stay on the original August date regardless of the Omnibus outcome, and the EDPB-EDPS Joint Opinion 1/2026 on the Omnibus is explicit that the transparency and literacy provisions should not be weakened.
The realistic enforcement pattern, given the Code of Practice finalisation in June 2026, is that national authorities will look for good-faith implementation of the second-draft guidance during the second half of 2026 rather than letter-perfect compliance from day one. But "good faith" is not "nothing." A provider that has not shipped C2PA for images by 2 August, when the library is open-source and the integration is an afternoon of work, will struggle to claim good faith.
Three weeks of preparation buys you a defensible position. One week if you already sign images.
Start with C2PA for images, audio, and video. If you ship synthetic images and they do not carry content credentials, this is the cheapest and highest-leverage fix in the entire compliance programme. Integrate the c2pa-python library, wire signing through your KMS, and write the verification path for inbound content. Preserve credentials on re-upload.
Draft your text position before the Code of Practice finalises in June. If you are a provider, document which text watermarking approach you will deploy (SynthID-Text for first-party models, the provider's approach where you call an API) and acknowledge the paraphrase gap in writing. If you are a deployer publishing AI-generated text, draft the visible disclosure language and the internal classification procedure for "matter of public interest" so you are not arguing about scope in August.
Wire the visible labelling UX for the Code of Practice icon on whichever modalities you ship. The exact icon is still being finalised but the display rules are stable: fixed icon on images, persistent icon on real-time video, opening disclaimer plus end credits on non-real-time video, audible disclaimer on audio, fixed marking at first exposure on text. Build the component library with those slots now and swap the icon in when the Commission publishes the final version.
Log provenance operator-side for every generation event. Hash, prompt metadata, model, timestamp, operator. Store the log for at least the retention period required by your other AI Act documentation.
Ask every AI provider in your stack, in writing, for their Article 50(2) marking plan. Document the answer in your Article 50 compliance file. Their plan is the plan you inherit when you deploy their output, and their plan will be audited whether they told you or not.
Article 50 applies on 2 August 2026 and the Code of Practice finalises in early June, giving you roughly two months of runway. C2PA for images, audio, and video is the solved case: the libraries are production, the major providers already ship it, and the integration is an afternoon. Text is the hard case: SynthID-Text exists and is the right first layer, but paraphrase and back-translation attacks mean it is not sufficient on its own, and the honest 2026 position is to pair it with provenance logging and visible Article 50(4) disclosure wherever the publication is on a matter of public interest. Fines run to EUR 15 million or 3% of global turnover, and the Digital Omnibus does not move this deadline. Start with images today, draft the text position before June, and ask every provider in your stack for their marking plan in writing.