Invisible Ink That Washes Off

This morning OpenAI announced that it is adopting the C2PA open standard and embedding Google DeepMind’s SynthID invisible watermarks into its AI-generated images. There is now a public verification portal where anyone can upload an image and find out whether it was generated by OpenAI’s models. By the afternoon, Hacker News was discussing remove-ai-watermarks, a Python CLI that, according to its author’s manual testing, defeats SynthID v2.

The obvious read is that this is just another arms race in a long series of them. The more careful read is that both things can be true simultaneously: SynthID can be defeatable by a motivated adversary, and SynthID can still be useful. The question is what “useful” means for a content provenance system.

What the Two Layers Actually Do

C2PA and SynthID serve different threat models, and conflating them produces confused expectations about what either can accomplish.

C2PA is a metadata standard: it attaches a cryptographically signed provenance manifest to a file, recording who created it, with what tools, and when. The manifest survives as long as the file travels through systems that preserve metadata, such as professional photography workflows, news wires, stock photo agencies, and authenticated publishing pipelines. It is lost the moment someone screenshots the image, re-exports it through a tool that doesn’t carry the manifest forward, or strips the metadata, which is what almost every casual reshare does. C2PA is excellent for authenticated provenance chains but essentially irrelevant for social media virality.
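A minimal sketch of why a signed manifest is strong while it stays attached and worthless once it is dropped. This is not the C2PA format: real manifests are signed with X.509 certificates rather than a shared secret, and the key, field names, and helper functions below are hypothetical stand-ins chosen to keep the example within the Python standard library.

```python
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical signing key. Real C2PA signing uses certificates, not a
# shared secret; HMAC is only a stand-in for "a signature over the claim".
SIGNING_KEY = b"demo-key-not-a-real-credential"


def make_manifest(pixels: bytes, creator: str, tool: str) -> dict:
    """Build a signed claim bound to the content by its SHA-256 hash."""
    claim = {
        "creator": creator,
        "tool": tool,
        "content_sha256": hashlib.sha256(pixels).hexdigest(),
    }
    payload = json.dumps(claim, sort_keys=True).encode()
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return {"claim": claim, "signature": signature}


def verify(pixels: bytes, manifest: Optional[dict]) -> str:
    """Return 'verified', 'invalid', or 'no manifest'."""
    if manifest is None:
        return "no manifest"
    payload = json.dumps(manifest["claim"], sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, manifest["signature"]):
        return "invalid"  # the claim itself was tampered with
    if manifest["claim"]["content_sha256"] != hashlib.sha256(pixels).hexdigest():
        return "invalid"  # the claim is genuine but describes other content
    return "verified"


pixels = bytes(range(256)) * 4  # stand-in for image data
manifest = make_manifest(pixels, "Example Newsroom", "example-generator")

intact = verify(pixels, manifest)        # metadata preserved end to end
screenshot = verify(pixels, None)        # same pixels, manifest dropped
edited = verify(pixels[::-1], manifest)  # pixels changed under an old manifest
```

The three results are "verified", "no manifest", and "invalid". The middle case is the one that matters for resharing: nothing was forged, the proof simply didn’t come along.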

SynthID addresses the gap. Instead of attaching metadata, it embeds a watermark directly into the image content: a frequency-domain perturbation tuned to be imperceptible to humans but detectable by a trained detector. Because it’s in the pixels rather than the metadata, it is designed to survive cropping, color adjustments, format conversion, and most common edits. It’s not tied to a cryptographic chain of custody; it just marks the content.
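To make the idea of a mark that lives in the pixels concrete, here is a toy spread-spectrum watermark. It is emphatically not SynthID’s algorithm: it works in the spatial domain on a flat list of gray levels, and the image, key, strength, and threshold are all hypothetical values picked for illustration. The point is only that a faint keyed pattern can be detected statistically and can ride through a brightness and contrast change plus mild noise.

```python
import math
import random

N = 40_000        # pixels in the toy grayscale "image"
ALPHA = 2.0       # embed strength in gray levels out of 255 (hypothetical)
THRESHOLD = 5.0   # z-score above which the mark is called present
KEY = 2024        # secret key that seeds the pattern (hypothetical)


def pattern(key):
    """Keyed pseudo-random pattern of +1/-1 values, one per pixel."""
    rng = random.Random(key)
    return [rng.choice((-1, 1)) for _ in range(N)]


def clamp(value):
    """Quantize to an integer gray level in 0..255."""
    return max(0, min(255, round(value)))


def embed(pixels, key):
    """Nudge every pixel up or down by ALPHA according to the keyed pattern."""
    return [clamp(p + ALPHA * w) for p, w in zip(pixels, pattern(key))]


def z_score(pixels, key):
    """Correlate the image with the keyed pattern; roughly N(0, 1) if unmarked."""
    mean = sum(pixels) / N
    var = sum((p - mean) ** 2 for p in pixels) / N
    corr = sum((p - mean) * w for p, w in zip(pixels, pattern(key)))
    return corr / math.sqrt(var * N)


img_rng = random.Random(7)
original = [clamp(img_rng.gauss(128, 30)) for _ in range(N)]
marked = embed(original, KEY)

# Contrast and brightness tweak plus mild noise, standing in for a casual reshare.
edit_rng = random.Random(11)
reshared = [clamp(0.9 * p + 20 + edit_rng.gauss(0, 4)) for p in marked]

z_original = z_score(original, KEY)      # no mark embedded
z_marked = z_score(marked, KEY)          # mark present
z_reshared = z_score(reshared, KEY)      # mark present after the edit
z_wrong_key = z_score(marked, KEY + 1)   # mark present, wrong key
```

With these settings the marked and reshared images score far above the threshold, while the unmarked image and the wrong-key check hover near zero. Each pixel moves by only two gray levels, which is why the mark is invisible, and also why it can’t carry much signal.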

Together, the two layers cover complementary failure modes. C2PA fails when metadata is stripped but content is preserved. SynthID fails when content is sufficiently altered — which turns out to include, per the remove-ai-watermarks tool, a controlled round-trip through SDXL diffusion.

How the Removal Works

The invisible watermark removal pipeline in the CLI is instructive. It encodes the image into SDXL’s latent space using a VAE, adds a small amount of controlled noise, runs approximately 50 denoising steps at a low strength (0.05), and decodes back to pixels. The regenerated image looks nearly identical to the original — the face detection logic even re-blends human faces to prevent distortion — but the frequency-domain watermark has been scrambled by the diffusion process.
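The round trip described above can be sketched with Hugging Face diffusers. This is not the tool’s code, and whether remove-ai-watermarks is built on diffusers is an assumption; the function names, the empty prompt, and the model checkpoint are illustrative choices. The img2img pipeline handles the VAE encode, noise injection, denoising, and decode internally. One detail worth knowing: in diffusers-style img2img, strength also scales how many of the scheduled steps actually run, so a 50-step schedule at strength 0.05 executes only the last couple of steps.

```python
def effective_denoising_steps(num_inference_steps: int, strength: float) -> int:
    """Steps an img2img run actually executes.

    Assumption: mirrors the rule in diffusers' img2img pipelines, which skip
    the early, high-noise part of the schedule in proportion to strength.
    """
    return min(int(num_inference_steps * strength), num_inference_steps)


def regenerate(image_path: str, out_path: str,
               strength: float = 0.05, num_inference_steps: int = 50) -> None:
    """Round-trip an image through SDXL img2img at low strength.

    Needs a CUDA GPU plus the torch, diffusers, and Pillow packages, and
    downloads the model weights on first use.
    """
    # Heavy imports stay local so the helper above works without them.
    import torch
    from diffusers import StableDiffusionXLImg2ImgPipeline
    from PIL import Image

    pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
    ).to("cuda")

    source = Image.open(image_path).convert("RGB")
    result = pipe(
        prompt="",                      # no text guidance; stay close to the source
        image=source,
        strength=strength,              # how far toward pure noise to push
        num_inference_steps=num_inference_steps,
    ).images[0]
    result.save(out_path)


# Under the assumed rule, the settings quoted above run very few steps.
steps_run = effective_denoising_steps(50, 0.05)
```

The sketch leaves out the face re-blending step, and it has not been checked against SynthID’s detector; it only shows the shape of the operation.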

This isn’t clever hacking of SynthID’s specific implementation; it’s exploiting a structural tension in how invisible watermarks work. The watermark has to be subtle enough that it doesn’t visibly degrade the image, which limits how much signal can be embedded. A diffusion pass that resynthesizes the image’s fine detail, even at low strength, will obliterate a watermark that faint by construction.

SynthID’s designers know this. The watermark is not designed to resist adversarial removal with a GPU and 50 diffusion steps. It’s designed to survive the lazy resharing that accounts for almost all misinformation spread: screenshots, format conversions, compressed uploads, basic color grading. For that population of transformations, it works well.

What This System Is Actually For

The relevant question for a content provenance system isn’t whether it can be defeated by someone willing to run diffusion on every image they want to spread. It’s whether it raises the cost and reduces the scale of the problem it’s designed to address.

The adversarial case — a motivated actor who wants to create and spread AI-generated disinformation — already operates outside the bounds of any voluntary marking system. They will not use OpenAI’s image generation with watermarks intact. They will use open-source models with watermarking disabled, or run the removal tool, or generate content on platforms that don’t participate in C2PA at all. Neither SynthID nor C2PA solves this threat.

What the system addresses is the middle case: platforms that want to label AI-generated content accurately, researchers and journalists trying to verify the origin of a specific image, and organizations building provenance infrastructure for their own content pipelines. For these use cases, the verification portal and the dual-layer approach are genuinely useful, and the fact that a removal tool exists doesn’t undermine them much — the people using the verification portal for legitimate purposes aren’t the ones running SynthID removal scripts.

The harder problem is that provenance systems only work if they’re universal, and universality requires buy-in that doesn’t exist yet. An OpenAI-only verification portal can confirm that an image came from OpenAI’s models, but a negative result cannot rule out a tool that doesn’t participate. The positive signal is informative; the absence of a signal is not. That’s a structural limitation that no amount of watermark robustness can fix.
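The asymmetry between a hit and a miss can be put in numbers with Bayes’ rule. Every figure below is hypothetical, chosen only to show the shape of the effect: the prior, the share of AI images that come from participating tools, the fraction of watermarks that survive resharing, and the detector’s false positive rate are not measurements of any real system.

```python
def posterior_ai(prior_ai, p_hit_if_ai, p_hit_if_real, hit):
    """P(image is AI-generated | detector result), by Bayes' rule."""
    like_ai = p_hit_if_ai if hit else 1 - p_hit_if_ai
    like_real = p_hit_if_real if hit else 1 - p_hit_if_real
    weighted_ai = prior_ai * like_ai
    return weighted_ai / (weighted_ai + (1 - prior_ai) * like_real)


# Hypothetical inputs, for illustration only.
PRIOR_AI = 0.30              # belief that the image is AI-made before checking
PARTICIPATING_SHARE = 0.40   # share of AI images made by watermarking tools
SURVIVAL_RATE = 0.90         # share of watermarks that survive resharing
FALSE_POSITIVE_RATE = 0.001  # detector fires on a non-AI image

# An AI image is flagged only if its tool participates AND the mark survived.
p_hit_if_ai = PARTICIPATING_SHARE * SURVIVAL_RATE

after_hit = posterior_ai(PRIOR_AI, p_hit_if_ai, FALSE_POSITIVE_RATE, hit=True)
after_miss = posterior_ai(PRIOR_AI, p_hit_if_ai, FALSE_POSITIVE_RATE, hit=False)
```

Under these assumptions a hit raises the probability from 0.30 to about 0.99, while a miss only lowers it to about 0.22. Making the watermark more robust improves the survival rate, but the miss stays weak for as long as the participating share is low.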