Google has released a tool called SynthID Text to allow developers to watermark and detect text generated by AI models. The technology aims to address the mass spread of AI-generated content across news articles, academic papers, and social media.
The tool is now available for download from the AI platform Hugging Face and Google’s updated Responsible GenAI Toolkit. Google announced the open-source nature of SynthID Text in a post on X, stating, “We’re open-sourcing our SynthID Text watermarking tool. Available freely to developers and businesses, it will help them identify their AI-generated content.”
How SynthID Text Works
SynthID Text works by subtly modifying the text generation process of AI models. When given a prompt, the text-generating model predicts which “token” is most likely to follow another, generating one token at a time. Tokens can be individual characters or words; they are the fundamental units a generative model uses to process information.
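To make that loop concrete, here is a minimal, self-contained Python sketch. Everything in it (the tiny vocabulary, the stand-in scoring function, the names) is illustrative rather than anything from Google’s implementation: a model scores each candidate token, the scores become a probability distribution, and one token is sampled and appended per step.

```python
import math
import random

# Toy vocabulary; real models use vocabularies of tens of thousands of tokens.
VOCAB = ["the", "cat", "sat", "on", "mat", "."]

def softmax(logits):
    """Turn raw token scores into a probability distribution."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def toy_logits(tokens):
    """Stand-in for a neural network: assign a score to every candidate
    token given the context. Here the scores are a fixed function of the
    last token, purely so the example stays self-contained."""
    last = tokens[-1]
    return [float((len(last) * 7 + i * 3) % 5) for i in range(len(VOCAB))]

def generate(prompt, max_new_tokens=8, seed=0):
    """The loop described above: score candidates, sample one token,
    append it, and repeat."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = softmax(toy_logits(tokens))
        tokens.append(rng.choices(VOCAB, weights=probs, k=1)[0])
    return tokens

print(" ".join(generate(["the"])))
```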
The model assigns a score to each possible token, indicating the likelihood of its inclusion in the output text. SynthID Text tweaks this process by adjusting the likelihood of certain tokens being generated. According to Google, “The final pattern of scores for both the model’s word choices combined with the adjusted probability scores are considered the watermark.” This pattern is then compared with expected patterns for watermarked and unwatermarked text, enabling the detection of AI-generated content.
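Google has not spelled out the full algorithm here, and the sketch below is not SynthID itself; the published scheme is considerably more sophisticated. But the basic mechanism shared by this family of generative watermarks can be illustrated in a few lines: a keyed hash of the preceding token selects a pseudorandom “favored” half of the vocabulary, generation nudges the scores of those tokens upward, and the detector recomputes the same favored sets and checks whether tokens land in them more often than the 50% expected by chance. The key, bias strength, and vocabulary below are all assumptions made for the demo.

```python
import hashlib
import math
import random

VOCAB = ["the", "cat", "sat", "on", "mat", "and", "a", "dog"]
SECRET_KEY = b"demo-key"  # illustrative; a real deployment keeps this private
BIAS = 2.0                # how strongly favored tokens are boosted (assumed)

def softmax(logits):
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def favored_tokens(prev_token):
    """A keyed hash of the previous token picks a pseudorandom half of
    the vocabulary to favor at this position."""
    digest = hashlib.sha256(SECRET_KEY + prev_token.encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(VOCAB, len(VOCAB) // 2))

def generate_watermarked(n_tokens, rng):
    """Sample from a toy uniform model, boosting favored tokens. In a real
    system the boost is applied on top of the model's own scores."""
    tokens = ["the"]
    for _ in range(n_tokens):
        favored = favored_tokens(tokens[-1])
        logits = [BIAS if t in favored else 0.0 for t in VOCAB]
        tokens.append(rng.choices(VOCAB, weights=softmax(logits), k=1)[0])
    return tokens

def detect(tokens):
    """Recompute the favored sets and measure the hit rate; unwatermarked
    text lands near 50%, watermarked text well above it."""
    hits = sum(tokens[i] in favored_tokens(tokens[i - 1])
               for i in range(1, len(tokens)))
    n = len(tokens) - 1
    z = (hits - 0.5 * n) / math.sqrt(0.25 * n)  # z-score against chance
    return hits / n, z

rng = random.Random(0)
watermarked = generate_watermarked(200, rng)
plain = [rng.choice(VOCAB) for _ in range(200)]
print("watermarked:", detect(watermarked))  # high hit rate, large z-score
print("plain:      ", detect(plain))        # ~0.5 hit rate, small z-score
```

The statistical nature of the detector also explains the limitation Google flags below: over only a handful of tokens, the gap between a chance hit rate and a boosted one is too small to call with confidence.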
Google claims that SynthID Text does not compromise the quality, accuracy, or speed of text generation. The tool has been integrated with Google’s Gemini models since earlier this year. However, the company acknowledges that the watermarking approach has limitations: it is less effective on short texts and on text that has been substantially rewritten or translated from another language.
The Need for Watermarking Technology
The rise of AI-generated content has led to a surge in tools designed to spot such text. Some companies offer services that analyze material to determine if it was generated by AI, while others claim to “humanize” AI-generated text to make it harder to detect. The effectiveness of these tools is often questioned, especially as AI chatbots continue to improve.
The introduction of watermarking technology like SynthID Text represents a proactive approach to the problem. Pushmeet Kohli, vice president of research at Google DeepMind, stated, “While SynthID isn’t a silver bullet for identifying AI-generated content, it is an important building block for developing more reliable AI identification tools.”
Independent researchers are optimistic about the potential of watermarking systems. Scott Aaronson from the University of Texas at Austin noted, “While no known watermarking method is foolproof, I really think this can help in catching some fraction of AI-generated misinformation, academic cheating and more.”
Industry Response and Future Implications
Experts working on content credentials view the research as a significant step forward. Andrew Jenks, Microsoft’s director of media provenance, remarked that the technology “holds promise for improving the use of durable content credentials from C2PA for documents and raw text.”
Despite the progress made in detecting AI-generated content, the problem is far from resolved. The effectiveness of watermarking technology depends on widespread adoption across the industry, and there are calls for interoperability, where a single detector could identify text from multiple large language models (LLMs). Bruce MacCormack, a member of the C2PA steering committee, emphasized the practical challenges of implementing such systems, stating, “There are challenges with the review of text in the wild, where you would have to know which watermarking model has been applied to know how and where to look for the signal.”
According to Europol, the European Union’s law enforcement agency, as much as 90% of online content could be synthetically generated by 2026. That rising tide of AI-made content will pose a serious challenge for law enforcement in combating disinformation, propaganda, and fraud.
In response to these concerns, some governments are considering regulations. China has introduced mandatory watermarking for AI-generated content, and California is exploring similar measures.
As the technology develops further, the focus will likely shift toward making watermarking robust and widely adopted, providing a reliable way to distinguish between human-written and AI-generated text.