Provable Robust Watermarking for AI-Generated Text
Abstract: We study the problem of watermarking text generated by large language models (LLMs) -- one of the most promising approaches for addressing the safety challenges of LLM usage. In this paper, we propose a rigorous theoretical framework for quantifying the effectiveness and robustness of LLM watermarks. We propose a robust and high-quality watermarking method, Unigram-Watermark, which extends an existing approach with a simplified, fixed grouping strategy. We prove that our watermarking method enjoys guaranteed generation quality and correctness in watermark detection, and is robust against text editing and paraphrasing. Experiments on three LLMs and two datasets verify that Unigram-Watermark achieves superior detection accuracy and comparable generation quality in perplexity, thus promoting the responsible use of LLMs. Code is available at https://github.com/XuandongZhao/Unigram-Watermark.
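To make the "fixed grouping" idea concrete, here is a minimal, hedged sketch of a unigram-style watermark: a single key-derived "green" subset of the vocabulary is fixed once, green-token logits receive a constant boost during generation, and detection counts green tokens and computes a one-proportion z-score. This is not the authors' reference implementation (see the linked repository); the key string, the green-list fraction `gamma`, and the logit boost `delta` below are illustrative assumptions.

```python
import hashlib
import math

def green_list(vocab_size: int, key: str = "secret-key", gamma: float = 0.5) -> set:
    """Deterministically select a fraction gamma of token ids as 'green' (illustrative keying)."""
    green = set()
    for token_id in range(vocab_size):
        h = hashlib.sha256(f"{key}:{token_id}".encode()).digest()
        # Map the hash to [0, 1) and keep the token if it falls below gamma.
        if int.from_bytes(h[:8], "big") / 2**64 < gamma:
            green.add(token_id)
    return green

def watermark_logits(logits, green, delta: float = 2.0):
    """Add a constant bias delta to green-token logits before sampling."""
    return [x + delta if i in green else x for i, x in enumerate(logits)]

def detect(token_ids, green, gamma: float = 0.5) -> float:
    """z-score of the green-token count; large values suggest the text is watermarked."""
    n = len(token_ids)
    count = sum(1 for t in token_ids if t in green)
    return (count - gamma * n) / math.sqrt(n * gamma * (1 - gamma))

# Toy usage with a 1000-token vocabulary and a mostly-green synthetic "generation".
green = green_list(vocab_size=1000)
text_ids = [t for t in range(200) if t in green] + [3, 7, 11]
print(f"z = {detect(text_ids, green):.2f}")  # far above a typical detection threshold
```

Because the green set depends only on the individual token (a unigram statistic) and not on its context, local edits or paraphrases change only a bounded number of green-token counts, which is the intuition behind the robustness guarantees claimed in the abstract.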