An Unforgeable Publicly Verifiable Watermark for Large Language Models
Abstract: Text watermarking algorithms for LLMs have recently been proposed to mitigate potential harms of LLM-generated text, such as fake news and copyright infringement. However, current watermark detection algorithms require the secret key used during watermark generation, making them susceptible to security breaches and counterfeiting when detection is performed publicly. To address this limitation, we propose an unforgeable publicly verifiable watermarking algorithm, named UPV, that uses two different neural networks for watermark generation and detection instead of the same secret key at both stages. Meanwhile, the token-embedding parameters are shared between the generation and detection networks, which allows the detection network to reach high accuracy very efficiently. Experiments demonstrate that our algorithm attains high detection accuracy and computational efficiency through neural networks. Subsequent analysis confirms the high computational complexity of forging the watermark from the detection network. Our code is available at \href{https://github.com/THU-BPM/unforgeable_watermark}{https://github.com/THU-BPM/unforgeable\_watermark}. Additionally, our algorithm can also be accessed through MarkLLM \citep{pan2024markllm}\footnote{https://github.com/THU-BPM/MarkLLM}.
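The abstract's core design — a private generation network and a public detection network that share only a token-embedding table — can be sketched as follows. This is a minimal illustration with hypothetical sizes and untrained random weights, not the paper's actual architecture or training procedure; the network shapes, window size, and pooling choice are all assumptions for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM, WINDOW = 1000, 16, 3  # hypothetical vocabulary/embedding/window sizes

# Shared token-embedding table: reused by both networks. Sharing these
# parameters is what the abstract credits for the detector's efficiency.
shared_embedding = rng.normal(size=(VOCAB, DIM))

def mlp(x, w1, b1, w2, b2):
    """Tiny two-layer MLP with a sigmoid output in (0, 1)."""
    h = np.tanh(x @ w1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))

# Private generation network: labels a candidate next token green/red
# from the embeddings of a small window of preceding tokens.
g_w1 = rng.normal(size=(WINDOW * DIM, 32)); g_b1 = np.zeros(32)
g_w2 = rng.normal(size=(32, 1));            g_b2 = np.zeros(1)

def is_green(prev_tokens, candidate):
    window = (list(prev_tokens) + [candidate])[-WINDOW:]
    x = shared_embedding[window].reshape(-1)  # concatenate window embeddings
    return bool(mlp(x, g_w1, g_b1, g_w2, g_b2)[0] > 0.5)

# Public detection network: different weights, but the same embedding
# table; it scores whole texts without ever seeing the generation weights.
d_w1 = rng.normal(size=(DIM, 32)); d_b1 = np.zeros(32)
d_w2 = rng.normal(size=(32, 1));   d_b2 = np.zeros(1)

def detect_score(tokens):
    x = shared_embedding[tokens].mean(axis=0)  # mean-pooled embeddings
    return float(mlp(x, d_w1, d_b1, d_w2, d_b2)[0])  # P(watermarked)
```

Because the detector exposes only its own weights (plus the shared embeddings), recovering the green/red partition used at generation time would require inverting a separate, unpublished network, which is the source of the unforgeability claim.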