Towards Better Statistical Understanding of Watermarking LLMs

Published 19 Mar 2024 in cs.CR, cs.IT, cs.LG, math.IT, and stat.ML | (2403.13027v1)

Abstract: In this paper, we study the problem of watermarking LLMs. We consider the trade-off between model distortion and detection ability and formulate it as a constrained optimization problem based on the green-red algorithm of Kirchenbauer et al. (2023a). We show that the optimal solution to the optimization problem enjoys a clean analytical property that provides a better understanding of, and inspires the algorithm design for, the watermarking process. In light of this formulation, we develop an online dual gradient ascent watermarking algorithm and prove its asymptotic Pareto optimality between model distortion and detection ability. This result explicitly guarantees an increased average green-list probability, and hence detection ability (in contrast to previous results). Moreover, we provide a systematic discussion of the choice of model distortion metric for the watermarking problem. We justify our choice of KL divergence and point out issues with the existing criteria of "distortion-free" and perplexity. Finally, we empirically evaluate our algorithms on extensive datasets against benchmark algorithms.
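For readers unfamiliar with the green-red scheme the abstract builds on, here is a minimal, self-contained sketch of that mechanism (Kirchenbauer et al. 2023a): hash the previous token to pick a "green" fraction of the vocabulary, boost green logits by a bias δ before sampling, and detect by z-testing the count of green tokens. This is a toy illustration over uniform logits, not the paper's optimized algorithm; all function names and the PRNG-based hash are illustrative assumptions.

```python
import math
import random


def green_list(prev_token: int, vocab_size: int, gamma: float = 0.5) -> set:
    # Illustrative hash: seed a PRNG with the previous token and mark a
    # gamma-fraction of the vocabulary as "green".
    rng = random.Random(prev_token)
    return set(rng.sample(range(vocab_size), int(gamma * vocab_size)))


def watermarked_sample(logits, prev_token, delta, gamma=0.5):
    # Boost green-list logits by delta, then sample from the softmax.
    greens = green_list(prev_token, len(logits), gamma)
    boosted = [l + (delta if i in greens else 0.0) for i, l in enumerate(logits)]
    m = max(boosted)
    weights = [math.exp(l - m) for l in boosted]
    total = sum(weights)
    r, acc = random.random(), 0.0
    for i, w in enumerate(weights):
        acc += w / total
        if acc >= r:
            return i, (i in greens)
    return len(logits) - 1, (len(logits) - 1 in greens)


def detect_z(green_count: int, n: int, gamma: float = 0.5) -> float:
    # One-sided z-statistic for "more green tokens than chance".
    return (green_count - gamma * n) / math.sqrt(gamma * (1 - gamma) * n)
```

With δ = 0 the sampler is the unwatermarked model and the z-statistic stays near zero; with a positive δ the green count rises well above the γ·n baseline, which is the detection-ability side of the distortion/detection trade-off the paper optimizes.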

References (48)
  1. Aaronson, Scott. 2023. Watermarking of large language models. URL https://simons.berkeley.edu/talks/scott-aaronson-ut-austin-openai-2023-08-17.
  2. Fast algorithms for online stochastic convex programming. Proceedings of the twenty-sixth annual ACM-SIAM symposium on Discrete algorithms. SIAM, 1405–1424.
  3. Natural language watermarking: Design, analysis, and a proof-of-concept implementation. Information Hiding: 4th International Workshop, IH 2001 Pittsburgh, PA, USA, April 25–27, 2001 Proceedings 4. Springer, 185–200.
  4. Natural language watermarking and tamperproofing. International workshop on information hiding. Springer, 196–212.
  5. Beresneva, Daria. 2016. Computer-generated text detection using machine learning: A systematic review. Natural Language Processing and Information Systems: 21st International Conference on Applications of Natural Language to Information Systems, NLDB 2016, Salford, UK, June 22-24, 2016, Proceedings 21. Springer, 421–426.
  6. Convex optimization. Cambridge university press.
  7. Estimation des densités: risque minimax. Séminaire de probabilités de Strasbourg 12 342–363.
  8. Time series: theory and methods. Springer-Verlag, Berlin, Heidelberg.
  9. Language models are few-shot learners. Advances in neural information processing systems 33 1877–1901.
  10. On the possibilities of ai-generated text detection. arXiv preprint arXiv:2304.04736 .
  11. Natural language watermarking using semantic substitution for chinese text. Digital Watermarking: Second International Workshop, IWDW 2003, Seoul, Korea, October 20-22, 2003. Revised Papers 2. Springer, 129–140.
  12. Undetectable watermarks for language models. arXiv preprint arXiv:2306.09194 .
  13. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 .
  14. Eli5: Long form question answering. arXiv preprint arXiv:1907.09190 .
  15. Three bricks to consolidate watermarks for large language models. 2023 IEEE International Workshop on Information Forensics and Security (WIFS). IEEE, 1–6.
  16. Gltr: Statistical detection and visualization of generated text. arXiv preprint arXiv:1906.04043 .
  17. Hoeffding, Wassily. 1994. Probability inequalities for sums of bounded random variables. The collected works of Wassily Hoeffding 409–426.
  18. Automatic detection of generated text is easiest when humans are fooled. arXiv preprint arXiv:1911.00650 .
  19. Information hiding: steganography and watermarking-attacks and countermeasures: steganography and watermarking: attacks and countermeasures, vol. 1. Springer Science & Business Media.
  20. A watermark for large language models. arXiv preprint arXiv:2301.10226 .
  21. On the reliability of watermarks for large language models. arXiv preprint arXiv:2306.04634 .
  22. Robust distortion-free watermarks for language models. arXiv preprint arXiv:2307.15593 .
  23. Le Cam, Lucien. 2012. Asymptotic methods in statistical decision theory. Springer Science & Business Media.
  24. Simple and fast algorithm for binary integer and online linear programming. Advances in Neural Information Processing Systems 33 9412–9421.
  25. Gpt detectors are biased against non-native english writers. arXiv preprint arXiv:2304.02819 .
  26. Non-stationary bandits with knapsacks. Advances in Neural Information Processing Systems 35 16522–16532.
  27. Natural language watermarking via morphosyntactic alterations. Computer Speech & Language 23(1) 107–125.
  28. Detectgpt: Zero-shot machine-generated text detection using probability curvature. International Conference on Machine Learning. PMLR, 24950–24962.
  29. Online convex optimization with time-varying constraints. arXiv preprint arXiv:1702.04783 .
  30. Top tech firms commit to ai safeguards amid fears over pace of change. The Guardian URL https://www.theguardian.com/technology/2023/jul/21/ai-ethics-guidelines-google-meta-amazon.
  31. Lecture notes on information theory. Lecture Notes for ECE563 (UIUC) and 6.441 (MIT), 2012–2016.
  32. Robust speech recognition via large-scale weak supervision. International Conference on Machine Learning. PMLR, 28492–28518.
  33. Language models are unsupervised multitask learners. OpenAI blog 1(8) 9.
  34. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research 21(1) 5485–5551.
  35. Can ai-generated text be reliably detected? arXiv preprint arXiv:2303.11156 .
  36. Release strategies and the social impacts of language models. arXiv preprint arXiv:1908.09203 .
  37. Tian, Edward. 2023. Gptzero update v1. URL https://gptzero.substack.com/p/gptzero-update-v1.
  38. Natural language watermarking: Challenges in building a practical system. Security, Steganography, and Watermarking of Multimedia Contents VIII, vol. 6072. SPIE, 106–117.
  39. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 .
  40. Attention is all you need. Advances in neural information processing systems 30.
  41. Watermarking the outputs of structured prediction with an application in statistical machine translation. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing. 1363–1372.
  42. Evaluation methods for topic models. Proceedings of the 26th annual international conference on machine learning. 1105–1112.
  43. Huggingface’s transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771 .
  44. Wouters, Bram. 2023. Optimizing watermarks for large language models. arXiv preprint arXiv:2312.17295 .
  45. Yellott Jr, John I. 1977. The relationship between luce’s choice axiom, thurstone’s theory of comparative judgment, and the double exponential distribution. Journal of Mathematical Psychology 15(2) 109–144.
  46. Defending against neural fake news. Advances in neural information processing systems 32.
  47. Provable robust watermarking for ai-generated text. arXiv preprint arXiv:2306.17439 .
  48. Zinkevich, Martin. 2003. Online convex programming and generalized infinitesimal gradient ascent. Proceedings of the 20th international conference on machine learning (icml-03). 928–936.