Optimizing watermarks for large language models

Published 28 Dec 2023 in cs.CR, cs.AI, and cs.CL | arXiv:2312.17295v1

Abstract: With the rise of LLMs and concerns about potential misuse, watermarks for generative LLMs have recently attracted much attention. An important aspect of such watermarks is the trade-off between their identifiability and their impact on the quality of the generated text. This paper introduces a systematic approach to this trade-off in terms of a multi-objective optimization problem. For a large class of robust, efficient watermarks, the associated Pareto optimal solutions are identified and shown to outperform the currently default watermark.
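The "currently default watermark" that the abstract compares against is presumably the green-list soft watermark of Kirchenbauer et al. (A Watermark for Large Language Models), in which a pseudorandom "green" subset of the vocabulary, seeded on the previous token, receives a logit bonus at generation time, and detection counts green tokens via a z-test. A minimal sketch of that scheme, assuming a toy flat-logit model; `VOCAB_SIZE`, `GAMMA`, and `DELTA` are illustrative values chosen here, not the paper's:

```python
import hashlib
import math
import random

VOCAB_SIZE = 50  # toy vocabulary; real LLMs use tens of thousands of tokens
GAMMA = 0.5      # fraction of the vocabulary placed on the "green" list
DELTA = 2.0      # logit bonus added to green tokens (controls the trade-off)

def green_list(prev_token: int) -> set:
    """Pseudorandomly partition the vocabulary, seeded on the previous token."""
    seed = int(hashlib.sha256(str(prev_token).encode()).hexdigest(), 16)
    rng = random.Random(seed)
    ids = list(range(VOCAB_SIZE))
    rng.shuffle(ids)
    return set(ids[: int(GAMMA * VOCAB_SIZE)])

def watermarked_sample(logits: list, prev_token: int, rng: random.Random) -> int:
    """Add DELTA to green-list logits, then sample from the softmax."""
    greens = green_list(prev_token)
    biased = [l + (DELTA if i in greens else 0.0) for i, l in enumerate(logits)]
    m = max(biased)
    weights = [math.exp(b - m) for b in biased]
    r, acc = rng.random() * sum(weights), 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return VOCAB_SIZE - 1

def z_score(tokens: list) -> float:
    """Detection: count green tokens and test against the GAMMA null hypothesis."""
    hits = sum(1 for prev, tok in zip(tokens, tokens[1:]) if tok in green_list(prev))
    n = len(tokens) - 1
    return (hits - GAMMA * n) / math.sqrt(GAMMA * (1 - GAMMA) * n)

rng = random.Random(0)
flat_logits = [0.0] * VOCAB_SIZE  # maximally entropic toy "model"
text = [0]
for _ in range(200):
    text.append(watermarked_sample(flat_logits, text[-1], rng))
print(f"z-score of watermarked text: {z_score(text):.2f}")
```

The identifiability/quality trade-off the abstract formalizes is visible here as the choice of `DELTA`: a larger bonus inflates the z-score (easier detection) but distorts the model's distribution more. The paper's contribution, as stated, is to treat this as a multi-objective optimization and characterize the Pareto-optimal schemes within a broad class of such watermarks.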
