SemStamp: A Semantic Watermark with Paraphrastic Robustness for Text Generation

Published 6 Oct 2023 in cs.CL (arXiv:2310.03991v2)

Abstract: Existing watermarking algorithms are vulnerable to paraphrase attacks because of their token-level design. To address this issue, we propose SemStamp, a robust sentence-level semantic watermarking algorithm based on locality-sensitive hashing (LSH), which partitions the semantic space of sentences. The algorithm encodes and LSH-hashes a candidate sentence generated by an LLM, and performs sentence-level rejection sampling until the sampled sentence falls in a watermarked partition of the semantic embedding space. A margin-based constraint further enhances robustness. To demonstrate the advantages of our algorithm, we also propose a "bigram" paraphrase attack, which selects the paraphrase with the fewest bigram overlaps with the original sentence; this attack is shown to be effective against existing token-level watermarking methods. Experimental results show that our semantic watermark algorithm is not only more robust than the previous state-of-the-art method against both common and bigram paraphrase attacks, but also better preserves generation quality.
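The rejection-sampling loop described in the abstract can be made concrete. Below is a minimal Python sketch, assuming a hypothetical LLM sentence sampler `generate_sentence` and sentence encoder `embed_sentence` (neither is from the paper's released code). The random-hyperplane LSH and the pseudorandom marking of "valid" partitions follow the standard SimHash construction; the paper's margin-based constraint and its exact scheme for deriving the valid partition set are simplified here to a fixed secret key.

```python
import hashlib
import numpy as np

def lsh_signature(embedding: np.ndarray, hyperplanes: np.ndarray) -> int:
    """Random-hyperplane LSH (SimHash-style): one sign bit per hyperplane.
    `hyperplanes` has shape (num_bits, dim); assumes num_bits <= 64."""
    bits = hyperplanes @ embedding > 0
    return sum(1 << i for i, b in enumerate(bits) if b)

def in_watermarked_partition(signature: int, key: bytes, ratio: float = 0.5) -> bool:
    """Pseudorandomly mark a `ratio` fraction of LSH partitions as watermarked,
    keyed by a secret so a detector with the key can recompute the set."""
    digest = hashlib.sha256(key + signature.to_bytes(8, "big")).digest()
    return digest[0] / 256.0 < ratio

def semstamp_generate(prompt: str, generate_sentence, embed_sentence,
                      hyperplanes: np.ndarray, key: bytes,
                      max_tries: int = 100) -> str:
    """Resample candidate sentences until one lands in a watermarked
    semantic partition (margin check omitted for brevity)."""
    candidate = ""
    for _ in range(max_tries):
        candidate = generate_sentence(prompt)          # sample one sentence
        sig = lsh_signature(embed_sentence(candidate), hyperplanes)
        if in_watermarked_partition(sig, key):
            return candidate                           # accept: watermarked region
    return candidate                                   # fall back to the last sample
```

The "bigram" paraphrase attack is similarly simple to express: among a paraphraser's candidate outputs, pick the one sharing the fewest word bigrams with the original sentence. Again a sketch, with hypothetical `paraphrase_candidates` and `original_sentence` inputs:

```python
def bigram_overlap(a: str, b: str) -> int:
    """Count word bigrams shared between two sentences."""
    bigrams = lambda s: set(zip(s.split(), s.split()[1:]))
    return len(bigrams(a) & bigrams(b))

# Choose the candidate paraphrase with minimal bigram overlap with the original.
attack_output = min(paraphrase_candidates,
                    key=lambda c: bigram_overlap(c, original_sentence))
```

Because token-level watermarks hash short n-gram contexts to bias the next token, minimizing surface bigram overlap disrupts them directly, whereas a sentence-level semantic signature survives as long as the paraphrase stays in the same LSH partition.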
