k-SemStamp: A Clustering-Based Semantic Watermark for Detection of Machine-Generated Text
Abstract: Recent watermarked generation algorithms inject detectable signatures into language-model output to facilitate post-hoc detection. While token-level watermarks are vulnerable to paraphrase attacks, SemStamp (Hou et al., 2023) applies the watermark to the semantic representation of sentences and demonstrates promising robustness. However, SemStamp employs locality-sensitive hashing (LSH) to partition the semantic space with arbitrary hyperplanes, which leads to a suboptimal tradeoff between robustness and speed. We propose k-SemStamp, a simple yet effective enhancement of SemStamp that uses k-means clustering instead of LSH to partition the embedding space with awareness of its inherent semantic structure. Experimental results indicate that k-SemStamp markedly improves robustness and sampling efficiency while preserving generation quality, providing a more effective tool for machine-generated text detection.
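To make the mechanism concrete, here is a minimal sketch of the sentence-level scheme the abstract describes: learn a k-means partition of the sentence-embedding space offline, then rejection-sample sentences during generation until one lands in a pseudorandomly chosen "valid" region. Everything concrete below is an assumption for illustration rather than the authors' implementation: the helper names (`embed_sentence`, `generate_sentence`), the cluster count, the valid-region ratio, and the use of the previous sentence's region to seed the valid set (following SemStamp's sentence-level design).

```python
# Illustrative sketch of a k-means-based semantic watermark (not the authors' code).
# Assumed helpers: embed_sentence(text) -> np.ndarray, generate_sentence() -> str.
import numpy as np
from sklearn.cluster import KMeans

K = 8               # number of semantic regions (assumed hyperparameter)
VALID_RATIO = 0.25  # fraction of regions accepted at each step (assumed)

def fit_partition(embeddings: np.ndarray, k: int = K) -> KMeans:
    """Offline step: cluster in-domain sentence embeddings so the partition
    follows the corpus's semantic structure instead of random LSH hyperplanes."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(embeddings)

def region_of(km: KMeans, emb: np.ndarray) -> int:
    """A sentence's region is the index of its nearest k-means centroid."""
    return int(km.predict(emb.reshape(1, -1))[0])

def watermarked_sentence(km: KMeans, prev_region: int,
                         generate_sentence, embed_sentence, max_tries: int = 50):
    """Rejection-sample candidate sentences until one falls in a valid region.
    The valid set is pseudorandom but reproducible: it is seeded by the
    previous sentence's region, so a detector can recompute it."""
    region_rng = np.random.default_rng(prev_region)
    n_valid = max(1, int(VALID_RATIO * km.n_clusters))
    valid = set(region_rng.choice(km.n_clusters, size=n_valid, replace=False).tolist())
    sent, r = None, -1
    for _ in range(max_tries):
        sent = generate_sentence()                      # candidate from the LM
        r = region_of(km, np.asarray(embed_sentence(sent)))
        if r in valid:                                  # accept: sentence carries the mark
            return sent, r
    return sent, r                                      # give up after max_tries rejections
```

The design choice the paper argues for is the partition itself: centroids learned from in-domain embeddings respect the data's semantic structure, so a paraphrase tends to stay in its original region, whereas SemStamp's random LSH hyperplanes can cut through dense semantic neighborhoods. At detection time, one would count the fraction of sentences falling in valid regions and test it against the expected ratio.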
References
- Miranda Christ, Sam Gunn, and Or Zamir. 2023. Undetectable watermarks for language models. arXiv preprint arXiv:2306.09194.
- Watermarking conditional text generation for AI detection: Unveiling challenges and a semantic-aware watermark remedy. arXiv preprint arXiv:2307.13808.
- Measuring and improving semantic diversity of dialogue generation. In Findings of the Association for Computational Linguistics: EMNLP 2022.
- Abe Bohan Hou, Jingyu Zhang, Tianxing He, Yichen Wang, Yung-Sung Chuang, Hongwei Wang, Lingfeng Shen, Benjamin Van Durme, Daniel Khashabi, and Yulia Tsvetkov. 2023. SemStamp: A semantic watermark with paraphrastic robustness for text generation. arXiv preprint arXiv:2310.03991.
- Piotr Indyk and Rajeev Motwani. 1998. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, STOC ’98, page 604–613, New York, NY, USA. Association for Computing Machinery.
- John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. 2023. A watermark for large language models. arXiv preprint arXiv:2301.10226.
- John Kirchenbauer, Jonas Geiping, Yuxin Wen, Manli Shu, Khalid Saifullah, Kezhi Kong, Kasun Fernando, Aniruddha Saha, Micah Goldblum, and Tom Goldstein. 2023. On the reliability of watermarks for large language models. arXiv preprint arXiv:2306.04634.
- Kalpesh Krishna, Yixiao Song, Marzena Karpinska, John Wieting, and Mohit Iyyer. 2023. Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense. arXiv preprint arXiv:2303.13408.
- Wojciech Kryściński, Nazneen Rajani, Divyansh Agarwal, Caiming Xiong, and Dragomir Radev. 2021. BookSum: A collection of datasets for long-form narrative summarization. arXiv preprint arXiv:2105.08209.
- Rohith Kuditipudi, John Thickstun, Tatsunori Hashimoto, and Percy Liang. 2023. Robust distortion-free watermarks for language models. arXiv preprint arXiv:2307.15593.
- Aiwei Liu, Leyi Pan, Xuming Hu, Shiao Meng, and Lijie Wen. 2023. A semantic invariant robust watermark for large language models. arXiv preprint arXiv:2310.06356.
- Stuart P. Lloyd. 1982. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129–137.
- Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. 2019. Model cards for model reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency, FAT* '19, page 220–229, New York, NY, USA. Association for Computing Machinery.
- OpenAI. 2022. ChatGPT.
- Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67.
- Vinu Sankar Sadasivan, Aounon Kumar, Sriram Balasubramanian, Wenxiao Wang, and Soheil Feizi. 2023. Can AI-generated text be reliably detected? arXiv preprint arXiv:2303.11156.
- Towards codable text watermarking for large language models. arXiv preprint arXiv:2307.15992.
- Paraphrastic representations at scale. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 379–388, Abu Dhabi, UAE. Association for Computational Linguistics.
- KiYoon Yoo, Wonhyuk Ahn, Jiho Jang, and Nojun Kwak. 2023. Robust multi-bit natural language watermarking through invariant features. In Annual Meeting of the Association for Computational Linguistics.
- Jingqing Zhang, Yao Zhao, Mohammad Saleh, and Peter J. Liu. 2020. PEGASUS: Pre-training with extracted gap-sentences for abstractive summarization. In International Conference on Machine Learning (ICML).
- Susan Zhang, Stephen Roller, Naman Goyal, et al. 2022. OPT: Open pre-trained transformer language models. arXiv preprint arXiv:2205.01068.
- Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. 2020. BERTScore: Evaluating text generation with BERT. In International Conference on Learning Representations (ICLR).
- Yizhe Zhang, Michel Galley, Jianfeng Gao, Zhe Gan, Xiujun Li, Chris Brockett, and Bill Dolan. 2018. Generating informative and diverse conversational responses via adversarial information maximization. In NeurIPS.
- Xuandong Zhao, Prabhanjan Ananth, Lei Li, and Yu-Xiang Wang. 2023. Provable robust watermarking for AI-generated text. arXiv preprint arXiv:2306.17439.