Agentic Copyright Watermarking against Adversarial Evidence Forgery with Purification-Agnostic Curriculum Proxy Learning
Abstract: As AI agents proliferate across domains, protecting the ownership of AI models has become crucial given the significant investment their development requires. Unauthorized use and illegal distribution of these models pose serious threats to intellectual property, necessitating effective copyright protection. Model watermarking has emerged as a key technique for this purpose, embedding ownership information within a model so that rightful ownership can be asserted in copyright disputes. This paper makes four contributions to model watermarking: (1) a self-authenticating black-box watermarking protocol based on hash techniques; (2) a study of evidence forgery attacks that use adversarial perturbations; (3) a defense that adds a purification step to counter such attacks; and (4) a purification-agnostic curriculum proxy learning method that enhances both watermark robustness and model performance. Experimental results demonstrate that these approaches improve the security, reliability, and performance of watermarked models.
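To make the self-authenticating idea concrete, the sketch below shows one way a hash-based black-box watermark key could be derived: the owner's identity string is hashed, and the digest deterministically seeds both a trigger pattern and a target label. This is a minimal illustration under assumed design choices (SHA-256, a seeded PRNG pattern, label taken from the digest), not the paper's actual protocol; `make_trigger`, `owner_id`, and the 8×8 patch size are hypothetical names and parameters chosen for the example.

```python
import hashlib
import numpy as np

def make_trigger(owner_id: str, shape=(8, 8), num_classes=10):
    """Derive a deterministic watermark trigger from an owner identity.

    Because the key set is recomputable from the claimed identity alone,
    a verifier can check it without trusting the claimant's stored files,
    which is the sense in which such a key is 'self-authenticating'.
    """
    # Hash the owner's identity; the digest seeds everything downstream.
    digest = hashlib.sha256(owner_id.encode("utf-8")).digest()
    # Use the first 8 digest bytes to seed a PRNG for the trigger patch.
    rng = np.random.default_rng(int.from_bytes(digest[:8], "big"))
    pattern = rng.random(shape)              # pseudo-random trigger patch
    target = digest[-1] % num_classes        # target class derived from hash
    return pattern, target

pattern, target = make_trigger("alice@example.org")
```

The pattern would be stamped onto key images with the derived target label during training; at dispute time, anyone can recompute the same key set from the owner's identity and query the suspect model in a black-box fashion.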