LabObf: A Label Protection Scheme for Vertical Federated Learning Through Label Obfuscation
Abstract: The split neural network, one of the most common architectures in vertical federated learning (VFL), is popular in industry because of its privacy-preserving properties. In this architecture, the party holding the labels lacks sufficient feature data and therefore cooperates with other parties to improve model performance. Each participant has a self-defined bottom model that learns hidden representations from its own feature data and uploads the resulting embeddings to the top model, which is held by the label holder and produces the final predictions. This design lets participants train jointly without directly exchanging data. However, prior work has shown that malicious participants may still infer label information from the uploaded embeddings, leading to privacy leakage. In this paper, we first propose an embedding extension attack that manipulates embeddings to defeat existing defenses, which rely on limiting the correlation between the embeddings uploaded by participants and the labels. We then propose a new label obfuscation defense, called “LabObf”, which randomly maps each original integer-valued label to multiple real-valued soft labels with intertwined values, making it considerably harder for attackers to infer the labels. Experiments on four different types of datasets show that LabObf substantially reduces the attacker's success rate compared with raw (undefended) models while maintaining desirable model accuracy.
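The label-to-soft-label mapping can be illustrated with a minimal Python sketch. It is not the LabObf algorithm: the round-robin interleaving of soft labels on [0, 1), the random choice of one soft label per sample, the nearest-value decoding rule, and the names `build_codebook`, `obfuscate` and `decode` are all hypothetical assumptions chosen only to show what "multiple real-valued soft labels with intertwined values" could look like.

```python
import bisect
import random


def build_codebook(num_classes, per_class, seed=0):
    """Hypothetical construction: give each class `per_class` real-valued
    soft labels in [0, 1) so that values of different classes interleave."""
    rng = random.Random(seed)
    values = sorted(rng.random() for _ in range(num_classes * per_class))
    order = list(range(num_classes))
    rng.shuffle(order)  # random but fixed order of classes along the real line
    codebook = {c: [] for c in range(num_classes)}
    for i, v in enumerate(values):
        # Round-robin assignment: neighbouring values belong to different classes.
        codebook[order[i % num_classes]].append(v)
    return codebook


def obfuscate(labels, codebook, seed=1):
    """Replace each integer label by one of its class's soft labels, chosen at random."""
    rng = random.Random(seed)
    return [rng.choice(codebook[y]) for y in labels]


def decode(prediction, codebook):
    """Map a real-valued prediction back to the class of the nearest soft label."""
    table = sorted((v, c) for c, vs in codebook.items() for v in vs)
    keys = [v for v, _ in table]
    i = bisect.bisect_left(keys, prediction)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(table)]
    best = min(candidates, key=lambda j: abs(keys[j] - prediction))
    return table[best][1]


if __name__ == "__main__":
    codebook = build_codebook(num_classes=3, per_class=4)
    labels = [0, 1, 2, 1, 0, 2]
    soft = obfuscate(labels, codebook)
    print(soft)
    print([decode(s, codebook) for s in soft])
```

In this sketch the label holder would train the top model against the soft targets and decode its outputs with `decode`. The intuition, under these assumptions, is that because each class's soft labels are spread across the whole range and interleaved with those of other classes, value-based structure in the training signal no longer separates cleanly by class.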