Federated Knowledge Recycling: Privacy-Preserving Synthetic Data Sharing

Published 30 Jul 2024 in cs.LG, cs.AI, and cs.CV | arXiv:2407.20830v1

Abstract: Federated learning has emerged as a paradigm for collaborative learning, enabling the development of robust models without the need to centralise sensitive data. However, conventional federated learning techniques have privacy and security vulnerabilities, because the models, parameters, or updates they expose can be exploited as an attack surface. This paper presents Federated Knowledge Recycling (FedKR), a cross-silo federated learning approach that uses locally generated synthetic data to facilitate collaboration between institutions. FedKR combines advanced data generation techniques with a dynamic aggregation process to provide greater security against privacy attacks than existing methods, significantly reducing the attack surface. Experimental results on generic and medical datasets show that FedKR achieves competitive performance, with an average accuracy improvement of 4.24% over models trained only on local data, demonstrating particular effectiveness in data-scarcity scenarios.
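The core idea in the abstract — silos share locally generated synthetic data instead of models, gradients, or updates — can be sketched as follows. This is a minimal illustration, not the paper's implementation: all class and function names are hypothetical, a per-class Gaussian stands in for the paper's "advanced data generation techniques", and a nearest-centroid classifier stands in for the global model and dynamic aggregation.

```python
# Hedged sketch of cross-silo collaboration via synthetic data sharing.
# Assumptions (not from the paper): per-class Gaussian generators,
# nearest-centroid global model, two silos with Gaussian-mixture data.
import numpy as np

rng = np.random.default_rng(0)

class SiloGenerator:
    """Fits a per-class Gaussian to local data; only samples leave the silo."""
    def __init__(self, X, y):
        self.stats = {}
        for c in np.unique(y):
            Xc = X[y == c]
            self.stats[c] = (Xc.mean(axis=0), Xc.std(axis=0) + 1e-6)

    def sample(self, n_per_class):
        Xs, ys = [], []
        for c, (mu, sd) in self.stats.items():
            Xs.append(rng.normal(mu, sd, size=(n_per_class, mu.size)))
            ys.append(np.full(n_per_class, c))
        return np.vstack(Xs), np.concatenate(ys)

def train_global(synthetic_sets):
    """Trains a nearest-centroid classifier on the pooled synthetic data."""
    X = np.vstack([Xs for Xs, _ in synthetic_sets])
    y = np.concatenate([ys for _, ys in synthetic_sets])
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(centroids, X):
    classes = sorted(centroids)
    dists = np.stack([np.linalg.norm(X - centroids[c], axis=1) for c in classes])
    return np.array(classes)[dists.argmin(axis=0)]

def make_silo(shift):
    # Private data: two well-separated Gaussian classes in 5 dimensions.
    X0 = rng.normal(0.0 + shift, 1.0, size=(100, 5))
    X1 = rng.normal(4.0 + shift, 1.0, size=(100, 5))
    return np.vstack([X0, X1]), np.array([0] * 100 + [1] * 100)

silos = [SiloGenerator(*make_silo(s)) for s in (0.0, 0.5)]
pooled = [g.sample(50) for g in silos]      # only synthetic samples are shared
model = train_global(pooled)

X_test = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(4, 1, (20, 5))])
y_test = np.array([0] * 20 + [1] * 20)
acc = (predict(model, X_test) == y_test).mean()
```

The key property mirrored here is that raw data, model parameters, and gradients never leave a silo — only synthetic samples do — which is what shrinks the attack surface relative to conventional federated averaging.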
