
Don't Generate Me: Training Differentially Private Generative Models with Sinkhorn Divergence

Published 1 Nov 2021 in cs.LG and cs.CR | arXiv:2111.01177v2

Abstract: Although machine learning models trained on massive data have led to breakthroughs in several areas, their deployment in privacy-sensitive domains remains limited due to restricted access to data. Generative models trained with privacy constraints on private data can sidestep this challenge, providing indirect access to private data instead. We propose DP-Sinkhorn, a novel optimal transport-based generative method for learning data distributions from private data with differential privacy. DP-Sinkhorn minimizes the Sinkhorn divergence, a computationally efficient approximation to the exact optimal transport distance, between the model and data in a differentially private manner and uses a novel technique for controlling the bias-variance trade-off of gradient estimates. Unlike existing approaches for training differentially private generative models, which are mostly based on generative adversarial networks, we do not rely on adversarial objectives, which are notoriously difficult to optimize, especially in the presence of noise imposed by privacy constraints. Hence, DP-Sinkhorn is easy to train and deploy. Experimentally, we improve upon the state-of-the-art on multiple image modeling benchmarks and show differentially private synthesis of informative RGB images. Project page: https://nv-tlabs.github.io/DP-Sinkhorn.

Citations (63)

Summary

  • The paper introduces DP-Sinkhorn as a stable framework for training generative models under differential privacy constraints using optimal transport via Sinkhorn divergence.
  • It outlines a semi-debiased Sinkhorn loss that effectively balances bias and variance in gradient estimation, enhancing convergence and stability.
  • Empirical results show improved FID scores and higher downstream accuracy on MNIST and Fashion-MNIST compared to state-of-the-art methods.

Overview of DP-Sinkhorn: A Differentially Private Generative Model

This paper presents DP-Sinkhorn, a novel approach to training differentially private generative models using the Sinkhorn divergence as an optimal transport metric. DP-Sinkhorn addresses the challenges of training generative models on private data by avoiding generative adversarial networks (GANs), which are often hampered by training instability due to their adversarial objectives. Instead, DP-Sinkhorn leverages the robust properties of optimal transport, specifically the Sinkhorn divergence, to provide a stable training process while simultaneously preserving privacy.
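The core quantity can be sketched in a few lines of NumPy. This is an illustrative toy implementation of entropic optimal transport via Sinkhorn iterations and the debiased Sinkhorn divergence, not the paper's actual differentially private training code; the squared-Euclidean cost, entropic regularization strength `eps`, and iteration count are arbitrary choices for illustration.

```python
import numpy as np

def sinkhorn_cost(x, y, eps=1.0, n_iters=200):
    """Entropic OT cost between two uniform-weight point clouds x, y."""
    C = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)  # squared-Euclidean costs
    K = np.exp(-C / eps)                                       # Gibbs kernel
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iters):                                   # Sinkhorn fixed-point updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]                            # transport plan
    return float(np.sum(P * C))

def sinkhorn_divergence(x, y, eps=1.0):
    """Debiased Sinkhorn divergence: subtracting the self-terms makes it
    vanish when the two point clouds coincide."""
    return (sinkhorn_cost(x, y, eps)
            - 0.5 * sinkhorn_cost(x, x, eps)
            - 0.5 * sinkhorn_cost(y, y, eps))
```

In DP-Sinkhorn, the generator is trained by backpropagating through a loss of this form, with the gradients that touch private data sanitized before use.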

Key Contributions

The paper makes several significant contributions in the field of privacy-preserving generative modeling:

  1. Introduction of DP-Sinkhorn: The authors propose DP-Sinkhorn as a flexible and robust optimal transport-based framework specifically designed for training generative models with differential privacy constraints. DP-Sinkhorn sidesteps adversarial training difficulties by relying on primal optimal transport methods.
  2. Semi-Debiased Sinkhorn Loss: The authors present a novel technique for optimizing the bias-variance trade-off in gradient estimation using a semi-debiased Sinkhorn loss. This method enhances convergence properties by interpolating between biased and unbiased loss computations.
  3. State-of-the-Art Performance: DP-Sinkhorn achieves superior performance on various benchmarks, improving upon the state-of-the-art in image modeling tasks. It demonstrates the ability to generate high-quality and informative synthetic images under differential privacy constraints without the need for public data.
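The interpolation idea in contribution 2 can be illustrated as follows. This is a hedged sketch of one way to blend the biased entropic OT loss (λ = 0, lower variance) with the fully debiased Sinkhorn divergence (λ = 1, unbiased but higher variance under minibatching); the paper's actual semi-debiased estimator may differ in how batches are shared between terms.

```python
import numpy as np

def sinkhorn_cost(x, y, eps=1.0, n_iters=200):
    """Entropic OT between uniform point clouds (Sinkhorn iterations)."""
    C = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    K = np.exp(-C / eps)
    a, b = np.full(len(x), 1 / len(x)), np.full(len(y), 1 / len(y))
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return float(np.sum(u[:, None] * K * v[None, :] * C))

def semi_debiased_loss(x, y, lam=0.5, eps=1.0):
    """lam = 0: biased entropic OT (low variance); lam = 1: fully
    debiased Sinkhorn divergence. Intermediate lam trades bias for
    variance in the gradient estimate."""
    debias = 0.5 * (sinkhorn_cost(x, x, eps) + sinkhorn_cost(y, y, eps))
    return sinkhorn_cost(x, y, eps) - lam * debias
```

Under noisy, privacy-constrained gradients, such an interpolation parameter gives a single knob for tuning the bias-variance trade-off of the loss estimate.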

Numerical Results

DP-Sinkhorn has demonstrated strong empirical results. It achieves lower Fréchet Inception Distance (FID) and higher accuracy in downstream image classification tasks compared to prior methods such as GS-WGAN, DP-MERF, and G-PATE on datasets like MNIST and Fashion-MNIST under the privacy constraint of (ε, δ) = (10, 10⁻⁵)-DP.
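As background, FID compares Gaussian fits to feature statistics of real and generated images (in practice, activations of a pretrained Inception network). A sketch of the closed-form Fréchet distance between two Gaussians, with the feature-extraction step omitted:

```python
import numpy as np

def _sqrtm_psd(M):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Squared Frechet distance between N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^{1/2})."""
    s2h = _sqrtm_psd(sigma2)
    covmean = _sqrtm_psd(s2h @ sigma1 @ s2h)  # symmetric rewrite of (sigma1 sigma2)^{1/2}
    return float(np.sum((mu1 - mu2) ** 2)
                 + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

Lower values indicate that the generated feature distribution is closer to the real one, which is why FID is reported alongside downstream classification accuracy.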

Methodological Approach

  1. Non-Adversarial Training: Unlike GANs, DP-Sinkhorn employs optimal transport in its primal form, which simplifies training and improves stability by avoiding adversarial objectives. The Sinkhorn divergence provides a direct and computationally efficient way to measure distribution similarity.
  2. Gradient Sanitization: Privacy protection is enforced by sanitizing gradients through gradient clipping and additive Gaussian noise, adhering to differential privacy standards. The paper employs a sophisticated Rényi Differential Privacy (RDP) mechanism for privacy accounting and provides rigorous analysis to ensure privacy compliance.
  3. Implementation and Design: DP-Sinkhorn is implemented using straightforward gradient-based optimization techniques. It leverages a novel loss formulation to control the bias-variance trade-off effectively during training.
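The sanitization step in point 2 follows the standard DP-SGD recipe. A minimal sketch, where `clip_norm` and `noise_multiplier` are illustrative hyperparameters (in DP-Sinkhorn the clipping and noising are applied to the privacy-sensitive components of the loss gradient):

```python
import numpy as np

def sanitize_gradients(per_example_grads, clip_norm=1.0,
                       noise_multiplier=1.0, rng=None):
    """Clip each per-example gradient to L2 norm <= clip_norm, sum them,
    add Gaussian noise calibrated to the clipping bound, then average."""
    rng = np.random.default_rng() if rng is None else rng
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(per_example_grads)
```

The privacy cost of repeatedly applying this subsampled Gaussian mechanism is then tracked with an RDP accountant and converted into a final (ε, δ) guarantee.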

Implications and Future Directions

The successful implementation and evaluation of DP-Sinkhorn highlight its potential applications in privacy-sensitive domains, especially with high-dimensional data like images. The approach opens new avenues for research and development in scalable and robust differentially private generative models. Future work may focus on extending DP-Sinkhorn to other data modalities, improving generator architectures to enhance image quality further, and exploring more sophisticated cost functions to boost performance on complex datasets.

Overall, DP-Sinkhorn represents a significant step forward in the development of generative models that can operate under strict privacy constraints. It maintains competitive performance and scalability, making it an appealing choice for privacy-preserving data sharing and synthetic data generation in various real-world applications.
