Membership Inference Attacks From First Principles

Published 7 Dec 2021 in cs.CR and cs.LG | (2112.03570v2)

Abstract: A membership inference attack allows an adversary to query a trained machine learning model to predict whether or not a particular example was contained in the model's training dataset. These attacks are currently evaluated using average-case "accuracy" metrics that fail to characterize whether the attack can confidently identify any members of the training set. We argue that attacks should instead be evaluated by computing their true-positive rate at low (e.g., <0.1%) false-positive rates, and find most prior attacks perform poorly when evaluated in this way. To address this we develop a Likelihood Ratio Attack (LiRA) that carefully combines multiple ideas from the literature. Our attack is 10x more powerful at low false-positive rates, and also strictly dominates prior attacks on existing metrics.

Citations (537)

Summary

  • The paper introduces LiRA, a likelihood ratio attack that boosts true-positive rates by a factor of 10 at low false-positive rates.
  • It leverages Gaussian modeling and multiple shadow models to distinguish training data from non-members with high precision.
  • The study provides practical insights for privacy auditing and model robustness testing, despite increased computational demands.

Detailed Summary of "Membership Inference Attacks From First Principles" (2112.03570)

Introduction to the Problem

The paper "Membership Inference Attacks From First Principles" (2112.03570) explores the field of membership inference attacks (MIAs), which enable adversaries to determine whether specific data points were part of a model's training dataset. These attacks present significant privacy risks, especially when models are trained on sensitive information. Traditional evaluations use average-case metrics like accuracy; however, this paper argues for a more robust focus on true-positive rates (TPRs) at very low false-positive rates (FPRs), asserting that it's crucial to evaluate the attack's capability to confidently identify members, even in worst-case scenarios.

Contributions and Methodology

The authors introduce the Likelihood Ratio Attack (LiRA), which systematically combines ideas from prior work and improves true-positive rates tenfold at low FPRs compared to previous attacks. The attack hinges on several key ideas:

  • Shift in Evaluation Metric: Proposes evaluating MIAs by their TPR at low FPRs rather than by average-case metrics, addressing what matters most in a privacy breach (a sketch of this metric follows Figure 1).
  • Likelihood Ratio Test: Casts membership inference as a hypothesis test in which the model's confidence on a target example is modeled as Gaussian under the "member" and "non-member" hypotheses; the ratio of the two likelihoods is the attack score.
  • Extensive Use of Shadow Models: Uses shadow models to estimate the confidence distributions of examples that were and were not in training, allowing a precise per-example likelihood computation for any target instance.

    Figure 1: Comparing the true-positive rate vs. false-positive rate of prior membership inference attacks reveals a significant discrepancy in performance at low FPRs, showcasing a 10× improvement for non-overfit CIFAR-10 models using LiRA.
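
To make the proposed metric concrete, here is a minimal sketch of computing TPR at a fixed low FPR. The names (`tpr_at_fpr`, `scores`, `is_member`) are illustrative assumptions, not from the paper's code:

```python
import numpy as np

def tpr_at_fpr(scores, is_member, target_fpr=1e-3):
    """True-positive rate of a score-based attack at a fixed false-positive rate.

    scores: per-example membership scores (higher = more member-like).
    is_member: ground-truth membership labels for the same examples.
    """
    scores = np.asarray(scores, dtype=float)
    is_member = np.asarray(is_member, dtype=bool)
    # Choose the threshold so that roughly a target_fpr fraction of
    # non-members score above it.
    threshold = np.quantile(scores[~is_member], 1.0 - target_fpr)
    return float((scores[is_member] > threshold).mean())
```

Sweeping target_fpr and plotting the resulting TPRs on log-log axes produces ROC curves like the one in Figure 1.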

Implementation Insights

Implementing LiRA involves the following steps:

  1. Training Shadow Models: Train many shadow models, each on a random subset of the candidate data, to estimate the distribution of losses for training vs. non-training points (see the sketch after this list). These serve as reference distributions.
  2. Gaussian Modeling: Model the shadow models' confidences on each example as Gaussians, one per hypothesis ("member" vs. "non-member"), reducing the membership test to a simple parametric likelihood-ratio computation.
  3. Scalable Evaluation: Training many shadow models is computationally expensive, so scaling the attack can be challenging; the number of shadows trades compute against the precision of the estimated distributions.
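
As a rough illustration of step 1, the sketch below shows one way to assign examples to shadow training sets so that each example is a member for roughly half the shadows and a non-member for the rest; all names here are hypothetical, not from the authors' released code:

```python
import numpy as np

def sample_shadow_splits(n_examples, n_shadows, seed=0):
    """Boolean membership matrix M of shape (n_shadows, n_examples):
    M[s, i] is True iff example i is in shadow model s's training set."""
    rng = np.random.default_rng(seed)
    membership = np.zeros((n_shadows, n_examples), dtype=bool)
    for s in range(n_shadows):
        # Each shadow model trains on a random half of the candidate pool.
        chosen = rng.choice(n_examples, size=n_examples // 2, replace=False)
        membership[s, chosen] = True
    return membership
```

With N shadows, every example then has about N/2 "in" and N/2 "out" reference models, which is what the Gaussian fits in step 2 consume.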

In Python or another ML framework, the attack can be assembled from standard components: a training pipeline for the shadow models and a Gaussian-fitting step for the likelihood ratio, as in the sketch below.
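
This is a minimal sketch of the per-example scoring step, assuming the shadow-model confidences on a target example have already been collected. The logit scaling and the two-Gaussian likelihood ratio follow the paper's description; the function and variable names are assumptions for illustration:

```python
import numpy as np
from scipy.stats import norm

def logit_scale(p, eps=1e-8):
    """Map a softmax confidence on the true label to a roughly Gaussian score."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    return np.log(p) - np.log(1.0 - p)

def lira_score(conf_target, confs_in, confs_out):
    """Log-likelihood ratio for the membership of a single example.

    conf_target: the target model's confidence on (x, y).
    confs_in / confs_out: shadow-model confidences on (x, y) from shadows
    trained with / without x, respectively.
    Larger scores mean the example is more likely a training member.
    """
    obs = logit_scale(conf_target)
    in_scores = logit_scale(confs_in)
    out_scores = logit_scale(confs_out)
    # Fit one Gaussian per hypothesis and compare log-likelihoods.
    mu_in, sd_in = in_scores.mean(), in_scores.std() + 1e-8
    mu_out, sd_out = out_scores.mean(), out_scores.std() + 1e-8
    return float(norm.logpdf(obs, mu_in, sd_in) - norm.logpdf(obs, mu_out, sd_out))
```

Thresholding this score, with the threshold calibrated on known non-members, yields the low-FPR operating points the paper advocates reporting.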

Practical Applications

  • Privacy Auditing: LiRA can precisely audit models for privacy risks, guiding enhancements to privacy-preserving methods.
  • Testing Robustness: Developers can use MIAs to validate the resilience of models against privacy threats, identifying potential vulnerabilities in different configurations.

Performance & Trade-offs

LiRA achieves strong results in low-FPR regimes, but at the cost of significant computational overhead from extensive shadow-model training.

  • Computational Resources: The attack requires substantial compute, since many shadow models must be trained.
  • Trade-offs in Accuracy: While the attack maximizes TPR at low FPRs, practitioners must balance computational budget against how precisely privacy risk is measured.

Conclusion and Future Directions

LiRA motivates revisiting prior MIA evaluations, supporting a shift toward low-FPR metrics that genuinely reflect privacy vulnerabilities. Future work could reduce the computational requirements or apply the underlying likelihood-ratio principles to related privacy attacks, broadening their use across privacy-sensitive AI systems.

Figure 2: Success rate of our attack across various datasets, illustrating consistency in achieving high TPR at low FPRs.
