(Token-Level) InfoRMIA: Stronger Membership Inference and Memorization Assessment for LLMs

Published 7 Oct 2025 in cs.LG | (2510.05582v2)

Abstract: Machine learning models are known to leak sensitive information, as they inevitably memorize (parts of) their training data. More alarmingly, LLMs are now trained on nearly all available data, which amplifies the magnitude of information leakage and raises serious privacy risks. Hence, it is more crucial than ever to quantify privacy risk before the release of LLMs. The standard method to quantify privacy is via membership inference attacks, where the state-of-the-art approach is the Robust Membership Inference Attack (RMIA). In this paper, we present InfoRMIA, a principled information-theoretic formulation of membership inference. Our method consistently outperforms RMIA across benchmarks while also offering improved computational efficiency. In the second part of the paper, we identify the limitations of treating sequence-level membership inference as the gold standard for measuring leakage. We propose a new perspective for studying membership and memorization in LLMs: token-level signals and analyses. We show that a simple token-based InfoRMIA can pinpoint which tokens are memorized within generated outputs, thereby localizing leakage from the sequence level down to individual tokens, while achieving stronger sequence-level inference power on LLMs. This new scope rethinks privacy in LLMs and can lead to more targeted mitigation, such as exact unlearning.


Summary

The paper introduces an information-theoretic framework for membership inference that refines sequence-level analysis by evaluating token-level leakage.
It employs a Bayes factor-like statistic that measures information gain in bits, applied both per sequence and per token, and reports higher TPR at low FPR and higher AUC than RMIA across benchmarks.
Token-level InfoRMIA facilitates targeted privacy audits and unlearning interventions, offering actionable insights to mitigate memorization risks in LLMs.

(Token-Level) InfoRMIA: An Information-Theoretic Framework for Precise Membership Inference in LLMs

Introduction and Motivation

This work addresses two critical limitations in membership inference attacks (MIAs) for LLMs: the reliance on sequence-level statistics and the dependence of state-of-the-art attacks (e.g., RMIA) on large population datasets. The authors introduce InfoRMIA, a theoretically principled information-theoretic variant of RMIA, and extend it to token-level analysis, enabling more granular assessment of privacy risks and memorization behavior in LLMs. This approach offers statistically stronger inference power, improved efficiency, and actionable insights for privacy-preserving interventions.

InfoRMIA: An Information-Theoretic MIA

The Robust Membership Inference Attack (RMIA) scores a target query $x$ by the fraction of population samples $z \in Z$ that it dominates in a pairwise likelihood-ratio test. This yields a discrete score whose resolution is bottlenecked by the number of population points and which depends on the size of the population set. InfoRMIA reframes the membership inference game as a composite hypothesis-testing problem and employs a Bayes factor-like statistic that measures the information gain in bits, $\log \frac{p(x|\theta)}{p(x)}$, and accounts for distributional shift via $D_{KL}(p(z) \| p(z|\theta))$. This continuous statistic gives finer granularity, is less sensitive to the choice of the population set $Z$, and removes the need to tune RMIA's pairwise dominance threshold.
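To make the contrast concrete, here is a minimal sketch of both scores under stated assumptions. It takes likelihoods as given: `p_x` is $p(x|\theta)$ under the target model, `p_x_refs` are the likelihoods of $x$ under reference models, and `p_z`, `p_z_refs` are the same quantities for each population point. As in RMIA, the marginal $p(\cdot)$ is approximated by averaging over reference models. The continuous score is an illustrative form, not the paper's exact estimator: it subtracts the population average of $\log_2 \frac{p(z|\theta)}{p(z)}$, which for $z$ drawn from $p(z)$ equals $-D_{KL}(p(z) \| p(z|\theta))$, so subtracting it adds the KL term. All function names are hypothetical.

```python
import math
from statistics import fmean

def marginal(p_refs):
    """Approximate the marginal likelihood p(.) by averaging over reference models."""
    return fmean(p_refs)

def rmia_score(p_x, p_x_refs, p_z, p_z_refs, gamma=1.0):
    """Discrete RMIA-style score: fraction of population points z that x dominates.

    p_x: p(x|theta); p_x_refs: p(x|theta'_j) for each reference model j.
    p_z: p(z|theta) for each z in Z; p_z_refs: per-z lists of reference likelihoods.
    """
    ratio_x = p_x / marginal(p_x_refs)
    wins = sum(ratio_x / (pz / marginal(refs)) > gamma for pz, refs in zip(p_z, p_z_refs))
    return wins / len(p_z)  # can only take |Z| + 1 distinct values

def info_score(p_x, p_x_refs, p_z, p_z_refs):
    """Continuous InfoRMIA-style score in bits (assumed form; see the lead-in)."""
    gain_x = math.log2(p_x / marginal(p_x_refs))  # information gain of the query
    gain_z = [math.log2(pz / marginal(refs)) for pz, refs in zip(p_z, p_z_refs)]
    kl_estimate = -fmean(gain_z)  # Monte Carlo estimate of D_KL(p(z) || p(z|theta))
    return gain_x + kl_estimate

# Toy example: three population points, two reference models.
p_z = [0.01, 0.02, 0.005]
p_z_refs = [[0.01, 0.01], [0.02, 0.02], [0.01, 0.01]]
for p_x in (0.02, 0.04):
    print(rmia_score(p_x, [0.01, 0.01], p_z, p_z_refs),
          round(info_score(p_x, [0.01, 0.01], p_z, p_z_refs), 3))
# prints "1.0 1.333" and then "1.0 2.333"
```

With three population points the RMIA-style score can only be 0, 1/3, 2/3, or 1, so both queries saturate at 1.0, while the information score still separates them.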

The paper's empirical evaluation shows that InfoRMIA consistently outperforms RMIA in AUC and TPR@FPR across diverse domains (tabular, vision, text) and retains its effectiveness with much smaller population sets, which makes it a more scalable and cost-effective reference-based MIA.

Token-Level Membership Inference for LLMs

A critical insight of the paper is that sequence-level membership assessments in LLMs obscure the model's actual leakage patterns, as only a small subset of tokens in a sequence may represent sensitive content. The mapping from sequences to a single membership score acts as a severe (and lossy) compression, conflating common tokens with private ones and failing to illuminate the location or type of memorization.

The correlation between a sequence's average membership score and the average score of its private tokens is weak, which indicates that sequence-level scores are a poor proxy for private-token leakage.

Figure 1: The average membership scores of sequences and their private tokens are not strongly correlated.
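The dilution behind this weak correlation can be reproduced with a toy sketch. The hypothetical helper below compares a sequence's mean token score with the mean over its private tokens; the scores and private-token masks are made-up illustrative values, not data from the paper. Over a real corpus one could then correlate the two columns, for example with `statistics.correlation` (Python 3.10+).

```python
from statistics import fmean

def sequence_vs_private_mean(token_scores, private_mask):
    """Return (mean over all tokens, mean over private tokens or None if there are none)."""
    private = [s for s, is_private in zip(token_scores, private_mask) if is_private]
    return fmean(token_scores), (fmean(private) if private else None)

# Illustrative values only: A has one strongly memorized private token,
# B has highly scored common tokens but a weakly scored private token.
seq_a = ([0.1, 0.2, 3.0, 0.1], [False, False, True, False])
seq_b = ([1.0, 1.2, 0.9, 0.1], [False, False, False, True])
print(sequence_vs_private_mean(*seq_a))  # sequence mean about 0.85, private mean 3.0
print(sequence_vs_private_mean(*seq_b))  # sequence mean about 0.8, private mean 0.1
```

The two sequence means are nearly identical, yet the private-token means differ by a factor of 30.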

The token-level InfoRMIA framework evaluates membership at each next-token prediction step, which matches the LLM training objective, and yields position-aware scores. Because each next-token prediction is itself a learning instance, privacy risk can be assessed token by token, which is directly actionable for tasks such as targeted unlearning and forensic audits of memorized PII.
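A minimal sketch of per-token scoring, assuming the caller already has next-token logits from the target model and from a few reference models (how those reference models are obtained is outside this sketch). Row $t$ of the logits predicts token $t+1$, so the first token receives no score. Mirroring the sequence-level information gain, each token's score is $\log_2 p(x_t | x_{<t}, \theta)$ minus $\log_2$ of the reference-averaged probability; this per-token form is an assumption and omits any population correction the paper may apply. Function names are hypothetical.

```python
import math
from statistics import fmean

def token_log2probs(logits, ids):
    """log2 p(ids[t+1] | ids[:t+1]) for each prediction step.

    logits: T rows of vocabulary scores; row t predicts the token at position t+1.
    ids:    T token ids; ids[0] is never predicted, so T-1 values are returned.
    """
    out = []
    for row, target in zip(logits[:-1], ids[1:]):
        m = max(row)  # subtract the max before exponentiating, for numerical stability
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        out.append((row[target] - log_z) / math.log(2))
    return out

def token_info_gain(target_logits, ref_logits_list, ids):
    """Per-token information gain in bits: log2 p(x_t|x_<t, theta) - log2 p(x_t|x_<t)."""
    target = token_log2probs(target_logits, ids)
    refs = [token_log2probs(logits, ids) for logits in ref_logits_list]
    gains = []
    for t, lp in enumerate(target):
        p_marginal = fmean(2.0 ** r[t] for r in refs)  # assumed: average reference probabilities
        gains.append(lp - math.log2(p_marginal))
    return gains

# Toy 2-token vocabulary: the target model is more confident than the references at step 1.
target = [[0.0, math.log(3.0)], [0.0, 0.0], [0.0, 0.0]]
reference = [[0.0, 0.0]] * 3
print(token_info_gain(target, [reference, reference], [0, 1, 0]))  # about [0.585, 0.0]
```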

The distributional characteristics of token-level InfoRMIA scores on AG News demonstrate that sensitive entities (such as PERSON and WORK_OF_ART tokens) are more prone to memorization.

Figure 2: Histogram of average token scores across top entity groups on AG News; PERSON and WORK_OF_ART have notably higher memorization rates.

The aggregated statistics confirm that memorization is concentrated in a minority of private entity tokens, and that non-private, common tokens generally dilute leakage estimates when scores are averaged over an entire sequence.
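Grouping token scores by entity type, as in Figure 2, takes only a few lines. In this sketch the entity labels are assumed to come from an NER tagger whose label set includes PERSON and WORK_OF_ART (the OntoNotes labels used by spaCy's English models), with "O" marking non-entity tokens; the scores are illustrative and `mean_score_by_label` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import fmean

def mean_score_by_label(token_scores, labels):
    """Average token score per entity label, sorted from highest to lowest."""
    groups = defaultdict(list)
    for score, label in zip(token_scores, labels):
        groups[label].append(score)
    means = {label: fmean(vals) for label, vals in groups.items()}
    return dict(sorted(means.items(), key=lambda kv: kv[1], reverse=True))

scores = [0.1, 2.5, 0.2, 1.8, 0.1]                   # illustrative token scores (bits)
labels = ["O", "PERSON", "O", "WORK_OF_ART", "O"]    # "O" = non-entity token (assumed convention)
print(mean_score_by_label(scores, labels))  # PERSON first, then WORK_OF_ART, then O
print(fmean(scores))                        # sequence mean about 0.94, far below the entity scores
```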

On the ai4privacy benchmark, the analysis reveals two phenomena: (1) non-private tokens can exhibit comparable or higher variance in "memorization" than true private ones, and (2) the top sequences by sequence-level score often do not contain high-scoring private tokens.

Figure 3: Non-private tokens show a slightly higher mean and larger variance in membership scores compared to private tokens.

Figure 4: Top-10 memorized sequences by sequence-based scores in ai4privacy; some lack private tokens altogether, others have low average private token scores.

Conversely, the sequences with the highest average private-token scores do not rank highly under sequence-based metrics, which attests to the strong dilution of privacy signals in sequence-level evaluation.

Figure 5: Top-10 sequences with highest average private token scores have low overall sequence-level scores, illustrating dilution of memorization signals.
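The ranking mismatch in Figures 4 and 5 can be quantified as the overlap between the top-k sequences under each ranking. The sketch below is one assumed way to measure it, not a metric reported in the paper; sequences without private tokens (score `None`) are ranked last, and the values are illustrative.

```python
def top_k(scores, k):
    """Indices of the k highest scores; None (no private tokens) ranks last."""
    ranked = sorted(range(len(scores)),
                    key=lambda i: float("-inf") if scores[i] is None else scores[i],
                    reverse=True)
    return ranked[:k]

def top_k_overlap(seq_scores, private_scores, k=10):
    """Fraction of the top-k sequences by sequence score that are also top-k by private-token score."""
    return len(set(top_k(seq_scores, k)) & set(top_k(private_scores, k))) / k

seq_scores = [0.9, 0.8, 0.1, 0.2]         # mean score over all tokens (illustrative)
private_scores = [0.1, 0.2, 3.0, None]    # mean score over private tokens; None = none present
print(top_k_overlap(seq_scores, private_scores, k=2))  # 0.5: only sequence 1 is in both top-2 lists
```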

Empirical Performance and Applied Insights

When aggregated (e.g., by averaging), token-level InfoRMIA scores achieve competitive or superior sequence-level MIA performance relative to both sequence-based InfoRMIA and reference-model approaches on the evaluated benchmarks (AG News, ai4privacy, MIMIR). Critically, InfoRMIA delivers high TPR at very low FPR, the operating regime of greatest concern for privacy audits.
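For reference, here is a dependency-free sketch of the aggregation and the two metrics discussed above: averaging token scores into one score per sequence, AUC computed as the probability that a random member outscores a random non-member, and TPR under a fixed FPR budget. The helper names and toy values are hypothetical; for large evaluations, `sklearn.metrics.roc_curve` and `sklearn.metrics.roc_auc_score` compute the same quantities efficiently.

```python
from statistics import fmean

def sequence_scores(token_scores_per_seq):
    """Aggregate token-level scores into one score per sequence by averaging."""
    return [fmean(s) for s in token_scores_per_seq]

def roc_auc(scores, labels):
    """P(random member outscores random non-member); ties count 1/2. Needs both classes."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def tpr_at_fpr(scores, labels, max_fpr=0.01):
    """Best TPR over thresholds with FPR <= max_fpr (predict member if score >= threshold)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = 0.0
    for thr in set(scores):
        tp = sum(s >= thr and y for s, y in zip(scores, labels))
        fp = sum(s >= thr and not y for s, y in zip(scores, labels))
        if fp / n_neg <= max_fpr:
            best = max(best, tp / n_pos)
    return best

labels = [True, True, False, False]  # True = member (illustrative)
scores = sequence_scores([[2.0, 0.0], [1.5, 1.5], [0.4, 0.2], [0.1, 0.1]])
print(roc_auc(scores, labels), tpr_at_fpr(scores, labels, max_fpr=0.0))  # 1.0 1.0
```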

Moreover, token-level analysis exposes the limitations of AUC-based privacy quantification in LLMs: a high aggregate AUC may not reflect actual leakage of private information, because memorization is unevenly distributed across tokens and sequence-level scoring can misidentify where the privacy risk lies. The authors' HTML visualization interface and heatmap tool let auditors inspect outputs directly, localize leakage, assess the memorization of specific entities (especially PII), and design more precise, surgical unlearning procedures.
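The authors' visualization tool is not described in detail here, so the following is only a hypothetical stand-in: a small function that renders each token as an HTML span whose red background intensity is proportional to its score, with the raw score shown in a hover tooltip. Negative scores are clipped to no shading, token text is HTML-escaped, and the "bits" unit assumes InfoRMIA-style scores.

```python
import html

def render_token_heatmap(tokens, scores):
    """Return an HTML snippet shading each token by its score (hypothetical helper)."""
    top = max(max(scores), 1e-12)  # avoid division by zero when all scores are <= 0
    spans = []
    for tok, s in zip(tokens, scores):
        alpha = min(max(s / top, 0.0), 1.0)  # clip to [0, 1]; negative scores get no shading
        spans.append(
            f'<span title="{s:.2f} bits" '
            f'style="background: rgba(255, 0, 0, {alpha:.2f})">{html.escape(tok)}</span>'
        )
    return "<div>" + "".join(spans) + "</div>"

# Illustrative tokens and scores; save the output to an .html file to view it in a browser.
print(render_token_heatmap(["Contact", " Alice", " Smith", " today"], [0.2, 2.7, 1.9, -0.3]))
```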

The distribution of token-level InfoRMIA scores further supports focused audit strategies.

Figure 6: Distribution of token InfoRMIA scores on AG News: approximately normal overall, with a heavy right tail for certain entities.

Figure 7: Distribution of high scoring tokens by type in ai4privacy (log-scale), emphasizing the statistical sparsity of memorized sensitive tokens.
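One simple way to isolate the sparse right tail is a mean-plus-k-standard-deviations cut-off, followed by counting flagged tokens per type. The cut-off rule, `num_std`, and the type labels below are assumptions for illustration; the paper's criterion for a "high scoring" token is not stated here.

```python
from collections import Counter
from statistics import fmean, pstdev

def flag_right_tail(scores, token_types, num_std=3.0):
    """Flag tokens scoring above mean + num_std * std; count flagged tokens per type."""
    threshold = fmean(scores) + num_std * pstdev(scores)
    flagged = [s > threshold for s in scores]
    counts = Counter(t for t, f in zip(token_types, flagged) if f)
    return threshold, counts

scores = [0.1] * 18 + [3.0, 2.0]            # illustrative: mostly low scores, two outliers
types = ["O"] * 18 + ["PERSON", "EMAIL"]    # hypothetical token-type labels
threshold, counts = flag_right_tail(scores, types, num_std=2.0)
print(round(threshold, 2), counts)  # threshold about 1.81; one PERSON and one EMAIL token flagged
```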

Implications and Future Directions

This work redefines the granularity at which privacy risk should be assessed and mitigated in LLMs. The information-theoretic framework of InfoRMIA provides a statistically sound and computationally efficient attack paradigm, suitable for both academic evaluation and real-world audits. The token-level perspective greatly enhances interpretability and the scientific understanding of memorization phenomena in deep autoregressive transformers.

Practically, the approach enables targeted interventions such as selective unlearning or content filtering, which could retain non-sensitive knowledge while erasing specific memorized elements. The theoretical implication is that privacy risk assessments should shift toward token-level or span-level statistics, especially in domains (legal, medical, financial) where sensitive entities are localized and sparse.
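As an illustration of the content-filtering route, the sketch below redacts tokens whose score exceeds a threshold and merges adjacent flagged tokens into a single placeholder, leaving the rest of the output untouched. The threshold and placeholder are hypothetical parameters, and the scores are illustrative. This is a post-hoc output filter, not unlearning: it changes what is shown, not what the model has memorized.

```python
def redact_memorized(tokens, scores, threshold, placeholder="[REDACTED]"):
    """Replace each run of tokens scoring above threshold with a single placeholder."""
    out = []
    in_flagged_run = False
    for tok, score in zip(tokens, scores):
        if score > threshold:
            if not in_flagged_run:  # start of a new flagged span
                out.append(placeholder)
            in_flagged_run = True
        else:
            out.append(tok)
            in_flagged_run = False
    return out

tokens = ["Call", " John", " Smith", " at", " 555-0100"]
scores = [0.1, 2.4, 2.1, 0.2, 3.0]  # illustrative token scores (bits)
print("".join(redact_memorized(tokens, scores, threshold=1.0)))
# prints "Call[REDACTED] at[REDACTED]"
```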

Future work can extend in several technical directions:

Further optimization of aggregation strategies for membership signals to match downstream privacy definitions,
Extension to subword or span analysis for contextual memorization,
Integration with automated detection of infrequent/entity tokens,
Exploration of adversarial or certified unlearning guided by token-level memorization maps.

Conclusion

(Token-Level) InfoRMIA advances the state of the art in membership inference attacks for LLMs, both in terms of statistical rigor and practical interpretability (2510.05582). By grounding its methodology in information theory and shifting analytical granularity to the token level, it achieves stronger, more efficient, and operationally relevant privacy leakage assessment. The token-level perspective reveals the limitations of previous sequence-level approaches and equips both researchers and practitioners with direct tools for precise privacy auditing and mitigation in the LLM era.
