Scaling laws for learned membership inference

Determine the scaling laws that govern the performance of learned membership inference attacks against fine-tuned autoregressive language models, quantifying how attack effectiveness varies with training diversity, classifier capacity, and feature complexity.

Background

The paper introduces LT-MIA, a learned, transferable membership inference attack that reframes the task as sequence classification over per-token features and trains on diverse fine-tuned transformer models. Training diversity, rather than sheer data volume, is shown to be critical for generalization, and a small transformer classifier already achieves strong performance across unseen architectures.
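The per-token-feature framing can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the feature choices (per-token log-probability, token rank, predictive entropy) and all function names are assumptions standing in for whatever features LT-MIA actually uses, and the mean-pooled score is a trivial stand-in for the trained transformer classifier.

```python
# Hypothetical sketch of per-token features for a learned membership
# inference attack. Feature choices and names are illustrative, not
# taken from the paper.
import numpy as np

def per_token_features(token_logprobs, vocab_logprobs):
    """Build a per-token feature sequence from a target model's outputs.

    token_logprobs : (T,) log-prob the model assigned to each observed token
    vocab_logprobs : (T, V) full log-prob distribution at each position
    Returns a (T, 3) array of [log-prob, rank of observed token, entropy].
    """
    # Rank of the observed token in the predicted distribution (0 = top-1)
    ranks = (vocab_logprobs > token_logprobs[:, None]).sum(axis=1)
    # Predictive entropy at each position
    probs = np.exp(vocab_logprobs)
    entropy = -(probs * vocab_logprobs).sum(axis=1)
    return np.stack([token_logprobs, ranks.astype(float), entropy], axis=1)

def toy_membership_score(features):
    """Trivial stand-in for the learned classifier: mean-pool the
    per-token log-prob feature. A real attack would instead train a
    small transformer on such feature sequences, collected across many
    diverse fine-tuned models."""
    return features[:, 0].mean()
```

The point of the reframing is that the classifier consumes a *sequence* of per-position statistics rather than a single scalar, so it can learn memorization signatures that a fixed threshold on average loss would miss.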

Despite these results, the authors emphasize that many scaling dimensions (more model–dataset combinations, larger classifiers, richer features) have not been systematically explored, and the overarching scaling laws governing learned membership inference remain unknown.

References

This paper demonstrates a proof of concept, and the performance ceiling is likely higher; finding the scaling laws that govern learned membership inference remains an open question.

Learning the Signature of Memorization in Autoregressive Language Models (2604.03199 - Ilić et al., 3 Apr 2026) in Discussion, Scaling subsection