Effectiveness of LLM-based deanonymization when true matches are extremely rare

Establish whether LLM-based deanonymization retains non-trivial recall at high precision even when the prior probability that a query has any matching candidate in the pool is extremely small (e.g., π = 0.0001). In other words, show that for the few identifiable users, the pipeline still reliably finds correct matches in this low-matchability regime.

Background

The authors analyze attack difficulty by varying the fraction of matchable queries (π) and report that, at 90% precision, their LLM-based methods achieve at least 9% recall even when π is as low as 0.0001. This suggests robustness when most queries have no true match, but the result is based on empirical estimation and post-hoc calculation rather than a formal guarantee.
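To see what this regime demands of an attack, a simple Bayesian calculation (not the paper's method, just an illustrative sketch) relates precision to the match prior π, the per-query true-positive rate, and the per-query false-positive rate. The `tpr` value of 0.09 mirrors the reported 9% recall; the target precision of 0.9 mirrors the 90% operating point; the derived `fpr` bound is an implication of those numbers, not a figure from the paper.

```python
def precision(pi, tpr, fpr):
    """Fraction of predicted matches that are correct, by Bayes' rule:
    P(correct | predicted) = pi*tpr / (pi*tpr + (1 - pi)*fpr)."""
    return pi * tpr / (pi * tpr + (1 - pi) * fpr)

def max_fpr(pi, tpr, target_precision):
    """Largest per-query false-positive rate compatible with the
    target precision, obtained by solving precision(...) >= target."""
    return pi * tpr * (1 - target_precision) / ((1 - pi) * target_precision)

# With pi = 0.0001 and 9% recall among matchable queries, sustaining
# 90% precision requires a false-positive rate on the order of 1e-6:
bound = max_fpr(pi=1e-4, tpr=0.09, target_precision=0.9)
print(f"required fpr <= {bound:.3e}")
```

The calculation makes the reported result concrete: at π = 0.0001, the attack must wrongly declare a match for roughly one in a million non-matchable queries to keep 90% of its declared matches correct.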

They therefore conjecture that, even in settings where almost no users are deanonymizable, LLM-based attacks will still find correct matches among the few who are identifiable. A formal result would clarify real-world risk in environments with sparse matchability.

References

We hence conjecture that, even in settings where almost no users can be deanonymized, LLM-based attacks are reasonably likely to find a correct match for the few users that are identifiable.

Large-scale online deanonymization with LLMs (2602.16800 - Lermen et al., 18 Feb 2026), Section 6.2 (Comparing difficulty parameters of our attack model)