Modeling arbitrary conditional distributions in contrastive SSL with simple similarity functions

Determine how to flexibly model an arbitrary conditional distribution p(x^+ | x) for positive pairs in contrastive self-supervised learning while maintaining a simple similarity function (for example, a dot product on normalized embeddings) that supports efficient feature extraction.

Background

In the discussion of contrastive self-supervised learning, the paper explains that common similarity functions (e.g., dot products on normalized embeddings) implicitly correspond to restrictive conditional distributions (such as von Mises–Fisher) and do not capture anisotropic or more complex noise without additional machinery.
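To make the correspondence concrete, here is a minimal NumPy sketch (an illustration, not code from the paper) of the standard InfoNCE objective with dot-product similarity on L2-normalized embeddings. The key point is that the exponentiated similarity `exp(z · z⁺ / τ)` is, up to normalization, a von Mises–Fisher density on the unit sphere with mean direction `z` and concentration `κ = 1/τ`, which is exactly the restrictive, isotropic conditional the text describes:

```python
import numpy as np

def l2_normalize(v):
    """Project embeddings onto the unit sphere."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def info_nce_loss(z, z_pos, tau=0.1):
    """InfoNCE with dot-product similarity on normalized embeddings.

    Up to its normalizing constant, exp(z . z_pos / tau) matches a
    von Mises-Fisher density vMF(z_pos; mu=z, kappa=1/tau), i.e. an
    isotropic conditional p(x^+ | x) on the sphere.
    """
    z = l2_normalize(z)
    z_pos = l2_normalize(z_pos)
    logits = z @ z_pos.T / tau                 # (N, N) pairwise similarities
    # Row i treats column i as the positive; all other columns are negatives.
    log_denom = np.log(np.exp(logits).sum(axis=1))
    return float(np.mean(log_denom - np.diag(logits)))
```

Because the similarity is a plain dot product, features can be extracted once and compared cheaply, which is the efficiency property the open problem asks to preserve.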

Despite extensions that introduce anisotropy, the authors explicitly state uncertainty about how to model general (potentially complex) conditionals while keeping the similarity function simple enough to still enable efficient feature extraction. This motivates their latent-variable approach but leaves the general modeling question open as a broader problem.
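The tension can be seen in a hypothetical anisotropic variant (an illustration under stated assumptions, not the authors' construction): if the conditional is taken to be an anisotropic Gaussian with precision matrix `A = L Lᵀ`, the induced similarity becomes a quadratic form rather than a plain dot product, so comparisons must carry `A` (or a transformed embedding) around, and the similarity function is no longer "simple":

```python
import numpy as np

def anisotropic_similarity(z, z_pos, L):
    """Similarity induced by an anisotropic Gaussian conditional
    p(z_pos | z) proportional to exp(-0.5 (z_pos - z)^T A (z_pos - z)),
    with A = L @ L.T.  Hypothetical illustration: the quadratic form
    is not a dot product on fixed features, so efficient feature
    extraction no longer comes for free.
    """
    d = z_pos - z
    A = L @ L.T                       # positive semi-definite precision
    return -0.5 * np.einsum('ni,ij,nj->n', d, A, d)
```

With `L` equal to the identity this reduces to the isotropic case `-0.5 * ||z_pos - z||^2`; richer conditionals would require still more structure in the similarity, which is precisely the open modeling question.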

References

Nevertheless, it remains unclear how to flexibly model an arbitrary conditional distribution p(x^+ \mid x) while keeping the similarity function simple enough to allow efficient feature extraction.

Self-Supervised Learning from Structural Invariance (2602.02381 - Zhang et al., 2 Feb 2026) in Section 2.2 (Preliminaries: contrastive SSL)