Capturing complex conditional noise in distillation-based SSL without auxiliary conditioning
Ascertain how distillation-based self-supervised learning methods, such as BYOL, can capture complex noise structures in the conditional distribution p(x^+ | x), including heteroscedasticity and multimodality, without conditioning on additional information.
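To make the difficulty concrete, the sketch below uses a toy 1-D setting that is an assumption of this note, not the setup of the cited paper. The positive view x^+ is drawn from a bimodal, heteroscedastic p(x^+ | x), and a linear least-squares map stands in for the predictor (BYOL's actual predictor is a network acting on normalized representations). The helper names `simulate_views` and `fit_linear_predictor` are hypothetical. Under squared error, a deterministic predictor can at best recover the conditional mean E[x^+ | x], which here lies between the two modes.

```python
import numpy as np


def simulate_views(n, rng):
    """Draw (x, x_pos) pairs from a toy bimodal, heteroscedastic p(x_pos | x)."""
    x = rng.uniform(-1.0, 1.0, size=n)
    sign = rng.choice([-1.0, 1.0], size=n)   # two equally likely modes at x - 1 and x + 1
    scale = 0.05 * (1.0 + np.abs(x))         # noise spread grows with |x|
    x_pos = x + sign + scale * rng.standard_normal(n)
    return x, x_pos


def fit_linear_predictor(x, x_pos):
    """Least-squares fit of x_pos ~ slope * x + intercept (stand-in predictor)."""
    design = np.stack([x, np.ones_like(x)], axis=1)
    coef, *_ = np.linalg.lstsq(design, x_pos, rcond=None)
    return coef[0], coef[1]


rng = np.random.default_rng(0)
x, x_pos = simulate_views(20_000, rng)
slope, intercept = fit_linear_predictor(x, x_pos)

# Distance between each sampled positive view and the deterministic prediction.
pred = slope * x + intercept
gap = np.abs(x_pos - pred)

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"mean |x_pos - pred| = {gap.mean():.3f}, min = {gap.min():.3f}")
```

In this toy the fitted map is close to the identity, so the prediction tracks the conditional mean while every sampled positive view stays roughly one unit away from it. The point estimate carries no information about the two modes or about how the spread varies with x, which is the gap the problem statement asks about.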
References
Intuitively, the predictor accounts for cases where $E[x^+ \mid x] \neq x$; but it remains unclear how it can capture complex noise structures in $p(x^+ \mid x)$, which may be heteroscedastic or even multimodal, without conditioning on additional information.
— Self-Supervised Learning from Structural Invariance
(2602.02381 - Zhang et al., 2 Feb 2026) in Section 2.3 (Preliminaries: distillation-based SSL)