Implicit bias toward weighted nuclear-norm minimal solutions in randomly weighted two-layer factorizations
Establish whether gradient descent with randomly weighted data for two-layer positive semi-definite matrix factorization, minimizing the loss $L(W_1, W_2) = \|Y D - W_2 W_1 X D\|_F^2$, where $D$ is a random diagonal weighting with expected square $M_2 = E[D^2]$, is implicitly biased toward a weighted-nuclear-norm-minimal solution for the product $W_2 W_1$, in analogy with the linear case, where randomly weighted gradient descent targets the weighted linear least squares estimator $X^+ M_2^{1/2} Y$.
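The setup can be sketched numerically: run gradient descent on the factorized loss, resampling the random diagonal weighting $D$ at every step, and inspect the nuclear norm of the learned product $W_2 W_1$. This is a minimal illustration, not the paper's experiment; the problem sizes, the uniform weight distribution (so $M_2 = E[D^2]$ is a multiple of the identity), the small initialization scale, and the step size are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small instance: Y (d_out x n), X (d_in x n), inner width k.
d_out, d_in, k, n = 4, 6, 8, 5
X = rng.standard_normal((d_in, n))
Y = rng.standard_normal((d_out, n))

def sample_D():
    # Random diagonal data weights, bounded away from 0; here M2 = E[D^2] = (13/12) I.
    return np.diag(rng.uniform(0.5, 1.5, size=n))

# Small initialization -- the regime where implicit-bias effects are usually studied.
W1 = 0.01 * rng.standard_normal((k, d_in))
W2 = 0.01 * rng.standard_normal((d_out, k))

lr = 1e-4
for _ in range(50_000):
    D = sample_D()                    # fresh random weighting each step
    R = (W2 @ W1 @ X - Y) @ D @ D    # residual scaled by D^2
    gW2 = 2 * R @ X.T @ W1.T         # grad of ||(W2 W1 X - Y) D||_F^2 in W2
    gW1 = 2 * W2.T @ R @ X.T         # grad in W1
    W2 -= lr * gW2
    W1 -= lr * gW1

P = W2 @ W1                                   # learned product
residual = np.linalg.norm(Y - P @ X)          # unweighted fit of the product
nuc = np.linalg.norm(P, ord="nuc")            # nuclear norm of the product
print(f"residual = {residual:.4f}, nuclear norm = {nuc:.4f}")
```

The conjecture would predict that, among all products fitting the data, gradient descent selects one with small weighted nuclear norm; comparing `nuc` across weight distributions (i.e., across choices of $M_2$) is one way to probe this.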
References
Hence, one may conjecture that a weighted nuclear norm minimal estimator plays the same role as the weighted linear least squares estimator does in our analysis.
— The Interplay of Statistics and Noisy Optimization: Learning Linear Predictors with Random Data Weights
(2512.10188 - Clara et al., 11 Dec 2025) in Section 5 (Discussion and Outlook)