Implicit bias toward weighted nuclear-norm minimal solutions in randomly weighted two-layer factorizations

Establish whether gradient descent with randomly weighted data for two-layer positive semi-definite matrix factorization, minimizing the loss L(W_1, W_2) = ||Y D − W_2 W_1 X D||_F^2, where D is a random diagonal weighting with expected square M_2 = E[D^2], is implicitly biased toward a weighted nuclear-norm minimal solution for the product W_2 W_1, analogous to the linear case in which randomly weighted gradient descent targets the weighted linear least squares estimator X^+ M_2^{1/2} Y.
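A minimal simulation sketch of the question (illustrative, not from the paper): resample the diagonal weighting D independently at every step (one plausible reading of the randomly weighted dynamics), run gradient descent on the stated loss from a small initialization, and inspect the nuclear norm of the learned product W_2 W_1. The dimensions, the weight distribution (i.i.d. exponential diagonal entries, so M_2 = 2I), the step size, and the comparison against the minimum-Frobenius interpolant Y X^+ are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy dimensions (assumptions): X is d x n with d > n, so the
# interpolation constraint P X = Y leaves many candidate products P = W2 W1.
d, n, m, k = 10, 6, 5, 5
X = rng.standard_normal((d, n))
Y = rng.standard_normal((m, d)) @ X  # realizable targets

def sample_D2(n, rng):
    """One draw of D^2 for the random diagonal weighting D; i.i.d.
    exponential entries are an arbitrary choice giving M_2 = E[D^2] = 2I."""
    return np.diag(rng.exponential(scale=1.0, size=n) ** 2)

# Small initialization, as is standard in implicit-bias experiments.
W1 = 0.01 * rng.standard_normal((k, d))
W2 = 0.01 * rng.standard_normal((m, k))
eta = 1e-4

for _ in range(100_000):
    D2 = sample_D2(n, rng)            # fresh random weights at every step
    R = (Y - W2 @ W1 @ X) @ D2 @ X.T  # common factor of both gradients
    gW1 = -2.0 * W2.T @ R             # dL/dW1
    gW2 = -2.0 * R @ W1.T             # dL/dW2
    W1 -= eta * gW1
    W2 -= eta * gW2

P = W2 @ W1
print("fit residual              :", np.linalg.norm(Y - P @ X))
print("nuclear norm of W2 W1     :", np.linalg.norm(P, ord="nuc"))
print("nuclear norm of Y pinv(X) :", np.linalg.norm(Y @ np.linalg.pinv(X), ord="nuc"))
```

A gap between the last two numbers would suggest a bias away from the minimum-Frobenius interpolant; a proper test of the conjecture would instead compare against the weighted nuclear-norm minimal interpolant once the correct weighting by M_2 is pinned down.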

Background

The paper analyzes gradient descent with random data weightings in linear regression and shows that the dynamics target the weighted linear least squares estimator defined by the expected squared weighting M_2 = E[D^2]. It characterizes convergence properties and the stationary distribution induced by such algorithmic noise.
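The role of the expected squared weighting can be seen in one line of algebra (a sketch, assuming D is diagonal and drawn independently of X and Y):

```latex
\mathbb{E}_D \,\| Y D - W X D \|_F^2
  = \operatorname{tr}\!\left( (Y - W X)\, \mathbb{E}[D^2]\, (Y - W X)^\top \right)
  = \left\| (Y - W X)\, M_2^{1/2} \right\|_F^2 ,
```

so in expectation the randomly weighted objective is an M_2-weighted least squares problem, whose minimizer is the weighted linear least squares estimator that the dynamics target.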

As an intermediate step toward more complex models, prior work has shown that full-batch gradient descent on two-layer positive semi-definite factorizations is implicitly biased toward nuclear-norm minimal solutions. Extending this to the randomly weighted setting, the authors discuss the two-layer loss L(W_1, W_2) = ||Y D − W_2 W_1 X D||_F^2 and provide the corresponding gradient updates, but note that the moment dynamics come with added complications. They explicitly conjecture that an analogous weighted nuclear-norm minimal estimator should play the same role in this setting as the weighted linear least squares estimator does in their linear analysis.
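For concreteness, differentiating the stated loss gives the per-step gradients (a direct computation from the loss above; the paper's update equations should agree up to notation):

```latex
\nabla_{W_1} L = -2\, W_2^\top \left( Y - W_2 W_1 X \right) D^2 X^\top ,
\qquad
\nabla_{W_2} L = -2 \left( Y - W_2 W_1 X \right) D^2 X^\top W_1^\top .
```

Both factors are driven by the same random matrix D^2, so moments of the product W_2 W_1 involve mixed terms in W_1, W_2, and D^2 that do not reduce to M_2 alone, which is plausibly the added complication for the moment dynamics.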

References

Hence, one may conjecture that a weighted nuclear norm minimal estimator plays the same role as the weighted linear least squares estimator does in our analysis.

The Interplay of Statistics and Noisy Optimization: Learning Linear Predictors with Random Data Weights (2512.10188 - Clara et al., 11 Dec 2025) in Section 5 (Discussion and Outlook)