Optimization dynamics and implicit regularization of SGD for GNNs under graph-induced dependencies
Establish theoretical guarantees for the optimization error Δ_n (the expected excess empirical risk over the minimum training loss) incurred when training least-squares graph neural networks from the class F(S_A, L_1, L_2, p, s, F), consisting of linear graph propagation with operator S_A followed by a sparse deep ReLU readout, by stochastic gradient descent under graph-induced dependencies among the nodal losses. Specifically, determine how the graph structure influences the optimization landscape, and whether SGD induces an implicit regularization that controls the effective capacity of F(S_A, L_1, L_2, p, s, F) and thereby keeps Δ_n small.
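To fix notation, one plausible formalization of the target quantity is the following; the empirical risk R_n, the SGD output f̂_T, and the node-averaged least-squares loss are assumptions introduced here for concreteness, not definitions taken from the source.

```latex
% Candidate formalization (notation assumed): R_n is the empirical
% least-squares risk over the n nodes, \hat{f}_T the SGD iterate after T steps.
\[
  R_n(f) \;=\; \frac{1}{n}\sum_{i=1}^{n}\bigl(f(X)_i - Y_i\bigr)^2,
  \qquad
  \Delta_n \;=\; \mathbb{E}\!\left[\, R_n(\hat{f}_T)
    \;-\; \min_{f \in \mathcal{F}(S_A, L_1, L_2, p, s, F)} R_n(f) \right],
\]
```

where the expectation is taken at least over the sampling randomness of SGD.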
References
First, while our results provide explicit bounds on the statistical risk of the least-squares estimator, the optimization error Δ_n, which is governed by the training dynamics, remains to be analyzed. In practice, GNNs are trained via stochastic gradient descent (SGD) rather than by globally minimizing the empirical risk. Although some progress has been made in characterizing the implicit regularization of SGD in standard i.i.d. regression and classification settings, understanding these dynamics in the presence of graph-induced dependencies remains an open challenge.
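To make the training dynamics in question concrete, the following is a minimal sketch, assuming a PyTorch implementation, of an estimator of the form "linear propagation with S_A followed by a deep ReLU readout" trained by node-level SGD on the least-squares objective. The class name, layer widths, and single-node sampling scheme are illustrative assumptions; the sparsity constraint s and the bound F from the class definition are not enforced.

```python
import torch
import torch.nn as nn

class PropagateThenReadout(nn.Module):
    """Sketch of an element of F(S_A, L_1, L_2, p, s, F): L_1 steps of linear
    graph propagation with a fixed operator S_A, followed by a depth-L_2 ReLU
    readout of width p. The sparsity constraint s and the output bound F are
    not enforced in this illustrative sketch."""

    def __init__(self, S_A, in_dim, width_p, L1, L2):
        super().__init__()
        self.register_buffer("S_A", S_A)       # n x n graph operator (e.g. normalized adjacency)
        self.L1 = L1
        layers = [nn.Linear(in_dim, width_p), nn.ReLU()]
        for _ in range(L2 - 1):
            layers += [nn.Linear(width_p, width_p), nn.ReLU()]
        layers.append(nn.Linear(width_p, 1))
        self.readout = nn.Sequential(*layers)

    def forward(self, X):                       # X: n x in_dim node features
        H = X
        for _ in range(self.L1):                # linear graph propagation
            H = self.S_A @ H
        return self.readout(H).squeeze(-1)      # one scalar prediction per node

def node_sgd(model, X, Y, steps=1000, lr=1e-2):
    """SGD on the nodal least-squares losses: each step samples one node and
    follows the gradient of its squared error. Because the prediction at node i
    depends on neighbouring features through S_A, these losses are dependent."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    n = Y.shape[0]
    for _ in range(steps):
        i = torch.randint(n, (1,)).item()
        loss = (model(X)[i] - Y[i]) ** 2        # single-node stochastic loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```

With S_A taken, for instance, as a normalized adjacency matrix, the excess empirical risk of the iterate returned by node_sgd over the class minimum is exactly the quantity Δ_n whose behaviour the problem asks to characterize.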