Optimization dynamics and implicit regularization of SGD for GNNs under graph-induced dependencies

Establish theoretical guarantees for the optimization error Δ_n, defined as the expected excess empirical risk of the SGD iterate over the minimum training loss, when training least-squares graph neural networks from the class F(S_A, L1, L2, p, s, F), which composes linear graph propagation via an operator S_A with a sparse deep ReLU readout, using stochastic gradient descent under graph-induced dependencies among the nodal losses. Specifically, determine how the graph structure shapes the optimization landscape, and whether SGD induces an implicit regularization that controls the effective capacity of F(S_A, L1, L2, p, s, F) and thereby keeps Δ_n small.

Background

The paper establishes oracle inequalities and convergence rates for least-squares estimation over a class of GNNs that combine linear graph propagation with a deep ReLU readout, explicitly decomposing error into approximation, stochastic, and optimization components. While the statistical and approximation terms are analyzed, the optimization error Δ_n—defined as the expected gap between the training loss of a given estimator and the global minimum within the class—remains uncharacterized.

In practice, GNNs are trained with stochastic gradient descent rather than exact empirical risk minimization. The authors point out that, despite progress on implicit regularization of SGD in i.i.d. settings, it is unknown how graph-induced dependencies affect optimization dynamics, implicit bias, and ultimately the magnitude of Δ_n in this semi-supervised, graph-dependent regime.
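To make the setting concrete, the following is a minimal sketch, not the paper's construction: a toy instance of the model class (linear propagation by a normalized operator S_A followed by a one-hidden-layer ReLU readout) trained with node-sampled SGD on a squared loss. The graph, teacher targets, and all hyperparameters are illustrative assumptions; the point is only that each node's loss depends on its neighbors through S_A, which is exactly the graph-induced dependence the open problem concerns.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy graph (assumption): Erdos-Renyi adjacency with self-loops.
n, d, hidden = 30, 4, 8
A = (rng.random((n, n)) < 0.2).astype(float)
A = np.triu(A, 1)
A = A + A.T + np.eye(n)
deg = A.sum(axis=1)
S = A / np.sqrt(np.outer(deg, deg))  # S_A = D^{-1/2}(A + I)D^{-1/2}, one common choice

X = rng.standard_normal((n, d))
Z = S @ X  # linear graph propagation: each row mixes neighboring features

# Synthetic targets from a fixed "teacher" readout (purely for the demo).
Wt = rng.standard_normal((hidden, d))
wt = rng.standard_normal(hidden)
y = np.maximum(Z @ Wt.T, 0.0) @ wt

# Trainable one-hidden-layer ReLU readout applied node-wise.
W1 = 0.1 * rng.standard_normal((hidden, d))
b1 = np.zeros(hidden)
w2 = 0.1 * rng.standard_normal(hidden)

def train_loss():
    """Empirical least-squares risk over all nodes."""
    return float(np.mean((np.maximum(Z @ W1.T + b1, 0.0) @ w2 - y) ** 2))

loss0 = train_loss()
lr = 0.01
for t in range(5000):
    i = rng.integers(n)          # sample one node; its loss is graph-dependent via Z[i]
    z = Z[i]
    a = W1 @ z + b1
    h = np.maximum(a, 0.0)
    e = h @ w2 - y[i]            # residual on the sampled node
    g = (a > 0) * w2             # backprop through the ReLU
    w2 -= lr * 2 * e * h
    W1 -= lr * 2 * e * np.outer(g, z)
    b1 -= lr * 2 * e * g

print(loss0, train_loss())
```

The gap between the final training loss and the minimum over the class is the quantity Δ_n the problem asks to control; the sketch only shows that node-sampled SGD drives the empirical risk down, not how the graph governs that gap.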

References

First, while our results provide explicit bounds on the statistical risk of the least-squares estimator, the optimization error Δ_n, which is governed by the training dynamics, remains to be analyzed. In practice, GNNs are trained via stochastic gradient descent (SGD) rather than global risk minimization. While some progress has been made in characterizing the implicit regularization of SGD in standard i.i.d. regression and classification settings, understanding these dynamics in the presence of graph-induced dependencies remains an open challenge.

Semi-Supervised Learning on Graphs using Graph Neural Networks  (2602.17115 - Chen et al., 19 Feb 2026) in Conclusion (Section 6)