Robustness under heavy-tailed or state-dependent gradient noise

Investigate the behavior of the Adam-type continuous-time SDE (eq:cts-x)–(eq:cts-y) and its time-homogeneous limit (eq:cts-x+)–(eq:cts-y+) when the stochastic gradient noise is heavy-tailed or state-dependent rather than isotropic Gaussian. In particular, determine whether existence and uniqueness of invariant measures and exponential convergence to them persist, and identify the conditions or model modifications that are necessary.

Background

The paper’s SDE model employs isotropic Gaussian noise to obtain tractable analysis and to establish invariant measures and exponential mixing via Harris-type arguments. The authors note that practical training often exhibits heavy-tailed or state-dependent noise.

Extending the theory to these more realistic noise models would test the robustness of the ergodic results and may require new tools to handle non-Gaussian or multiplicative noise effects.
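
As a rough numerical companion to the question above, the following minimal sketch compares Gaussian and heavy-tailed gradient noise in a toy setting. Everything in it is an assumption made for illustration: it uses a one-dimensional discrete-time Adam-style recursion on f(x) = x^2 / 2 rather than the paper's SDE (eq:cts-x)–(eq:cts-y), the function name simulate_adam_like is hypothetical, and the Student-t noise with 3 degrees of freedom is just one possible heavy-tailed model. It can suggest what to look for empirically, but says nothing about invariant measures or mixing rates.

```python
import numpy as np


def simulate_adam_like(noise="gaussian", steps=5000, lr=1e-2,
                       beta1=0.9, beta2=0.999, eps=1e-8, x0=1.0, seed=0):
    """Toy 1-D Adam-style recursion on f(x) = x**2 / 2 with additive gradient noise.

    Illustrative stand-in only, NOT the paper's SDE (eq:cts-x)-(eq:cts-y).
    """
    rng = np.random.default_rng(seed)
    if noise == "gaussian":
        xi = rng.standard_normal(steps)
    elif noise == "student_t":
        df = 3.0  # assumed tail index: finite variance, infinite fourth moment
        # Rescale so both noise models have unit variance.
        xi = rng.standard_t(df, size=steps) / np.sqrt(df / (df - 2.0))
    else:
        raise ValueError(f"unknown noise model: {noise!r}")

    x, m, v = x0, 0.0, 0.0
    xs = np.empty(steps)
    for k in range(steps):
        g = x + xi[k]                              # noisy gradient of x**2 / 2
        m = beta1 * m + (1.0 - beta1) * g          # first-moment average
        v = beta2 * v + (1.0 - beta2) * g * g      # second-moment average
        x = x - lr * m / (np.sqrt(v) + eps)        # preconditioned step, no bias correction
        xs[k] = x
    return xs
```

One possible use is to run both noise models with the same seed and compare empirical tail quantiles of the later iterates, for example np.quantile(np.abs(xs[steps // 2:]), 0.999), as a crude proxy for how the long-run law of the iterates changes with the noise.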

References

Nevertheless, important open questions remain, including the role of bias correction at finite horizons, convergence rates beyond convex or Polyak-Lojasiewicz regimes, robustness under heavy-tailed or state-dependent gradient noise, the structure of invariant measures induced by coordinatewise preconditioning, and metastability near saddle points in high dimensions.

— Fokker-Planck Analysis and Invariant Laws for a Continuous-Time Stochastic Model of Adam-Type Dynamics (2604.00840 - Nyström, 1 Apr 2026) in Section 1, Introduction
