Robustness under heavy-tailed or state-dependent gradient noise
Investigate the behavior of the Adam-type continuous-time SDE (eq:cts-x)–(eq:cts-y) and its time-homogeneous limit (eq:cts-x+)–(eq:cts-y+) when the stochastic gradient noise is heavy-tailed or state-dependent rather than isotropic Gaussian. In particular, determine whether existence and uniqueness of invariant measures and exponential convergence persist, and, where they fail, identify the conditions or modifications required.
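One way to probe this question numerically before attempting a proof is sketched below. It is an exploratory proxy only, and every modeling choice in it is an assumption: the SDE (eq:cts-x)–(eq:cts-y) is not reproduced here, so the sketch uses a discrete constant-step Adam iteration without bias correction on the hypothetical objective f(x) = x^2/2, and compares three illustrative noise models (unit-variance Gaussian, Student-t with 3 degrees of freedom rescaled to unit variance, and Gaussian noise with state-dependent scale sqrt(1 + x^2)). The names `adam_chain`, `NOISE`, and `tail_summary` are made up for this illustration.

```python
import numpy as np

def adam_chain(noise, steps=20000, lr=1e-2, beta1=0.9, beta2=0.999,
               eps=1e-8, seed=0):
    """Constant-step Adam (no bias correction) on f(x) = x^2 / 2.

    `noise(rng, x)` returns one sample of additive gradient noise.
    All hyperparameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    x, m, v = 1.0, 0.0, 0.0
    xs = np.empty(steps)
    for k in range(steps):
        g = x + noise(rng, x)                 # exact gradient x plus noise
        m = beta1 * m + (1 - beta1) * g       # first-moment average
        v = beta2 * v + (1 - beta2) * g * g   # second-moment average
        x -= lr * m / (np.sqrt(v) + eps)      # preconditioned step
        xs[k] = x
    return xs

# Three hypothetical noise models; each has unit variance at x = 0.
NOISE = {
    "gaussian": lambda rng, x: rng.standard_normal(),
    # Student-t with 3 degrees of freedom has variance 3, so rescale.
    "student_t_df3": lambda rng, x: rng.standard_t(3) / np.sqrt(3.0),
    "state_dependent": lambda rng, x: np.sqrt(1.0 + x * x) * rng.standard_normal(),
}

def tail_summary(xs, burn_in=5000):
    """Median and upper quantiles of |x| after discarding a burn-in prefix."""
    tail = np.abs(xs[burn_in:])
    return {q: float(np.quantile(tail, q)) for q in (0.5, 0.99, 0.999)}

if __name__ == "__main__":
    for name, noise in NOISE.items():
        print(name, tail_summary(adam_chain(noise)))
```

Comparing the upper quantiles across noise models, and across several seeds and initial points, gives a rough indication of whether the long-run distribution looks stable and independent of initialization. Such a simulation cannot establish existence, uniqueness, or a convergence rate; it only suggests which regimes are worth analyzing.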
References
Nevertheless, important open questions remain, including the role of bias correction at finite horizons, convergence rates beyond convex or Polyak-Łojasiewicz regimes, robustness under heavy-tailed or state-dependent gradient noise, the structure of invariant measures induced by coordinatewise preconditioning, and metastability near saddle points in high dimensions.