Dynamical stability and chaos in artificial neural network trajectories along training
Abstract: Training an artificial neural network involves iteratively adapting its parameters to minimize the error of the network's predictions on a learning task. This iterative change can be naturally interpreted as a trajectory in network space -- a time series of networks -- so that the training algorithm (e.g. gradient-descent optimization of a suitable loss function) can be viewed as a dynamical system in graph space. To illustrate this interpretation, we study the dynamical properties of this process by analyzing through this lens the network trajectories of a shallow neural network as it learns a simple classification task. We systematically consider different ranges of the learning rate and explore both the dynamical and orbital stability of the resulting network trajectories, finding hints of regular and chaotic behavior depending on the learning-rate regime. Our findings are contrasted with common wisdom on the convergence properties of neural networks and with dynamical systems theory. This work also contributes to the cross-fertilization of ideas between dynamical systems theory, network theory, and machine learning.
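The viewpoint sketched in the abstract can be made concrete with a minimal numerical experiment: treat one step of full-batch gradient descent as an iteration of a discrete-time map on parameter space, launch two trajectories from nearby initial networks, and monitor how their separation grows or shrinks along training. The sketch below is illustrative only -- the toy dataset, the 2-4-1 architecture, the learning rate, and the slope-based divergence estimate are assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data: two Gaussian blobs in the plane.
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

def forward(w, X):
    """Shallow net: 2 inputs -> 4 hidden (tanh) -> 1 sigmoid output.
    The 17-dim vector w packs all weights and biases."""
    W1 = w[:8].reshape(2, 4); b1 = w[8:12]
    W2 = w[12:16].reshape(4, 1); b2 = w[16]
    h = np.tanh(X @ W1 + b1)
    return 1 / (1 + np.exp(-(h @ W2).ravel() - b2))

def loss(w):
    """Cross-entropy loss over the full batch."""
    p = np.clip(forward(w, X), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def grad(w, eps=1e-6):
    """Central finite-difference gradient (fine for a 17-dim toy model)."""
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w); e[i] = eps
        g[i] = (loss(w + e) - loss(w - e)) / (2 * eps)
    return g

def trajectory(w0, lr, steps):
    """Iterate the map w_{t+1} = w_t - lr * grad L(w_t); return the orbit."""
    w, orbit = w0.copy(), [w0.copy()]
    for _ in range(steps):
        w = w - lr * grad(w)
        orbit.append(w.copy())
    return np.array(orbit)

# Two trajectories from initial conditions a distance ~1e-6 apart.
w0 = rng.normal(0, 0.5, 17)
traj_a = trajectory(w0, lr=0.5, steps=200)
traj_b = trajectory(w0 + 1e-6 * rng.normal(size=17) / np.sqrt(17), lr=0.5, steps=200)

# Separation between the two network trajectories at each training step.
# The early-time slope of log(d_t / d_0) plays the role of a maximal
# Lyapunov-exponent estimate: positive slope hints at chaotic sensitivity,
# negative slope at a dynamically stable training regime.
d = np.linalg.norm(traj_a - traj_b, axis=1)
growth = np.log(d[1:] / d[0])
lyap_slope = np.polyfit(np.arange(1, 50), growth[:49], 1)[0]
print(f"final loss: {loss(traj_a[-1]):.4f}, Lyapunov-style slope: {lyap_slope:+.4f}")
```

Sweeping `lr` in the `trajectory` call is then the natural analogue of the paper's scan over learning-rate regimes: the sign and magnitude of the fitted slope change with the step size.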
https://openreview.net/forum?id=ZBESeIUB5k
Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189
Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arXiv (2022) https://doi.org/10.48550/ARXIV.2207.08148
Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006)
Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical Review A 38(1), 364 (1988)
Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: Nonlinear Phenomena 42(1–3), 12–37 (1990)
Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020)
Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of Eugenics 7(2), 179–188 (1936)
Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023)
Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015)
Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical Review Letters 77(7), 1262 (1996)
Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of Physics A: Mathematical and General 30(1), 1 (1997)
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Hoffer, E., Hubara, I., Soudry, D.: Train longer, generalize better: closing the generalization gap in large batch training of neural networks. Advances in neural information processing systems 30 (2017) Holme and Saramäki [2019] Holme, P., Saramäki, J.: Temporal Network Theory vol. 2. Springer, New York (2019) Lacasa et al. [2022] Lacasa, L., Rodriguez, J.P., Eguiluz, V.M.: Correlations of network trajectories. Physical Review Research 4(4), 042008 (2022) https://doi.org/10.1103/PhysRevResearch.4.L042008 Caligiuri et al. [2023] Caligiuri, A., EguÃluz, V.M., Di Gaetano, L., Galla, T., Lacasa, L.: Lyapunov exponents for temporal networks. Physical Review E 107(4), 044305 (2023) https://doi.org/10.1103/PhysRevE.107.044305 La Malfa et al. [2022] La Malfa, E., La Malfa, G., Caprioli, C., Nicosia, G., Latora, V.: Deep Neural Networks as Complex Networks. arxiv (2022) https://doi.org/10.48550/ARXIV.2209.05488 La Malfa et al. [2021] La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. 
arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. 
Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. 
[2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. 
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Holme, P., Saramäki, J.: Temporal Network Theory vol. 2. Springer, New York (2019) Lacasa et al. [2022] Lacasa, L., Rodriguez, J.P., Eguiluz, V.M.: Correlations of network trajectories. Physical Review Research 4(4), 042008 (2022) https://doi.org/10.1103/PhysRevResearch.4.L042008 Caligiuri et al. [2023] Caligiuri, A., EguÃluz, V.M., Di Gaetano, L., Galla, T., Lacasa, L.: Lyapunov exponents for temporal networks. 
Physical Review E 107(4), 044305 (2023) https://doi.org/10.1103/PhysRevE.107.044305 La Malfa et al. [2022] La Malfa, E., La Malfa, G., Caprioli, C., Nicosia, G., Latora, V.: Deep Neural Networks as Complex Networks. arxiv (2022) https://doi.org/10.48550/ARXIV.2209.05488 La Malfa et al. [2021] La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. 
arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). 
https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. 
[2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lacasa, L., Rodriguez, J.P., Eguiluz, V.M.: Correlations of network trajectories. Physical Review Research 4(4), 042008 (2022) https://doi.org/10.1103/PhysRevResearch.4.L042008 Caligiuri et al. [2023] Caligiuri, A., EguÃluz, V.M., Di Gaetano, L., Galla, T., Lacasa, L.: Lyapunov exponents for temporal networks. Physical Review E 107(4), 044305 (2023) https://doi.org/10.1103/PhysRevE.107.044305 La Malfa et al. [2022] La Malfa, E., La Malfa, G., Caprioli, C., Nicosia, G., Latora, V.: Deep Neural Networks as Complex Networks. arxiv (2022) https://doi.org/10.48550/ARXIV.2209.05488 La Malfa et al. [2021] La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. 
arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. 
Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. 
[2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. 
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013) Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117 Lecun et al. [1998] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al.
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. 
[2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. 
Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). 
https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. 
Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. 
In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. 
Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. 
Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. 
Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? 
Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. 
In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. 
[2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838. Revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 https://ieeexplore.ieee.org/document/9206727/ Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013) Núñez et al.
[2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994) Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117 Lecun et al. [1998] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks.
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. 
[1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. 
In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . 
revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. 
Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). 
https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. 
arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. 
[2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. 
[2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. 
[2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. 
[2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. 
arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). 
https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021) Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al.
[2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013) Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series.
Physics Letters A 185(1), 77–87 (1994) Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117 LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021)
Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. 
Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. 
Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. 
[2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. 
In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189
Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arXiv (2022) https://doi.org/10.48550/ARXIV.2207.08148
Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: An Introduction. John Wiley & Sons, Weinheim (2006)
Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical Review A 38(1), 364 (1988)
Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: Nonlinear Phenomena 42(1-3), 12–37 (1990)
Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020)
Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of Eugenics 7(2), 179–188 (1936)
Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023)
Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, 2nd edn. Westview Press, Boulder, CO (2015)
Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical Review Letters 77(7), 1262 (1996)
Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of Physics A: Mathematical and General 30(1), 1 (1997)
Montana and Davis [1989] Montana, D.J., Davis, L.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989)
Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html
Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM
Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf
Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021)
Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html
Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf
Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w
Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113
Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724
Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757
Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610
Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110
Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996)
Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 (revision #137538)
Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113
Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118
Bosman et al. [2020] Bosman, A.S., Engelbrecht, A.P., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727
Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013)
Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012)
Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994)
Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117
LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791
Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012)
Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021)
Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016)
Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014)
Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018)
Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010)
Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013)
Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023)
Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023)
Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021)
Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022).
https://openreview.net/forum?id=ZBESeIUB5k San Miguel, M.: Frontiers in complex systems. Frontiers in Complex Systems 1, 1080801 (2023) Arola-Fernández and Lacasa [2023] Arola-Fernández, L., Lacasa, L.: An effective theory of collective deep learning. arXiv preprint arXiv:2310.12802 (2023) Prisner [1995] Prisner, E.: Graph Dynamics (Pitman Research Notes in Mathematics Series). Chapman & Hall CRC, London (1995) Ruder [2016] Ruder, S.: An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747 (2016) Hoffer et al. [2017] Hoffer, E., Hubara, I., Soudry, D.: Train longer, generalize better: closing the generalization gap in large batch training of neural networks. Advances in neural information processing systems 30 (2017) Holme and Saramäki [2019] Holme, P., Saramäki, J.: Temporal Network Theory vol. 2. Springer, New York (2019) Lacasa et al. [2022] Lacasa, L., Rodriguez, J.P., Eguiluz, V.M.: Correlations of network trajectories. Physical Review Research 4(4), 042008 (2022) https://doi.org/10.1103/PhysRevResearch.4.L042008 Caligiuri et al. [2023] Caligiuri, A., EguÃluz, V.M., Di Gaetano, L., Galla, T., Lacasa, L.: Lyapunov exponents for temporal networks. Physical Review E 107(4), 044305 (2023) https://doi.org/10.1103/PhysRevE.107.044305 La Malfa et al. [2022] La Malfa, E., La Malfa, G., Caprioli, C., Nicosia, G., Latora, V.: Deep Neural Networks as Complex Networks. arxiv (2022) https://doi.org/10.48550/ARXIV.2209.05488 La Malfa et al. [2021] La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. 
[2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. 
[1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). 
https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. 
Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Arola-Fernández, L., Lacasa, L.: An effective theory of collective deep learning. arXiv preprint arXiv:2310.12802 (2023) Prisner [1995] Prisner, E.: Graph Dynamics (Pitman Research Notes in Mathematics Series). Chapman & Hall CRC, London (1995) Ruder [2016] Ruder, S.: An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747 (2016) Hoffer et al. [2017] Hoffer, E., Hubara, I., Soudry, D.: Train longer, generalize better: closing the generalization gap in large batch training of neural networks. Advances in neural information processing systems 30 (2017) Holme and Saramäki [2019] Holme, P., Saramäki, J.: Temporal Network Theory vol. 2. Springer, New York (2019) Lacasa et al. [2022] Lacasa, L., Rodriguez, J.P., Eguiluz, V.M.: Correlations of network trajectories. Physical Review Research 4(4), 042008 (2022) https://doi.org/10.1103/PhysRevResearch.4.L042008 Caligiuri et al. [2023] Caligiuri, A., EguÃluz, V.M., Di Gaetano, L., Galla, T., Lacasa, L.: Lyapunov exponents for temporal networks. Physical Review E 107(4), 044305 (2023) https://doi.org/10.1103/PhysRevE.107.044305 La Malfa et al. [2022] La Malfa, E., La Malfa, G., Caprioli, C., Nicosia, G., Latora, V.: Deep Neural Networks as Complex Networks. arxiv (2022) https://doi.org/10.48550/ARXIV.2209.05488 La Malfa et al. [2021] La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. 
[2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. 
(eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. 
[2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. 
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Prisner, E.: Graph Dynamics (Pitman Research Notes in Mathematics Series). Chapman & Hall CRC, London (1995) Ruder [2016] Ruder, S.: An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747 (2016) Hoffer et al. [2017] Hoffer, E., Hubara, I., Soudry, D.: Train longer, generalize better: closing the generalization gap in large batch training of neural networks. 
Advances in neural information processing systems 30 (2017) Holme and Saramäki [2019] Holme, P., Saramäki, J.: Temporal Network Theory vol. 2. Springer, New York (2019) Lacasa et al. [2022] Lacasa, L., Rodriguez, J.P., Eguiluz, V.M.: Correlations of network trajectories. Physical Review Research 4(4), 042008 (2022) https://doi.org/10.1103/PhysRevResearch.4.L042008 Caligiuri et al. [2023] Caligiuri, A., EguÃluz, V.M., Di Gaetano, L., Galla, T., Lacasa, L.: Lyapunov exponents for temporal networks. Physical Review E 107(4), 044305 (2023) https://doi.org/10.1103/PhysRevE.107.044305 La Malfa et al. [2022] La Malfa, E., La Malfa, G., Caprioli, C., Nicosia, G., Latora, V.: Deep Neural Networks as Complex Networks. arxiv (2022) https://doi.org/10.48550/ARXIV.2209.05488 La Malfa et al. [2021] La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. 
Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). 
https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. 
arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994) Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117 LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks.
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ruder, S.: An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747 (2016) Hoffer et al. [2017] Hoffer, E., Hubara, I., Soudry, D.: Train longer, generalize better: closing the generalization gap in large batch training of neural networks. Advances in neural information processing systems 30 (2017) Holme and Saramäki [2019] Holme, P., Saramäki, J.: Temporal Network Theory vol. 2. Springer, New York (2019) Lacasa et al. [2022] Lacasa, L., Rodriguez, J.P., Eguiluz, V.M.: Correlations of network trajectories. Physical Review Research 4(4), 042008 (2022) https://doi.org/10.1103/PhysRevResearch.4.L042008 Caligiuri et al. [2023] Caligiuri, A., EguÃluz, V.M., Di Gaetano, L., Galla, T., Lacasa, L.: Lyapunov exponents for temporal networks. Physical Review E 107(4), 044305 (2023) https://doi.org/10.1103/PhysRevE.107.044305 La Malfa et al. [2022] La Malfa, E., La Malfa, G., Caprioli, C., Nicosia, G., Latora, V.: Deep Neural Networks as Complex Networks. arxiv (2022) https://doi.org/10.48550/ARXIV.2209.05488 La Malfa et al. 
[2021] La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. 
[1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. 
In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. 
Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Hoffer, E., Hubara, I., Soudry, D.: Train longer, generalize better: closing the generalization gap in large batch training of neural networks. Advances in neural information processing systems 30 (2017) Holme and Saramäki [2019] Holme, P., Saramäki, J.: Temporal Network Theory vol. 2. Springer, New York (2019) Lacasa et al. [2022] Lacasa, L., Rodriguez, J.P., Eguiluz, V.M.: Correlations of network trajectories. Physical Review Research 4(4), 042008 (2022) https://doi.org/10.1103/PhysRevResearch.4.L042008 Caligiuri et al. [2023] Caligiuri, A., EguÃluz, V.M., Di Gaetano, L., Galla, T., Lacasa, L.: Lyapunov exponents for temporal networks. Physical Review E 107(4), 044305 (2023) https://doi.org/10.1103/PhysRevE.107.044305 La Malfa et al. [2022] La Malfa, E., La Malfa, G., Caprioli, C., Nicosia, G., Latora, V.: Deep Neural Networks as Complex Networks. arxiv (2022) https://doi.org/10.48550/ARXIV.2209.05488 La Malfa et al. [2021] La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. 
arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. 
Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. 
[2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. 
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Holme, P., Saramäki, J.: Temporal Network Theory vol. 2. Springer, New York (2019) Lacasa et al. [2022] Lacasa, L., Rodriguez, J.P., Eguiluz, V.M.: Correlations of network trajectories. Physical Review Research 4(4), 042008 (2022) https://doi.org/10.1103/PhysRevResearch.4.L042008 Caligiuri et al. [2023] Caligiuri, A., EguÃluz, V.M., Di Gaetano, L., Galla, T., Lacasa, L.: Lyapunov exponents for temporal networks. 
Physical Review E 107(4), 044305 (2023) https://doi.org/10.1103/PhysRevE.107.044305 La Malfa et al. [2022] La Malfa, E., La Malfa, G., Caprioli, C., Nicosia, G., Latora, V.: Deep Neural Networks as Complex Networks. arxiv (2022) https://doi.org/10.48550/ARXIV.2209.05488 La Malfa et al. [2021] La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. 
arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). 
https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. 
[2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics Letters A 185(1), 77–87 (1994) Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117 Lecun et al. [1998] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al.
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lacasa, L., Rodriguez, J.P., Eguiluz, V.M.: Correlations of network trajectories. Physical Review Research 4(4), 042008 (2022) https://doi.org/10.1103/PhysRevResearch.4.L042008 Caligiuri et al. [2023] Caligiuri, A., EguÃluz, V.M., Di Gaetano, L., Galla, T., Lacasa, L.: Lyapunov exponents for temporal networks. Physical Review E 107(4), 044305 (2023) https://doi.org/10.1103/PhysRevE.107.044305 La Malfa et al. [2022] La Malfa, E., La Malfa, G., Caprioli, C., Nicosia, G., Latora, V.: Deep Neural Networks as Complex Networks. arxiv (2022) https://doi.org/10.48550/ARXIV.2209.05488 La Malfa et al. [2021] La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. 
arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. 
Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. 
[2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. 
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Caligiuri, A., EguÃluz, V.M., Di Gaetano, L., Galla, T., Lacasa, L.: Lyapunov exponents for temporal networks. Physical Review E 107(4), 044305 (2023) https://doi.org/10.1103/PhysRevE.107.044305 La Malfa et al. [2022] La Malfa, E., La Malfa, G., Caprioli, C., Nicosia, G., Latora, V.: Deep Neural Networks as Complex Networks. arxiv (2022) https://doi.org/10.48550/ARXIV.2209.05488 La Malfa et al. 
[2021] La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. 
[1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. 
In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. 
Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k La Malfa, E., La Malfa, G., Caprioli, C., Nicosia, G., Latora, V.: Deep Neural Networks as Complex Networks. arxiv (2022) https://doi.org/10.48550/ARXIV.2209.05488 La Malfa et al. [2021] La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. 
[2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). 
https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. 
[2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arXiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: An Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical Review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: Nonlinear Phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos?
Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of Eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, 2nd edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical Review Letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of Physics A: Mathematical and General 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function.
In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021) Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al.
[2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610
Space Science Reviews 198, 3–44 (2016)
Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014)
Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018)
Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010)
Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013)
Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023)
Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023)
Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021)
[2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. 
[2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. 
https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. 
Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. 
[1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. 
[2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. 
[1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. 
Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838, revision #137538
https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. 
Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. 
Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. 
[2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. 
Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. 
Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
- Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189
- Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arXiv (2022) https://doi.org/10.48550/ARXIV.2207.08148
- Schuster, H.G., Just, W.: Deterministic Chaos: An Introduction. John Wiley & Sons, Weinheim (2006)
- Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical Review A 38(1), 364 (1988)
- Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: Nonlinear Phenomena 42(1–3), 12–37 (1990)
- Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020)
- Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of Eugenics 7(2), 179–188 (1936)
- Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023)
- Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, 2nd edn. Westview Press, Boulder, CO (2015)
- Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical Review Letters 77(7), 1262 (1996)
- Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of Physics A: Mathematical and General 30(1), 1 (1997)
- Montana, D.J., Davis, L.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989)
- Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, CA, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html
- Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM
- Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf
- Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021)
- Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html
- Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf
- Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w
- Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113
- Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724
- Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757
- Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610
- Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110
- Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996)
- Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838, revision #137538
- Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113
- Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118
- Bosman, A.S., Engelbrecht, A.P., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727
- Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013)
- Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012)
- Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994)
- Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117
- LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791
- Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012)
- Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021)
- Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016)
- Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014)
- Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018)
- Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010)
- Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013)
- Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023)
- Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023)
- Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021)
- Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
Chapman & Hall CRC, London (1995) Ruder [2016] Ruder, S.: An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747 (2016) Hoffer et al. [2017] Hoffer, E., Hubara, I., Soudry, D.: Train longer, generalize better: closing the generalization gap in large batch training of neural networks. Advances in neural information processing systems 30 (2017) Holme and Saramäki [2019] Holme, P., Saramäki, J.: Temporal Network Theory vol. 2. Springer, New York (2019) Lacasa et al. [2022] Lacasa, L., Rodriguez, J.P., Eguiluz, V.M.: Correlations of network trajectories. Physical Review Research 4(4), 042008 (2022) https://doi.org/10.1103/PhysRevResearch.4.L042008 Caligiuri et al. [2023] Caligiuri, A., EguÃluz, V.M., Di Gaetano, L., Galla, T., Lacasa, L.: Lyapunov exponents for temporal networks. Physical Review E 107(4), 044305 (2023) https://doi.org/10.1103/PhysRevE.107.044305 La Malfa et al. [2022] La Malfa, E., La Malfa, G., Caprioli, C., Nicosia, G., Latora, V.: Deep Neural Networks as Complex Networks. arxiv (2022) https://doi.org/10.48550/ARXIV.2209.05488 La Malfa et al. [2021] La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. 
arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. 
Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. 
[2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. 
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Prisner, E.: Graph Dynamics (Pitman Research Notes in Mathematics Series). Chapman & Hall CRC, London (1995) Ruder [2016] Ruder, S.: An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747 (2016) Hoffer et al. [2017] Hoffer, E., Hubara, I., Soudry, D.: Train longer, generalize better: closing the generalization gap in large batch training of neural networks. 
Advances in neural information processing systems 30 (2017) Holme and Saramäki [2019] Holme, P., Saramäki, J.: Temporal Network Theory vol. 2. Springer, New York (2019) Lacasa et al. [2022] Lacasa, L., Rodriguez, J.P., Eguiluz, V.M.: Correlations of network trajectories. Physical Review Research 4(4), 042008 (2022) https://doi.org/10.1103/PhysRevResearch.4.L042008 Caligiuri et al. [2023] Caligiuri, A., EguÃluz, V.M., Di Gaetano, L., Galla, T., Lacasa, L.: Lyapunov exponents for temporal networks. Physical Review E 107(4), 044305 (2023) https://doi.org/10.1103/PhysRevE.107.044305 La Malfa et al. [2022] La Malfa, E., La Malfa, G., Caprioli, C., Nicosia, G., Latora, V.: Deep Neural Networks as Complex Networks. arxiv (2022) https://doi.org/10.48550/ARXIV.2209.05488 La Malfa et al. [2021] La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. 
Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). 
https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. 
arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature Communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arXiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: An Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical Review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: Nonlinear Phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of Eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al.
[1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical Review Letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of Physics A: Mathematical and General 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021) Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers.
In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724
Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Hoffer, E., Hubara, I., Soudry, D.: Train longer, generalize better: closing the generalization gap in large batch training of neural networks. Advances in neural information processing systems 30 (2017) Holme and Saramäki [2019] Holme, P., Saramäki, J.: Temporal Network Theory vol. 2. Springer, New York (2019) Lacasa et al. [2022] Lacasa, L., Rodriguez, J.P., Eguiluz, V.M.: Correlations of network trajectories. Physical Review Research 4(4), 042008 (2022) https://doi.org/10.1103/PhysRevResearch.4.L042008 Caligiuri et al. [2023] Caligiuri, A., EguÃluz, V.M., Di Gaetano, L., Galla, T., Lacasa, L.: Lyapunov exponents for temporal networks. Physical Review E 107(4), 044305 (2023) https://doi.org/10.1103/PhysRevE.107.044305 La Malfa et al. [2022] La Malfa, E., La Malfa, G., Caprioli, C., Nicosia, G., Latora, V.: Deep Neural Networks as Complex Networks. arxiv (2022) https://doi.org/10.48550/ARXIV.2209.05488 La Malfa et al. [2021] La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. 
arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. 
Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. 
[2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. 
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Holme, P., Saramäki, J.: Temporal Network Theory vol. 2. Springer, New York (2019) Lacasa et al. [2022] Lacasa, L., Rodriguez, J.P., Eguiluz, V.M.: Correlations of network trajectories. Physical Review Research 4(4), 042008 (2022) https://doi.org/10.1103/PhysRevResearch.4.L042008 Caligiuri et al. [2023] Caligiuri, A., EguÃluz, V.M., Di Gaetano, L., Galla, T., Lacasa, L.: Lyapunov exponents for temporal networks. 
Physical Review E 107(4), 044305 (2023) https://doi.org/10.1103/PhysRevE.107.044305 La Malfa et al. [2022] La Malfa, E., La Malfa, G., Caprioli, C., Nicosia, G., Latora, V.: Deep Neural Networks as Complex Networks. arxiv (2022) https://doi.org/10.48550/ARXIV.2209.05488 La Malfa et al. [2021] La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. 
arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). 
https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. 
[2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring.
arXiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, 2nd edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics.
Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021) Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al.
[2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610
[2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. 
Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. 
Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. 
[2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. 
Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). 
https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. 
Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. 
In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. 
Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. 
Proceedings of the National Academy of Sciences 118(34), e2023719118 (2021) https://doi.org/10.1073/pnas.2023719118
Bosman et al. [2020] Bosman, A.S., Engelbrecht, A.P., Helbig, M.: Loss surface modality of feed-forward neural network architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727. https://ieeexplore.ieee.org/document/9206727/
Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013)
Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012)
Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994)
[46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117
[47] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791
Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012)
Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021)
Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies.
Space Science Reviews 198, 3–44 (2016)
Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014)
Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018)
Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010)
Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013)
Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), e2208998120 (2023)
Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023)
Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021)
Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical Review A 38(1), 364 (1988)
Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: Nonlinear Phenomena 42(1-3), 12–37 (1990)
Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos?
Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020)
Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of Eugenics 7(2), 179–188 (1936)
Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023)
Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, 2nd edn. Westview Press, Boulder, CO (2015)
Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical Review Letters 77(7), 1262 (1996)
Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of Physics A: Mathematical and General 30(1), 1 (1997)
Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989)
In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. 
[2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. 
[1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. 
In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . 
revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. 
Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). 
https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. 
arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature Communications 4(1), 2521 (2013)
Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023)
Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023)
Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021)
Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of Eugenics 7(2), 179–188 (1936)
Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023)
Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, 2nd edn. Westview Press, Boulder, CO (2015)
Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical Review Letters 77(7), 1262 (1996)
Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of Physics A: Mathematical and General 30(1), 1 (1997)
Montana and Davis [1989] Montana, D.J., Davis, L.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989)
Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html
Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM
Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf
Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021)
Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html
Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf
Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w
Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113
Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724
Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757
Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610
Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110
Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996)
Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838. Revision #137538
Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113
Ben Arous et al.
[2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118
Bosman et al. [2020] Bosman, A.S., Engelbrecht, A.P., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727
Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013)
Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012)
Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994)
Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117
LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791
Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012)
Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021)
Watkins et al.
[2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016)
Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014)
Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018)
Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010)
Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013)
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. 
PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. 
Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. 
IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). 
https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. 
arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). 
https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. 
Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. 
Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113
Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724
Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757
Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610
Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110
Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996)
Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838. Revision #137538
Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113
Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), e2023719118 (2021) https://doi.org/10.1073/pnas.2023719118
Bosman et al. [2020] Bosman, A.S., Engelbrecht, A.P., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727
Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013)
Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012)
Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994)
Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117
LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791
Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012)
Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: an experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021)
Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016)
Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014)
Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018)
Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010)
Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013)
Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), e2208998120 (2023)
Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023)
Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021)
Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021)
Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html
Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016).
https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf
Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w
Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113
Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. 
[2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. 
Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. 
Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. 
Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. 
[2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. 
In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
https://ieeexplore.ieee.org/document/9643155/
Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189
Scabini et al.
[2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arXiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical Review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: Nonlinear Phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of Eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, 2nd edn. Westview Press, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical Review Letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of Physics A: Mathematical and General 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N.
(eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021) Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al.
[2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May-Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al.
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013) Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994) Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117 LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al.
[2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
Advances in neural information processing systems 30 (2017) Holme and Saramäki [2019] Holme, P., Saramäki, J.: Temporal Network Theory vol. 2. Springer, New York (2019) Lacasa et al. [2022] Lacasa, L., Rodriguez, J.P., Eguiluz, V.M.: Correlations of network trajectories. Physical Review Research 4(4), 042008 (2022) https://doi.org/10.1103/PhysRevResearch.4.L042008 Caligiuri et al. [2023] Caligiuri, A., EguÃluz, V.M., Di Gaetano, L., Galla, T., Lacasa, L.: Lyapunov exponents for temporal networks. Physical Review E 107(4), 044305 (2023) https://doi.org/10.1103/PhysRevE.107.044305 La Malfa et al. [2022] La Malfa, E., La Malfa, G., Caprioli, C., Nicosia, G., Latora, V.: Deep Neural Networks as Complex Networks. arxiv (2022) https://doi.org/10.48550/ARXIV.2209.05488 La Malfa et al. [2021] La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. 
Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). 
https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. 
arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ruder, S.: An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747 (2016) Hoffer et al. [2017] Hoffer, E., Hubara, I., Soudry, D.: Train longer, generalize better: closing the generalization gap in large batch training of neural networks. Advances in neural information processing systems 30 (2017) Holme and Saramäki [2019] Holme, P., Saramäki, J.: Temporal Network Theory vol. 2. Springer, New York (2019) Lacasa et al. [2022] Lacasa, L., Rodriguez, J.P., Eguiluz, V.M.: Correlations of network trajectories. Physical Review Research 4(4), 042008 (2022) https://doi.org/10.1103/PhysRevResearch.4.L042008 Caligiuri et al. [2023] Caligiuri, A., EguÃluz, V.M., Di Gaetano, L., Galla, T., Lacasa, L.: Lyapunov exponents for temporal networks. Physical Review E 107(4), 044305 (2023) https://doi.org/10.1103/PhysRevE.107.044305 La Malfa et al. [2022] La Malfa, E., La Malfa, G., Caprioli, C., Nicosia, G., Latora, V.: Deep Neural Networks as Complex Networks. arxiv (2022) https://doi.org/10.48550/ARXIV.2209.05488 La Malfa et al. 
[2021] La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. 
[1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. 
In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. 
Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Hoffer, E., Hubara, I., Soudry, D.: Train longer, generalize better: closing the generalization gap in large batch training of neural networks. Advances in neural information processing systems 30 (2017) Holme and Saramäki [2019] Holme, P., Saramäki, J.: Temporal Network Theory vol. 2. Springer, New York (2019) Lacasa et al. [2022] Lacasa, L., Rodriguez, J.P., Eguiluz, V.M.: Correlations of network trajectories. Physical Review Research 4(4), 042008 (2022) https://doi.org/10.1103/PhysRevResearch.4.L042008 Caligiuri et al. [2023] Caligiuri, A., EguÃluz, V.M., Di Gaetano, L., Galla, T., Lacasa, L.: Lyapunov exponents for temporal networks. Physical Review E 107(4), 044305 (2023) https://doi.org/10.1103/PhysRevE.107.044305 La Malfa et al. [2022] La Malfa, E., La Malfa, G., Caprioli, C., Nicosia, G., Latora, V.: Deep Neural Networks as Complex Networks. arxiv (2022) https://doi.org/10.48550/ARXIV.2209.05488 La Malfa et al. [2021] La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. 
arXiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: An Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical Review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: Nonlinear Phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of Eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, 2nd edn. Westview Press, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical Review Letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of Physics A: Mathematical and General 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics.
Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021) Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al.
[2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . Revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al.
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013) Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994) Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117 LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791
Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k La Malfa, E., La Malfa, G., Caprioli, C., Nicosia, G., Latora, V.: Deep Neural Networks as Complex Networks. arxiv (2022) https://doi.org/10.48550/ARXIV.2209.05488 La Malfa et al. [2021] La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. 
[2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). 
https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. 
[2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? 
Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. 
In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. 
[2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. 
Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. 
Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. 
[2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013) Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series.
Physics Letters A 185(1), 77–87 (1994) Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117 LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al.
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arXiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: An Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical Review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: Nonlinear Phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020)
revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. 
[2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. 
Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. 
PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. 
Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. 
IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. 
Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. 
[2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. 
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. 
PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. 
Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. 
IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727. https://ieeexplore.ieee.org/document/9206727/
Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013)
Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012)
Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994)
Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117
LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791
Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012)
Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021)
Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016)
Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014)
Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018)
Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010)
Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013)
Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023)
Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023)
Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021)
Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html
Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM
Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf
Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021)
Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html
Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf
Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w
Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113
Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724
Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757
Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610
Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110
Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996)
Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838. Revision #137538
Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113
Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. 
[1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. 
Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . 
https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. 
Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. 
Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. 
[2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013)
Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012)
Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994)
Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117
LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791
Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012)
Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021)
Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016)
Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014)
Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018)
Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010)
Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013)
Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023)
Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023)
Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021)
Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110
Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996)
Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538
Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113
Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118
Bosman et al. [2020] Bosman, A.S., Engelbrecht, A.P., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. 
In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
- Prisner, E.: Graph Dynamics (Pitman Research Notes in Mathematics Series). Chapman & Hall CRC, London (1995) Ruder [2016] Ruder, S.: An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747 (2016) Hoffer et al. [2017] Hoffer, E., Hubara, I., Soudry, D.: Train longer, generalize better: closing the generalization gap in large batch training of neural networks. Advances in neural information processing systems 30 (2017) Holme and Saramäki [2019] Holme, P., Saramäki, J.: Temporal Network Theory vol. 2. Springer, New York (2019) Lacasa et al. [2022] Lacasa, L., Rodriguez, J.P., Eguiluz, V.M.: Correlations of network trajectories. Physical Review Research 4(4), 042008 (2022) https://doi.org/10.1103/PhysRevResearch.4.L042008 Caligiuri et al. [2023] Caligiuri, A., EguÃluz, V.M., Di Gaetano, L., Galla, T., Lacasa, L.: Lyapunov exponents for temporal networks. Physical Review E 107(4), 044305 (2023) https://doi.org/10.1103/PhysRevE.107.044305 La Malfa et al. [2022] La Malfa, E., La Malfa, G., Caprioli, C., Nicosia, G., Latora, V.: Deep Neural Networks as Complex Networks. arxiv (2022) https://doi.org/10.48550/ARXIV.2209.05488 La Malfa et al. [2021] La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. 
arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. 
Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. 
[2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. 
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ruder, S.: An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747 (2016) Hoffer et al. [2017] Hoffer, E., Hubara, I., Soudry, D.: Train longer, generalize better: closing the generalization gap in large batch training of neural networks. Advances in neural information processing systems 30 (2017) Holme and Saramäki [2019] Holme, P., Saramäki, J.: Temporal Network Theory vol. 2. Springer, New York (2019) Lacasa et al. 
[2022] Lacasa, L., Rodriguez, J.P., Eguiluz, V.M.: Correlations of network trajectories. Physical Review Research 4(4), 042008 (2022) https://doi.org/10.1103/PhysRevResearch.4.L042008 Caligiuri et al. [2023] Caligiuri, A., EguÃluz, V.M., Di Gaetano, L., Galla, T., Lacasa, L.: Lyapunov exponents for temporal networks. Physical Review E 107(4), 044305 (2023) https://doi.org/10.1103/PhysRevE.107.044305 La Malfa et al. [2022] La Malfa, E., La Malfa, G., Caprioli, C., Nicosia, G., Latora, V.: Deep Neural Networks as Complex Networks. arxiv (2022) https://doi.org/10.48550/ARXIV.2209.05488 La Malfa et al. [2021] La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? 
Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. 
In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021) Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al.
[2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May-Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013) Núñez et al.
[2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994) Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117 LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks.
Nature Communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k
Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. 
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. 
Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. 
In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. 
Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. 
Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. 
Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. 
[1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). 
https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. 
Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). 
https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. 
Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Montana and Davis [1989] Montana, D.J., Davis, L.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021) Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima.
In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 .
revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. 
Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. 
[2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. 
[2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. 
[2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). 
https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. 
arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). 
https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. 
Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. 
[2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics Letters A 185(1), 77–87 (1994) Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117 LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al.
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. 
Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. 
International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. 
[2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. 
Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. 
IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . 
Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. 
In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
- Ruder, S.: An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747 (2016) Hoffer et al. [2017] Hoffer, E., Hubara, I., Soudry, D.: Train longer, generalize better: closing the generalization gap in large batch training of neural networks. Advances in neural information processing systems 30 (2017) Holme and Saramäki [2019] Holme, P., Saramäki, J.: Temporal Network Theory vol. 2. Springer, New York (2019) Lacasa et al. [2022] Lacasa, L., Rodriguez, J.P., Eguiluz, V.M.: Correlations of network trajectories. Physical Review Research 4(4), 042008 (2022) https://doi.org/10.1103/PhysRevResearch.4.L042008 Caligiuri et al. [2023] Caligiuri, A., EguÃluz, V.M., Di Gaetano, L., Galla, T., Lacasa, L.: Lyapunov exponents for temporal networks. Physical Review E 107(4), 044305 (2023) https://doi.org/10.1103/PhysRevE.107.044305 La Malfa et al. [2022] La Malfa, E., La Malfa, G., Caprioli, C., Nicosia, G., Latora, V.: Deep Neural Networks as Complex Networks. arxiv (2022) https://doi.org/10.48550/ARXIV.2209.05488 La Malfa et al. [2021] La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. 
John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. 
[2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. 
Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. 
IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013) Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994) Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117 LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems.
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), e2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. 
[2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. 
Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. 
IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k La Malfa, E., La Malfa, G., Caprioli, C., Nicosia, G., Latora, V.: Deep Neural Networks as Complex Networks. arxiv (2022) https://doi.org/10.48550/ARXIV.2209.05488 La Malfa et al. [2021] La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. 
[2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. 
[1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). 
https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. 
Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. 
[1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. 
In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. 
Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arXiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: An Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical Review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: Nonlinear Phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of Eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, 2nd edn. Westview Press, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical Review Letters 77(7), 1262 (1996) Aurell et al. 
[1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of Physics A: Mathematical and General 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021) Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. 
In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A.P., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: Nonlinear Phenomena 42(1–3), 12–37 (1990)
Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020)
Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of Eugenics 7(2), 179–188 (1936)
Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023)
Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, 2nd edn. Westview Press, Boulder, CO (2015)
Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. 
[2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. 
[2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). 
https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. 
arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). 
https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. 
arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) 
Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. 
arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May-Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013) Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs.
International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994) Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117 LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013)
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. 
Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. 
Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . 
revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. 
Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
- Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189
- Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arXiv (2022) https://doi.org/10.48550/ARXIV.2207.08148
- Schuster, H.G., Just, W.: Deterministic Chaos: An Introduction. John Wiley & Sons, Weinheim (2006)
- Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical Review A 38(1), 364 (1988)
- Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: Nonlinear Phenomena 42(1-3), 12–37 (1990)
- Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020)
- Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of Eugenics 7(2), 179–188 (1936)
- Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023)
- Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, 2nd edn. Westview Press, Boulder, CO (2015)
- Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical Review Letters 77(7), 1262 (1996)
- Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of Physics A: Mathematical and General 30(1), 1 (1997)
- Montana, D.J., Davis, L.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989)
- Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR (2015). https://proceedings.mlr.press/v38/choromanska15.html
- Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM
- Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc. (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf
- Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021)
- Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient descent only converges to minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR (2016). https://proceedings.mlr.press/v49/lee16.html
- Kawaguchi, K.: Deep learning without poor local minima. In: Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc. (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf
- Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The global optimization geometry of shallow linear neural networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w
- Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113
- Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724
- Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757
- Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610
- Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110
- Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996)
- Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838
- Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113
- Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118
- Bosman, A.S., Engelbrecht, A.P., Helbig, M.: Loss surface modality of feed-forward neural network architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727
- Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013)
- Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012)
- Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994)
- Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117
- Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791
- Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012)
- Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: an experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021)
- Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016)
- Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014)
- Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018)
- Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010)
- Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013)
- Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023)
- Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023)
- Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021)
- Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. 
[1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. 
In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. 
Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. 
[1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. 
In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . 
revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. 
Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. 
In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. 
Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. 
Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994) Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117 Lecun et al. [1998] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies.
Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical Review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation.
Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical Review Letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of Physics A: Mathematical and General 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989)
https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. 
arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. 
Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). 
https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. 
Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. 
PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. 
Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. 
IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. 
Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. 
[2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient descent only converges to minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The global optimization geometry of shallow linear neural networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large scale structure of neural network loss landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep ensembles: A loss landscape perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al.
[1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838, revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May-Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss surface modality of feed-forward neural network architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013) Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994) Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117
LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014)
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. 
arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. 
International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. 
[2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. 
Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. 
Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. 
Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . 
revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. 
Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. 
Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
- Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189
- Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arXiv (2022) https://doi.org/10.48550/ARXIV.2207.08148
- Schuster, H.G., Just, W.: Deterministic Chaos: An Introduction. John Wiley & Sons, Weinheim (2006)
- Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical Review A 38(1), 364 (1988)
- Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: Nonlinear Phenomena 42(1-3), 12–37 (1990)
- Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020)
- Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of Eugenics 7(2), 179–188 (1936)
- Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023)
- Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, 2nd edn. Westview Press, Boulder, CO (2015)
- Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical Review Letters 77(7), 1262 (1996)
- Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of Physics A: Mathematical and General 30(1), 1 (1997)
- Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989)
- Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR (2015). https://proceedings.mlr.press/v38/choromanska15.html
- Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM
- Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc. (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf
- Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021)
- Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient descent only converges to minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR (2016). https://proceedings.mlr.press/v49/lee16.html
- Kawaguchi, K.: Deep learning without poor local minima. In: Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc. (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf
- Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The global optimization geometry of shallow linear neural networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w
- Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113
- Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724
- Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757
- Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610
- Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110
- Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996)
- Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838
- Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May-Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113
- Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118
- Bosman, A.S., Engelbrecht, A.P., Helbig, M.: Loss surface modality of feed-forward neural network architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727
- Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013)
- Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012)
- Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994)
- Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117
- LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791
- Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012)
- Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: an experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021)
- Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016)
- Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014)
- Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018)
- Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010)
- Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013)
- Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023)
- Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023)
- Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021)
- Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k La Malfa, E., La Malfa, G., Caprioli, C., Nicosia, G., Latora, V.: Deep Neural Networks as Complex Networks. arxiv (2022) https://doi.org/10.48550/ARXIV.2209.05488 La Malfa et al. [2021] La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. 
Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). 
https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. 
arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. 
Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. 
[2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. 
Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. 
IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. 
Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. 
[2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. 
Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. 
IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), e2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arXiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: An Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical Review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation.
Physica D: Nonlinear Phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of Eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, 2nd edn. Westview Press, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical Review Letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of Physics A: Mathematical and General 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021).
https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021) Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. 
Space Science Reviews 198, 3–44 (2016)
Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014)
Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018)
https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. 
arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). 
https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. 
Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. 
[2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. 
arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. 
[2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. 
Physical Review E 87(5), 052801 (2013) Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994) Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117 LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics.
Nature Physics 6(10), 744–750 (2010)
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. 
International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. 
[2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. 
Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. 
IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . 
Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. 
In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
https://ieeexplore.ieee.org/document/9643155/
Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189
Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arXiv (2022) https://doi.org/10.48550/ARXIV.2207.08148
Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: An Introduction. John Wiley & Sons, Weinheim (2006)
Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical Review A 38(1), 364 (1988)
Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: Nonlinear Phenomena 42(1-3), 12–37 (1990)
Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020)
Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of Eugenics 7(2), 179–188 (1936)
Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023)
Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, 2nd edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015)
Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical Review Letters 77(7), 1262 (1996)
Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of Physics A: Mathematical and General 30(1), 1 (1997)
Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989)
Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html
Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM
Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf
Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv (2021) 2103.01338
Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html
Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf
Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w
Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113
Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724
Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757
Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610
Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110
Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996)
Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538
Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113
Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118
Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/
Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013)
Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012)
Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994)
[46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25
[47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25
Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012)
Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021)
Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016)
Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014)
Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018)
Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010)
Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013)
Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023)
Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023)
Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021)
Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
[2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. 
(eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. 
[2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. 
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arXiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: An Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical Review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: Nonlinear Phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of Eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, 2nd edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical Review Letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of Physics A: Mathematical and General 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29.
Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. 
Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. 
[1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. 
In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. 
Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. 
[1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. 
In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . 
revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. 
Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. 
In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html
Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM
Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf
Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021)
Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html
Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf
Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks.
Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w
Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113
Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724
Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757
Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610
Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110
Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996)
Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538
Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113
Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index.
Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118
Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/
Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013)
Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012)
Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994)
Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117
LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791
Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. 
Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). 
https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. 
arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. 
Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). 
https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. 
Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. 
PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. 
Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. 
IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of Eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, 2nd edn.
Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical Review Letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of Physics A: Mathematical and General 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021) Lee et al.
[2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. 
[1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. 
[2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. 
Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. 
IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. 
[1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. 
In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . 
revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. 
Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. 
[2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. 
[2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics Letters A 185(1), 77–87 (1994) Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117 LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al.
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical Review Letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of Physics A: Mathematical and General 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021).
https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf
Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. 
arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. 
[2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. 
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). 
https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. 
Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. 
Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538
Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113
Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), e2023719118 (2021) https://doi.org/10.1073/pnas.2023719118
Bosman et al. [2020] Bosman, A.S., Engelbrecht, A.P., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727
Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013)
Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012)
Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994)
Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117
LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791
Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012)
Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021)
Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016)
Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014)
Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018)
Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010)
Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013)
Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), e2208998120 (2023)
Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023)
Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021)
Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w
Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113
Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724
Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757
Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610
Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110
Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996)
Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 .
revision #137538
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . 
Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. 
[2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. 
[2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. 
Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
- Caligiuri, A., EguÃluz, V.M., Di Gaetano, L., Galla, T., Lacasa, L.: Lyapunov exponents for temporal networks. Physical Review E 107(4), 044305 (2023) https://doi.org/10.1103/PhysRevE.107.044305 La Malfa et al. [2022] La Malfa, E., La Malfa, G., Caprioli, C., Nicosia, G., Latora, V.: Deep Neural Networks as Complex Networks. arxiv (2022) https://doi.org/10.48550/ARXIV.2209.05488 La Malfa et al. [2021] La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. 
[2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). 
https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. 
[2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k La Malfa, E., La Malfa, G., Caprioli, C., Nicosia, G., Latora, V.: Deep Neural Networks as Complex Networks. arxiv (2022) https://doi.org/10.48550/ARXIV.2209.05488 La Malfa et al. [2021] La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. 
Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). 
https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. 
arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. 
Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. 
[2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021) Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions.
Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838. Revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May-Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8.
IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727. https://ieeexplore.ieee.org/document/9206727/ Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117 [47] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems.
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013)
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. 
[2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. 
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. 
[2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). 
https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. 
[2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. 
In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. 
Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. 
Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. 
Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. 
Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. 
[2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. 
[1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May-Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Engelbrecht, A.P., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013) Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994) Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117
LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations.
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023)
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. 
arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. 
[2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. 
Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. 
[2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. 
arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. 
Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. 
Space Science Reviews 198, 3–44 (2016)
Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014)
Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018)
Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010)
Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013)
Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023)
Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023)
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. 
Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. 
IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . 
Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. 
In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
- La Malfa, E., La Malfa, G., Caprioli, C., Nicosia, G., Latora, V.: Deep Neural Networks as Complex Networks. arXiv (2022) https://doi.org/10.48550/ARXIV.2209.05488 La Malfa et al. [2021] La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arXiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: An Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical Review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: Nonlinear Phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of Eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, 2nd edn.
Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical Review Letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of Physics A: Mathematical and General 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021) Lee et al.
[2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. 
[1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023)
- Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021)
- Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. 
[2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. 
[2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. 
[2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. 
[2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. 
arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). 
https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. 
[2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. 
In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. 
[2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. 
Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. 
PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. 
Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May-Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. 
IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013) Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994) Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117 LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. 
Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. 
[2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. 
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. 
PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. 
Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. 
IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). 
https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. 
arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). 
https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. 
Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013) Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994) Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117 LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al.
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021) Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . 
https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. 
Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. 
Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. 
[2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. 
Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. 
Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. 
Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. 
[2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
- La Malfa, E., La Malfa, G., Nicosia, G., Latora, V.: Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks. In: 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI), pp. 344–351. IEEE, Washington, DC, USA (2021). https://doi.org/10.1109/ICTAI52525.2021.00056 . https://ieeexplore.ieee.org/document/9643155/ Ribas et al. [2020] Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arXiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: An Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical Review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: Nonlinear Phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of Eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, 2nd edn. Westview Press, Boulder, CO (2015) Aurell et al.
[1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical Review Letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of Physics A: Mathematical and General 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021) Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers.
In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences.
Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May-Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013) Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994) Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117 LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791 Boedecker et al.
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. 
[1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. 
In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . 
revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. 
Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arxiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: an Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. 
In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. 
Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. 
Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. 
Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: An Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical Review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. 
Physica D: Nonlinear Phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of Eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, 2nd edn. Westview Press, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical Review Letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of Physics A: Mathematical and General 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). 
https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021) Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. 
[2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. 
Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. 
IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. 
[1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. 
In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . 
revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. 
Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. 
[2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. 
[2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). 
https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. 
arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), e2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013) Núñez et al. 
[2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994) Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117 Lecun et al. [1998] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature Communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), e2208998120 (2023)
arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. 
[2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. 
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). 
https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. 
Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. 
Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . 
revision #137538
Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113
Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), e2023719118 (2021) https://doi.org/10.1073/pnas.2023719118
Bosman et al. [2020] Bosman, A.S., Engelbrecht, A.P., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727
Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013)
Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012)
Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994)
Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117
Lecun et al. [1998] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791
Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012)
Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021)
Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016)
Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014)
Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018)
Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010)
Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013)
Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), e2208998120 (2023)
Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023)
Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021)
Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113
Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724
Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757
Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610
Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110
Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996)
Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838, revision #137538
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . 
Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. 
[2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. 
[2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. 
Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
- Ribas, L.C., Sá Junior, J.J.D.M., Scabini, L.F.S., Bruno, O.M.: Fusion of complex networks and randomized neural networks for texture analysis. Pattern Recognition 103, 107189 (2020) https://doi.org/10.1016/j.patcog.2019.107189 Scabini et al. [2022] Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arXiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: An Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical Review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: Nonlinear Phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of Eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, 2nd edn. Westview Press, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical Review Letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of Physics A: Mathematical and General 30(1), 1 (1997) Montana et al.
[1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021) Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016).
https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition.
Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013) Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994) Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117 LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al.
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016)
Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838, revision #137538
Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113
Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118
Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 https://ieeexplore.ieee.org/document/9206727/
In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf
Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w
Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113
arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. 
International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. 
[2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). 
https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. 
[2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). 
https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. 
Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. 
arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. 
[2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. 
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013) Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994) Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117 LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al.
[2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013)
https://openreview.net/forum?id=ZBESeIUB5k Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. 
Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. 
Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . 
revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. 
Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
- Scabini, L., De Baets, B., Bruno, O.M.: Improving Deep Neural Network Random Initialization Through Neuronal Rewiring. arXiv (2022) https://doi.org/10.48550/ARXIV.2207.08148 Schuster and Just [2006] Schuster, H.G., Just, W.: Deterministic Chaos: An Introduction. John Wiley & Sons, Weinheim (2006) Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical Review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: Nonlinear Phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of Eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, 2nd edn. Westview Press, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical Review Letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of Physics A: Mathematical and General 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021) Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al.
[2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May-Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118
In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf
Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021)
Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html
Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf
Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. 
[2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. 
Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. 
IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. 
[1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. 
In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . 
revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. 
Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. 
[2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. 
[2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). 
https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. 
arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757
Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610
Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110
Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996)
Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838. Revision #137538
Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113
Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118
Bosman et al. [2020] Bosman, A.S., Engelbrecht, A.P., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727. https://ieeexplore.ieee.org/document/9206727/
Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013)
Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012)
Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994)
Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117
LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791
Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012)
Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021)
Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016)
Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014)
Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018)
Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010)
Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks.
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). 
https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. 
arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) 
Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. 
arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. 
International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. 
[2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). 
https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. 
[2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). 
https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. 
Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), e2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022).
https://openreview.net/forum?id=ZBESeIUB5k Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021) Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. 
arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. 
International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. 
[2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. 
Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. 
Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. 
Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . 
revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. 
Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . 
Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. 
[2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014)
- Schuster, H.G., Just, W.: Deterministic Chaos: An Introduction. John Wiley & Sons, Weinheim (2006)
- Bak et al. [1988] Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical Review A 38(1), 364 (1988)
- Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: Nonlinear Phenomena 42(1–3), 12–37 (1990)
- Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020)
- Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of Eugenics 7(2), 179–188 (1936)
- Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023)
- Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, 2nd edn. Westview Press, Boulder, CO (2015)
- Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical Review Letters 77(7), 1262 (1996)
- Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of Physics A: Mathematical and General 30(1), 1 (1997)
- Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989)
- Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html
- Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM
- Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf
- Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021)
- Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient descent only converges to minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html
- Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf
- Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The global optimization geometry of shallow linear neural networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w
- Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113
- Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large scale structure of neural network loss landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724
- Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep ensembles: A loss landscape perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757
- Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610
- Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110
- Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996)
- Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838, revision #137538
- Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113
- Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118
- Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss surface modality of feed-forward neural network architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727
- Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013)
- Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012)
- Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994)
- Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117
- LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791
- Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012)
- Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021)
- Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016)
- Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems.
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014)
- Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018)
- Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010)
- Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013)
- Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023)
- Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023)
- Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021)
- Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
Physics Letters A 185(1), 77–87 (1994)
[46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117
[47] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791
Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012)
Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021)
[2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012)
Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994)
https://openreview.net/forum?id=ZBESeIUB5k Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. 
arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. 
[2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. 
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). 
https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. 
Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. 
Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . 
revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. 
Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . 
Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. 
[2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
- Bak, P., Tang, C., Wiesenfeld, K.: Self-organized criticality. Physical review A 38(1), 364 (1988) Langton [1990] Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, 2nd edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al.
[2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021) Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions.
Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8.
IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117 LeCun et al. [1998] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems.
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: nonlinear phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. 
arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). 
https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. 
[2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. 
In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. 
Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. 
Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. 
Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. 
Westview Press, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical Review Letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of Physics A: Mathematical and General 30(1), 1 (1997) Montana and Davis [1989] Montana, D.J., Davis, L.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021) Lee et al. 
[2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. 
[1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May-Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118
Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538
https://openreview.net/forum?id=ZBESeIUB5k Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. 
[2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. 
arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. 
[2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. 
Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. 
[2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. 
arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023)
Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021)
Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. 
Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. 
IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . 
Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. 
In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
- Langton, C.G.: Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: Nonlinear Phenomena 42(1-3), 12–37 (1990) Carroll [2020] Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of Eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, 2nd edn. Westview Press, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical Review Letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of Physics A: Mathematical and General 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability.
In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021) Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al.
[2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838. Revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May-Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727. https://ieeexplore.ieee.org/document/9206727/ Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency.
Physical Review E 87(5), 052801 (2013) Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994) Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117 Lecun et al. [1998] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics.
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. 
[1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. 
In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . 
revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. 
Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. 
In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. 
[2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. 
In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. 
Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. 
Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. 
Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, 2nd edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical Review Letters 77(7), 1262 (1996) Aurell et al. 
[1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of Physics A: Mathematical and General 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021) Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. 
In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . 
revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. 
Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. 
In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. 
Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. 
[2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. 
[1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. 
In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. 
Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021) Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). 
https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110
Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. 
Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. 
IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. 
[2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. 
[1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. 
[2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. 
Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. 
IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al.
[2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). 
https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. 
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. 
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). 
https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. 
Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. 
IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. 
[2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. 
[2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. 
[2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. 
In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
- Carroll, T.L.: Do reservoir computers work best at the edge of chaos? Chaos: An Interdisciplinary Journal of Nonlinear Science 30(12) (2020) Fisher [1936] Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). 
https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. 
arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. 
[2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. 
[2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. 
[2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. 
[2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. 
arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, 2nd edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical Review Letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of Physics A: Mathematical and General 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). 
https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021) Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf
Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. 
IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). 
https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. 
arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). 
https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. 
Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. 
Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. 
IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). 
https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. 
Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994) Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117 Lecun et al. [1998] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016)
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. 
[2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. 
Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. 
Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. 
Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. 
[2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. 
In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
- Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of eugenics 7(2), 179–188 (1936) Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) 
Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. 
arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. 
International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994) Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117 Lecun et al. [1998] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013) Morales et al.
[2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ziyin et al. [2023] Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023) Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical Review Letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of Physics A: Mathematical and General 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.)
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. 
[2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. 
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edition edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. 
[1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. 
In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . 
revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. 
Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical review letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. 
In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. 
Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of lyapunov exponent. Journal of physics A: Mathematical and general 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. 
[2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. 
[1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May-Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013) Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994) Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117 LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Montana and Davis [1989] Montana, D.J., Davis, L.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv (2021) 2103.01338
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. 
Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). 
https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. 
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. 
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). 
https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. 
Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. 
IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. 
[2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. 
[2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. 
[2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. 
In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
- Ziyin, L., Li, B., Galanti, T., Ueda, M.: The probabilistic stability of stochastic gradient descent. arXiv preprint arXiv:2303.13093 (2023)
- Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, 2nd edn. Westview Press, Boulder, CO (2015)
- Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical Review Letters 77(7), 1262 (1996)
- Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of Physics A: Mathematical and General 30(1), 1 (1997)
- Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989)
- Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html
- Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM
- Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf
- Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021)
- Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient descent only converges to minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html
- Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf
- Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The global optimization geometry of shallow linear neural networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w
- Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113
- Fort, S., Jastrzebski, S.: Large scale structure of neural network loss landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724
- Fort, S., Hu, H., Lakshminarayanan, B.: Deep ensembles: a loss landscape perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757
- Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610
- Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110
- Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996)
- Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 (revision #137538)
- Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113
- Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), e2023719118 (2021) https://doi.org/10.1073/pnas.2023719118
- Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss surface modality of feed-forward neural network architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727
- Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013)
- Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012)
- Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series.
Physics Letters A 185(1), 77–87 (1994) Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117 LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Strogatz [2015] Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Second edn. Westview Press, a member of the Perseus Books Group, Boulder, CO (2015) Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical Review Letters 77(7), 1262 (1996) Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of Physics A: Mathematical and General 30(1), 1 (1997) Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989) Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. 
In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. 
Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. 
[2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. 
In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
- Strogatz, S.H.: Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, 2nd edn. Westview Press, Boulder, CO (2015)
Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical Review Letters 77(7), 1262 (1996)
Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of Physics A: Mathematical and General 30(1), 1 (1997)
Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989)
Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html
Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM
Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf
Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021)
Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient descent only converges to minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html
Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf
Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The global optimization geometry of shallow linear neural networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w
Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113
Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large scale structure of neural network loss landscapes. arXiv preprint arXiv:1906.04724 (2019)
Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep ensembles: A loss landscape perspective. arXiv preprint arXiv:1912.02757 (2019)
Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv preprint arXiv:2010.06610 (2020)
Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv preprint arXiv:2010.15110 (2020)
Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996)
Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538
Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113
Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118
Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss surface modality of feed-forward neural network architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727
Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013)
Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012)
Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994)
Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117
Lecun et al. [1998] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791
Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012)
Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021)
Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016)
Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014)
Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018)
Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010)
Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013)
Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023)
Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023)
Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021)
Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. 
[2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. 
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). 
https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. 
Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. 
Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . 
revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. 
Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. 
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013) Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994) Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117 LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . 
Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. 
[2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. 
- Aurell et al. [1996] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Growth of noninfinitesimal perturbations in turbulence. Physical Review Letters 77(7), 1262 (1996)
- Aurell et al. [1997] Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of Physics A: Mathematical and General 30(1), 1 (1997)
- Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989)
- Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html
- Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM
- Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf
- Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021)
- Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html
- Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf
- Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w
- Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113
- Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724
- Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757
- Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610
- Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110
- Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996)
- Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538
- Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113
- Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118
- Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/
- Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013)
- Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012)
- Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994)
- [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117
- [47] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791
- Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012)
- Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021)
- Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016)
- Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014)
- Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018)
- Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010)
- Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013)
- Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023)
- Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023)
- Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021)
- Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. 
[2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. 
[1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. 
[2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. 
Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. 
IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. 
[2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. 
Bosman et al. [2020] Bosman, A.S., Engelbrecht, A.P., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727
Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013)
Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012)
Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994)
Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117
LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791
Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012)
Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021)
Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016)
Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014)
Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018)
Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010)
Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013)
Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), e2208998120 (2023)
Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023)
Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021)
Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w
Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113
Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724
Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757
Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610
Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110
Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996)
Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 (revision #137538)
Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113
Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), e2023719118 (2021) https://doi.org/10.1073/pnas.2023719118
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. 
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). 
https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. 
Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. 
IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. 
[2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
- Aurell, E., Boffetta, G., Crisanti, A., Paladin, G., Vulpiani, A.: Predictability in the large: an extension of the concept of Lyapunov exponent. Journal of Physics A: Mathematical and General 30(1), 1 (1997)
Montana et al. [1989] Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989)
Choromanska et al. [2015] Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html
Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM
Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf
Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021)
Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient descent only converges to minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html
Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf
Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The global optimization geometry of shallow linear neural networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w
Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113
Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large scale structure of neural network loss landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724
Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep ensembles: A loss landscape perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757
Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610
Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110
Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996)
Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838. Revision #137538
Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May-Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113
Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), e2023719118 (2021) https://doi.org/10.1073/pnas.2023719118
Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss surface modality of feed-forward neural network architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727
Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013)
Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012)
Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994)
Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117
LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. 
arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. 
Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. 
Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. 
[2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. 
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. 
Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. 
International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. 
[2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. 
Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. 
IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . 
Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
- Montana, D.J., Davis, L., et al.: Training feedforward neural networks using genetic algorithms. In: IJCAI, vol. 89, pp. 762–767 (1989)
- Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html
- Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM
- Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf
- Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021)
- Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html
- Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf
- Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020). https://doi.org/10.1007/s10851-019-00889-w
- Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020). https://doi.org/10.1016/j.neucom.2020.02.113
- Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019). https://doi.org/10.48550/ARXIV.1906.04724
- Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019). https://doi.org/10.48550/ARXIV.1912.02757
- Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020). https://doi.org/10.48550/ARXIV.2010.06610
- Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020). https://doi.org/10.48550/ARXIV.2010.15110
- Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996)
- Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006). https://doi.org/10.4249/scholarpedia.1838 (revision #137538)
- Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016). https://doi.org/10.1073/pnas.1601136113
- Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021). https://doi.org/10.1073/pnas.2023719118
- Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727
- Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013)
- Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012)
- Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994)
- Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020). https://doi.org/10.1073/pnas.1908636117
- LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998). https://doi.org/10.1109/5.726791
- Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012)
- Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021)
- Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016)
- Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014)
- Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018)
- Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010)
- Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013)
- Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023)
- Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023)
- Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021)
- Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022).
https://openreview.net/forum?id=ZBESeIUB5k Vettelschoss et al.
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arxiv (2021) 2103.01338 Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. 
[2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. 
arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. 
[2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. 
Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. 
[2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature Communications 4(1), 2521 (2013)
Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023)
Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023)
Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021)
Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113
Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724
Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757
Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610
Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110
Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996)
Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838, revision #137538
Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113
Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118
Bosman et al. [2020] Bosman, A.S., Engelbrecht, A.P., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727
Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013)
Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012)
Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994)
Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117
LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791
Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012)
Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: an experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021)
Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016)
Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014)
Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018)
Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010)
Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013)
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. 
Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. 
IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . 
Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
- Choromanska, A., Henaff, M., Mathieu, M., Ben Arous, G., LeCun, Y.: The loss surfaces of multilayer networks. In: Lebanon, G., Vishwanathan, S.V.N. (eds.) Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics. Proceedings of Machine Learning Research, vol. 38, pp. 192–204. PMLR, San Diego, California, USA (2015). https://proceedings.mlr.press/v38/choromanska15.html Cohen et al. [2021] Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021) Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838, revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013) Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994) Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117 Lecun et al. [1998] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018)
[2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. 
Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. 
[2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. 
arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. 
Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. 
Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. 
[2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. 
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023)
Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. 
In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
- Cohen, J., Kaur, S., Li, Y., Kolter, J.Z., Talwalkar, A.: Gradient descent on neural networks typically occurs at the edge of stability. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=jh-rTtvkGeM
Kong and Tao [2020] Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf
Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021)
Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html
Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf
Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w
Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113
Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724
Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757
Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610
Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110
Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996)
Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538
Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May-Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113
Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118
Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727
Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013)
Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012)
Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994)
Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117
LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791
Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012)
Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021)
Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016)
Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014)
Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018)
Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010)
Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013)
Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023)
Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023)
Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021)
Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. 
Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). 
https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. 
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. 
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). 
https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
- Kong, L., Tao, M.: Stochasticity of deterministic gradient descent: Large learning rate for multiscale objective function. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 2625–2638. Curran Associates, Inc., New York, United States (2020). https://proceedings.neurips.cc/paper_files/paper/2020/file/1b9a80606d74d3da6db2f1274557e644-Paper.pdf Agarwal et al. [2021] Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021) Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al.
[2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al.
[2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. 
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). 
https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. 
Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. 
Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . 
revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. 
Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. 
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. 
arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . 
Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. 
[2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. 
[2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. 
Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
- Agarwal, N., Goel, S., Zhang, C.: Acceleration via fractal learning rate schedules. arXiv preprint arXiv:2103.01338 (2021)
Lee et al. [2016] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html
Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf
Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w
Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113
Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724
Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757
Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610
Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel.
arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May-Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013) Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117
[47] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations.
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021)
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. 
[2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. 
Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. 
International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. 
[2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
- Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient Descent Only Converges to Minimizers. In: Conference on Learning Theory, pp. 1246–1257. PMLR, Colorado, USA (2016). https://proceedings.mlr.press/v49/lee16.html Kawaguchi [2016] Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. 
[1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), e2023719118 (2021) https://doi.org/10.1073/pnas.2023719118
https://openreview.net/forum?id=ZBESeIUB5k Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. 
Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. 
[2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. 
In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
- Kawaguchi, K.: Deep learning without poor local minima. In: Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc., New York, United States (2016). https://proceedings.neurips.cc/paper_files/paper/2016/file/f2fc990265c712c49d51a18a32b39f0c-Paper.pdf Zhu et al. [2020] Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs.
International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. 
[2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. 
Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. 
Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. 
Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . 
revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. 
Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
- Zhu, Z., Soudry, D., Eldar, Y.C., Wakin, M.B.: The Global Optimization Geometry of Shallow Linear Neural Networks. Journal of Mathematical Imaging and Vision 62(3), 279–292 (2020) https://doi.org/10.1007/s10851-019-00889-w Bosman et al. [2020] Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. 
[2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. 
[2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. 
Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. 
IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. 
[2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arxiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. 
Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. 
Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. 
[2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arxiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). 
https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014)
Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018)
Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010)
Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013)
Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), e2208998120 (2023)
Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023)
Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021)
Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996)
Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . Revision #137538
Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113
Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), e2023719118 (2021) https://doi.org/10.1073/pnas.2023719118
Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss surface modality of feed-forward neural network architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727
Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013)
Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012)
Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994)
[46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117
[47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791
Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012)
Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: an experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021)
Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016)
[2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. 
Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
- Bosman, A.S., Engelbrecht, A., Helbig, M.: Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions. Neurocomputing 400, 113–136 (2020) https://doi.org/10.1016/j.neucom.2020.02.113 Fort and Jastrzebski [2019] Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Engelbrecht, A.P., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013) Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994) Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117 Lecun et al. [1998] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al.
[2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . 
Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
- Fort, S., Jastrzebski, S.: Large Scale Structure of Neural Network Loss Landscapes. arXiv (2019) https://doi.org/10.48550/ARXIV.1906.04724 Fort et al. [2019] Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757 Havasi et al. [2020] Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May-Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994) Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117 Lecun et al. [1998] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations.
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. 
Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. 
[2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. 
In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
- Fort, S., Hu, H., Lakshminarayanan, B.: Deep Ensembles: A Loss Landscape Perspective. arXiv (2019) https://doi.org/10.48550/ARXIV.1912.02757
- Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610
- Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110
- Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996)
- Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838. Revision #137538
- Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113
- Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118
- Bosman, A.S., Engelbrecht, A.P., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727
- Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013)
- Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012)
- Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994)
- Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117
- LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791
- Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012)
- Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: an experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021)
- Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016)
- Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014)
- Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018)
- Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010)
- Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013)
- Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023)
- Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023)
- Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021)
- Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. 
Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. 
[2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. 
Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. 
Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. 
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
- Havasi, M., Jenatton, R., Fort, S., Liu, J.Z., Snoek, J., Lakshminarayanan, B., Dai, A.M., Tran, D.: Training independent subnetworks for robust prediction. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.06610 Fort et al. [2020] Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110 Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series.
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . 
Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. 
[2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. 
[2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. 
Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
- Fort, S., Dziugaite, G.K., Paul, M., Kharaghani, S., Roy, D.M., Ganguli, S.: Deep learning versus kernel learning: An empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel. arXiv (2020) https://doi.org/10.48550/ARXIV.2010.15110
Alligood et al. [1996] Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996)
Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838. Revision #137538
Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113
Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118
Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727
Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013)
Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012)
Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994)
Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117. Accessed 2024-03-25
Lecun et al. [1998] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791. Accessed 2024-03-25
Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012)
Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021)
Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016)
Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014)
Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018)
Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010)
Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013)
Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023)
Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023)
Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021)
Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
- Alligood, K.T., Sauer, T., Yorke, J.A.: Chaos: An Introduction to Dynamical Systems. Textbooks in Mathematical Sciences. Springer, New York (1996) Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 . revision #137538 Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the may-wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113 Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023)
Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021)
Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss surface modality of feed-forward neural network architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727
Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013)
Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012)
Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994)
Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117
LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791
Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012)
Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: an experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021)
Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016)
Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014)
Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018)
Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010)
Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013)
Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023)
- Holmes and Shea-Brown [2006] Holmes, P., Shea-Brown, E.T.: Stability. Scholarpedia 1(10), 1838 (2006) https://doi.org/10.4249/scholarpedia.1838 (revision #137538)
Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113
Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118
[2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). 
https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. 
Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. 
[2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . 
Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. 
[2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. 
[2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023)
Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023)
Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021)
Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
Muñoz [2018] Muñoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018)
Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010)
Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013)
Fyodorov and Khoruzhenko [2016] Fyodorov, Y.V., Khoruzhenko, B.A.: Nonlinear analogue of the May–Wigner instability transition. Proceedings of the National Academy of Sciences 113(25), 6827–6832 (2016) https://doi.org/10.1073/pnas.1601136113
Ben Arous et al. [2021] Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118
Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727
Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013)
Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012)
Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994)
Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117
LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791
Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012)
Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021)
Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016)
Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Muñoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014)
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. 
Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
- Ben Arous, G., Fyodorov, Y.V., Khoruzhenko, B.A.: Counting equilibria of large complex systems by instability index. Proceedings of the National Academy of Sciences 118(34), 2023719118 (2021) https://doi.org/10.1073/pnas.2023719118 Bosman et al. [2020] Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. 
[2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727 . 
https://ieeexplore.ieee.org/document/9206727/ Núnez et al. [2013] Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. 
Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . 
Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. 
[2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. 
Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. 
[2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014)
Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018)
Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010)
Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013)
Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023)
Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023)
Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021)
Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016)
- Bosman, A.S., Petrus Engelbrecht, A., Helbig, M.: Loss Surface Modality of Feed-Forward Neural Network Architectures. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, Glasgow, United Kingdom (2020). https://doi.org/10.1109/IJCNN48605.2020.9206727
Núñez et al. [2013] Núñez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-I intermittency. Physical Review E 87(5), 052801 (2013)
Núñez et al. [2012] Núñez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012)
Kantz [1994] Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994)
Baldassi et al. [2020] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020). https://doi.org/10.1073/pnas.1908636117
LeCun et al. [1998] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998). https://doi.org/10.1109/5.726791
Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012)
Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021)
- Núnez, A.M., Luque, B., Lacasa, L., Gómez, J.P., Robledo, A.: Horizontal visibility graphs generated by type-i intermittency. Physical Review E 87(5), 052801 (2013) Nunez et al. [2012] Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. 
Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012) Kantz [1994] Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Kantz, H.: A robust method to estimate the maximal lyapunov exponent of a time series. Physics letters A 185(1), 77–87 (1994) [46] Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima 117(1), 161–170 https://doi.org/10.1073/pnas.1908636117 . Accessed 2024-03-25 [47] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. 
[2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition 86(11), 2278–2324 https://doi.org/10.1109/5.726791 . Accessed 2024-03-25 Boedecker et al. 
[2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012) Vettelschoss et al. [2021] Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021) Watkins et al. [2016] Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016) Hidalgo et al. [2014] Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014) Munoz [2018] Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018) Chialvo [2010] Chialvo, D.R.: Emergent complex neural dynamics. 
Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Chialvo, D.R.: Emergent complex neural dynamics. Nature physics 6(10), 744–750 (2010) Moretti and Muñoz [2013] Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. 
[2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. 
In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
- Nunez, A., Lacasa, L., Valero, E., Gómez, J.P., Luque, B.: Detecting series periodicity with horizontal visibility graphs. International Journal of Bifurcation and Chaos 22(07), 1250160 (2012)
- Kantz, H.: A robust method to estimate the maximal Lyapunov exponent of a time series. Physics Letters A 185(1), 77–87 (1994)
- Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117
- Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791
- Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012)
- Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021)
- Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016)
- Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014)
- Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018)
- Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010)
- Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013)
- Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023)
- Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023)
- Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021)
- Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
[2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
- Baldassi, C., Pittorino, F., Zecchina, R.: Shaping the learning landscape in neural networks around wide flat minima. Proceedings of the National Academy of Sciences 117(1), 161–170 (2020) https://doi.org/10.1073/pnas.1908636117 Lecun et al. [1998] Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) https://doi.org/10.1109/5.726791 Boedecker et al. [2012] Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012)
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature communications 4(1), 2521 (2013) Morales et al. [2023a] Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023) Morales et al. [2023b] Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. 
arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023) Morales and Muñoz [2021] Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021) Geiping et al. [2022] Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k
- Boedecker, J., Obst, O., Lizier, J.T., Mayer, N.M., Asada, M.: Information processing in echo state networks at the edge of chaos. Theory in Biosciences 131, 205–213 (2012)
- Vettelschoss, B., Röhm, A., Soriano, M.C.: Information processing capacity of a single-node reservoir computer: An experimental evaluation. IEEE Transactions on Neural Networks and Learning Systems 33(6), 2714–2725 (2021)
- Watkins, N.W., Pruessner, G., Chapman, S.C., Crosby, N.B., Jensen, H.J.: 25 years of self-organized criticality: concepts and controversies. Space Science Reviews 198, 3–44 (2016)
- Hidalgo, J., Grilli, J., Suweis, S., Munoz, M.A., Banavar, J.R., Maritan, A.: Information-based fitness and the emergence of criticality in living systems. Proceedings of the National Academy of Sciences 111(28), 10095–10100 (2014)
- Munoz, M.A.: Colloquium: Criticality and dynamical scaling in living systems. Reviews of Modern Physics 90(3), 031001 (2018)
- Chialvo, D.R.: Emergent complex neural dynamics. Nature Physics 6(10), 744–750 (2010)
- Moretti, P., Muñoz, M.A.: Griffiths phases and the stretching of criticality in brain networks. Nature Communications 4(1), 2521 (2013)
- Morales, G.B., Di Santo, S., Muñoz, M.A.: Quasiuniversal scaling in mouse-brain neuronal activity stems from edge-of-instability critical dynamics. Proceedings of the National Academy of Sciences 120(9), 2208998120 (2023)
- Morales, G.B., Di Santo, S., Muñoz, M.A.: Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations. arXiv preprint arXiv:2307.10669 (2023)
- Morales, G.B., Muñoz, M.A.: Optimal input representation in neural systems at the edge of chaos. Biology 10(8), 702 (2021)
- Geiping, J., Goldblum, M., Pope, P., Moeller, M., Goldstein, T.: Stochastic training is not necessary for generalization. In: International Conference on Learning Representations (2022). https://openreview.net/forum?id=ZBESeIUB5k