
Echo State Networks with Self-Normalizing Activations on the Hyper-Sphere

Published 27 Mar 2019 in cs.NE, cs.LG, and nlin.CD | arXiv:1903.11691v2

Abstract: Among the various architectures of Recurrent Neural Networks, Echo State Networks (ESNs) emerged due to their simplified and inexpensive training procedure. These networks are known to be sensitive to the setting of hyper-parameters, which critically affect their behaviour. Results show that their performance is usually maximized in a narrow region of hyper-parameter space called the edge of chaos. Finding such a region requires searching hyper-parameter space in a sensible way: hyper-parameter configurations marginally outside such a region might yield networks exhibiting fully developed chaos, hence producing unreliable computations. The performance gain due to optimizing hyper-parameters can be studied by considering the memory--nonlinearity trade-off, i.e., the fact that increasing the nonlinear behaviour of the network degrades its ability to remember past inputs, and vice versa. In this paper, we propose a model of ESNs that eliminates critical dependence on hyper-parameters, resulting in networks that provably cannot enter a chaotic regime and, at the same time, exhibit nonlinear behaviour in phase space together with a large memory of past inputs, comparable to that of linear networks. Our contribution is supported by experiments corroborating our theoretical findings, showing that the proposed model displays dynamics that are rich enough to approximate many common nonlinear systems used for benchmarking.


Summary

  • The paper introduces a novel ESN model that employs a self-normalizing activation function projecting the reservoir state onto a hyper-sphere, which provably rules out chaotic dynamics.
  • The model maintains universal approximation properties and robust memory capacity, excelling at nonlinear time-series prediction tasks such as the Lorenz and Mackey–Glass systems.
  • By operating stably at the edge of criticality, the approach minimizes hyper-parameter tuning, paving the way for efficient and adaptable recurrent network designs.


Introduction

Echo State Networks (ESNs) have gained attention due to their training simplicity and efficiency. Traditional ESNs, however, are highly sensitive to hyper-parameter settings: performance is typically maximized near the edge of criticality (EoC), yet configurations marginally past that region push the network into chaos. This paper introduces a novel ESN model employing a self-normalizing activation function, which maintains stability across a wide range of hyper-parameter values, thereby eliminating the risk of chaotic states while preserving the network's ability to approximate nonlinear systems.

Proposed Model and Activation Function

The proposed ESN model is characterized by a self-normalizing activation function that projects the reservoir state onto a hyper-sphere, ensuring the system's stability and ruling out chaotic behaviour. This approach maintains rich dynamics and a memory capacity comparable to that of linear networks (Figure 1).

Figure 1: Schematic representation of an Echo State Network (ESN) with a hyper-spherical activation structure.

The key element of this model is an activation function that acts on all neuron states simultaneously, rather than element-wise, with the state update computed as:

$$x_k = r \, \frac{a_k}{\lVert a_k \rVert}$$

where $a_k$ is the vector of pre-activations at time step $k$. This transformation normalizes the state onto the hyper-sphere of radius $r$, providing robustness irrespective of the input or the hyper-parameter settings.
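
As a concrete illustration, the following NumPy sketch implements this update rule. The pre-activation $a_k = W x_{k-1} + W_{in} u_k$ follows the standard ESN convention; all names, sizes, and weight scalings here are illustrative assumptions rather than the paper's exact setup.

```python
import numpy as np

def spherical_esn_step(x, u, W, W_in, r=1.0, eps=1e-12):
    # Pre-activation a_k = W x_{k-1} + W_in u_k (standard ESN
    # convention, assumed here rather than quoted from the paper).
    a = W @ x + W_in @ u
    # Self-normalizing activation: x_k = r * a_k / ||a_k||,
    # which places the state on the hyper-sphere of radius r.
    return r * a / (np.linalg.norm(a) + eps)

# Tiny usage example with random weights.
rng = np.random.default_rng(0)
N, n_in = 100, 1
W = rng.normal(scale=1.0 / np.sqrt(N), size=(N, N))
W_in = rng.normal(size=(N, n_in))
x = rng.normal(size=N)
for _ in range(50):
    x = spherical_esn_step(x, rng.normal(size=n_in), W, W_in, r=1.0)
print(np.linalg.norm(x))  # ~1.0: the state norm is pinned to r
```

Because the norm of the state is fixed by construction, trajectories cannot diverge no matter how the reservoir weights are scaled, which is the intuition behind the stability result below.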

Stability Analysis and Universal Approximation

The paper demonstrates that the proposed ESN model is a universal function approximator, retaining the Echo State Property (ESP) under broad conditions. The derived Jacobian matrix for the activation reveals that the system cannot exhibit positive Lyapunov exponents, which precludes chaotic dynamics:

$$J_{ij} = \frac{r \, W_{ij}}{\lVert a \rVert} \left( 1 - \frac{a_i a_j}{\lVert a \rVert^2} \right)$$

Figure 2: Panel (a) shows local Lyapunov exponents (LLE) for various spectral radius values, emphasizing the non-chaotic behavior.

This property means the network operates at the EoC by construction, achieving strong performance without a fine-grained hyper-parameter search.
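
To make the stability claim tangible, the sketch below estimates the largest Lyapunov exponent of the spherical update numerically with a standard Benettin-style finite-difference scheme (evolve a tiny perturbation, record its log growth, renormalize, repeat). This is a generic numerical check, not the paper's derivation; the weight scaling and trajectory length are assumptions.

```python
import numpy as np

def spherical_step(x, u, W, W_in, r=1.0):
    # Spherical ESN update: x_k = r * a_k / ||a_k||.
    a = W @ x + W_in * u
    return r * a / np.linalg.norm(a)

def largest_lyapunov_estimate(W, W_in, inputs, x0, r=1.0, delta=1e-8, seed=0):
    # Benettin-style estimate: evolve a tiny perturbation alongside the
    # trajectory; the mean of the one-step log growth rates approximates
    # the largest Lyapunov exponent of the driven system.
    rng = np.random.default_rng(seed)
    x = x0.copy()
    d = rng.normal(size=x.shape)
    d *= delta / np.linalg.norm(d)
    logs = []
    for u in inputs:
        x_next = spherical_step(x, u, W, W_in, r)
        diff = spherical_step(x + d, u, W, W_in, r) - x_next
        logs.append(np.log(np.linalg.norm(diff) / delta))
        d = diff * (delta / np.linalg.norm(diff))  # renormalize perturbation
        x = x_next
    return np.array(logs)

rng = np.random.default_rng(1)
N = 100
# Deliberately large weight scale: a tanh reservoir at this spectral
# radius would typically be chaotic.
W = rng.normal(scale=2.0 / np.sqrt(N), size=(N, N))
W_in = rng.normal(size=N)
logs = largest_lyapunov_estimate(W, W_in, rng.normal(size=2000),
                                 x0=rng.normal(size=N) / np.sqrt(N))
print(logs[200:].mean())  # expected non-positive per the paper's analysis
```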

Memory Capacity and Nonlinearity

The network's memory capacity was rigorously tested against several benchmarks, showing strong performance on memory-intensive tasks while still supporting nonlinear computation (Figure 3).

Figure 3: Results of memory tasks on different benchmarks illustrating the model's proficiency in retaining past inputs.

The spherical activation function helps achieve a balance between memory retention and nonlinear operations, outperforming both linear and hyperbolic tangent-based networks in tasks demanding both attributes.
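
The memory-capacity benchmark referenced here is commonly Jaeger's short-term memory capacity: train a linear readout to reconstruct delayed copies of the input and sum the squared correlations over delays. The sketch below follows that common definition with the spherical update; the sizes, washout, maximum delay, and ridge regularizer are assumptions.

```python
import numpy as np

def memory_capacity(W, W_in, r=1.0, T=4000, washout=200, max_delay=50,
                    ridge=1e-6, seed=0):
    # Short-term memory capacity (Jaeger's benchmark): for each delay d,
    # train a ridge-regression readout to reconstruct u_{k-d} from the
    # state x_k, then sum the squared correlations over all delays.
    rng = np.random.default_rng(seed)
    N = W.shape[0]
    u = rng.uniform(-1.0, 1.0, size=T)
    X = np.zeros((T, N))
    x = np.zeros(N)
    for k in range(T):
        a = W @ x + W_in * u[k]
        x = r * a / np.linalg.norm(a)   # spherical update
        X[k] = x
    S = X[washout:]                     # discard the initial transient
    mc = 0.0
    for d in range(1, max_delay + 1):
        y = u[washout - d : T - d]      # input delayed by d steps
        # Ridge readout: w = (S^T S + ridge * I)^-1 S^T y
        w = np.linalg.solve(S.T @ S + ridge * np.eye(N), S.T @ y)
        c = np.corrcoef(y, S @ w)[0, 1]
        mc += c ** 2
    return mc

rng = np.random.default_rng(3)
N = 100
W = rng.normal(scale=1.0 / np.sqrt(N), size=(N, N))
W_in = rng.normal(size=N)
print(f"memory capacity (max 50): {memory_capacity(W, W_in):.1f}")
```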

Experimental Results

The proposed model's experimental validation involved various time-series prediction tasks, such as the Lorenz and Mackey-Glass systems. Comparisons with traditional ESNs and those employing linear activation functions revealed that spherical ESNs consistently deliver robust performance across a spectrum of scenarios (Figures 4 and 5).

Figure 4: Performance of linear networks indicating high errors with increased nonlinearity.


Figure 5: Performance of the proposed model demonstrating robustness across memory and nonlinearity dimensions.
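
As a worked example of such a benchmark, the sketch below generates a Mackey-Glass series with the conventional chaotic parameters (tau = 17) in its standard discrete-time form, drives a spherical ESN with it, and trains a ridge-regression readout for one-step-ahead prediction. Reservoir size, input scaling, and the regularizer are assumptions, not the paper's experimental settings.

```python
import numpy as np

def mackey_glass(T, tau=17, beta=0.2, gamma=0.1, n=10, seed=0):
    # Standard discrete-time Mackey-Glass recursion with the usual
    # chaotic benchmark parameters (tau = 17).
    rng = np.random.default_rng(seed)
    hist = 1.2 + 0.1 * rng.standard_normal(tau + T)  # random initial history
    for k in range(tau, tau + T - 1):
        x, x_tau = hist[k], hist[k - tau]
        hist[k + 1] = x + beta * x_tau / (1 + x_tau ** n) - gamma * x
    return hist[tau:]

# One-step-ahead prediction with a spherical ESN and ridge readout
# (a sketch; sizes, scaling, and the regularizer are assumptions).
rng = np.random.default_rng(2)
N, r, ridge = 300, 1.0, 1e-6
W = rng.normal(scale=1.0 / np.sqrt(N), size=(N, N))
W_in = rng.normal(size=N)
series = mackey_glass(3000)
X, x = np.zeros((len(series) - 1, N)), np.zeros(N)
for k in range(len(series) - 1):
    a = W @ x + W_in * series[k]
    x = r * a / np.linalg.norm(a)
    X[k] = x
y = series[1:]                        # targets: next value in the series
split = 2000
w = np.linalg.solve(X[:split].T @ X[:split] + ridge * np.eye(N),
                    X[:split].T @ y[:split])
mse = np.mean((X[split:] @ w - y[split:]) ** 2)
print(f"one-step test MSE: {mse:.2e}")
```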

Conclusion

The self-normalizing activation function over the hyper-sphere introduces a paradigm shift in ESN design by offering a stable yet dynamically rich framework for recurrent networks. This approach significantly alleviates hyper-parameter sensitivity, sustaining operation at the edge of criticality without the risk of chaotic dynamics and ensuring broad applicability across nonlinear computational tasks.

Ultimately, this work provides critical insights for future developments in ESN architectures, paving the way for more efficient and stable recurrent network designs that can be easily adapted to diverse machine learning contexts without extensive hyper-parameter tuning.
