- The paper identifies numerical artefacts affecting the memory capacity computation in echo state networks.
- It demonstrates that the ill-conditioning of Krylov matrices skews empirical measurements, prompting the need for robust algorithmic adjustments.
- The study reveals that memory capacity is neutral to input mask variations, thereby aligning empirical evaluations with theoretical predictions.
Memory Capacity in Recurrent Neural Networks
Introduction
The paper "Memory of recurrent networks: Do we compute it right?" (arXiv:2305.01457) addresses discrepancies observed in the numerical evaluation of the memory capacity (MC) of recurrent neural networks (RNNs), focusing on linear echo state networks (ESNs). Although theory states that the total memory capacity of a linear RNN equals the rank of its Kalman controllability matrix, empirical measurements frequently contradict this. The paper identifies these inconsistencies as predominantly numerical rather than theoretical.
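The theoretical claim can be made concrete with a short sketch (illustrative, not the authors' code): build the Kalman controllability (Krylov) matrix of a linear reservoir and take its rank. The delay-line reservoir below is a hypothetical example chosen because its Krylov matrix is diagonal, so the rank, and hence the theoretical MC, is exactly the state dimension.

```python
import numpy as np

def krylov_matrix(A, C):
    """Kalman controllability (Krylov) matrix [C, AC, A^2 C, ..., A^{N-1} C]."""
    N = A.shape[0]
    cols = [C]
    for _ in range(N - 1):
        cols.append(A @ cols[-1])
    return np.column_stack(cols)

# Delay-line reservoir (hypothetical example): the input enters the first
# unit and shifts down one slot per step with decay 0.9.  Its Krylov matrix
# is diagonal, so the theoretical MC equals the state dimension N.
N = 20
A = 0.9 * np.diag(np.ones(N - 1), -1)   # subdiagonal shift with decay
C = np.zeros(N)
C[0] = 1.0
K = krylov_matrix(A, C)
rank = np.linalg.matrix_rank(K)          # → 20, i.e. full memory capacity
```

For a generic random reservoir the rank is also full, but, as discussed below, the Krylov matrix then becomes severely ill-conditioned, which is where empirical estimates go astray.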
Numerical and Theoretical Discrepancies
Numerical evaluations of memory capacity frequently diverge from theoretical expectations for reasons largely overlooked in the existing literature. The authors argue that disregarding the Krylov structure inherent in the linear MC computation produces the gap between theoretical and empirical values. Leveraging a result showing that MC is neutral with respect to the input mask matrix, they develop robust numerical methods that bring empirical simulations into close agreement with theoretical predictions.
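To ground the discussion, here is a minimal sketch (not the authors' code) of the standard empirical MC estimator for a linear ESN x(t) = A x(t-1) + C z(t): simulate the network on i.i.d. input, fit a linear readout for each delay k to reconstruct z(t-k), and sum the squared correlations. The delay-line reservoir used as a sanity check is a hypothetical choice for which the answer is known.

```python
import numpy as np

def empirical_mc(A, C, T=5000, kmax=20, washout=100, seed=0):
    """Naive empirical MC: simulate x(t) = A x(t-1) + C z(t) on i.i.d.
    input, then for each delay k fit a least-squares readout to recover
    z(t-k) and sum the squared correlations."""
    rng = np.random.default_rng(seed)
    z = rng.uniform(-1.0, 1.0, T)
    X = np.zeros((T, A.shape[0]))
    x = np.zeros(A.shape[0])
    for t in range(T):
        x = A @ x + C * z[t]
        X[t] = x
    X, z = X[washout:], z[washout:]          # discard the transient
    mc = 0.0
    for k in range(1, kmax + 1):
        target, states = z[:-k], X[k:]       # pair x(t) with z(t-k)
        w, *_ = np.linalg.lstsq(states, target, rcond=None)
        r = np.corrcoef(states @ w, target)[0, 1]
        mc += r ** 2
    return mc

# Sanity check on the delay-line reservoir: the delays z(t-1), ..., z(t-9)
# are perfectly recoverable, so the estimate should be close to N - 1 = 9
# (delay 0 is excluded by the k >= 1 convention used here).
N = 10
A = 0.9 * np.diag(np.ones(N - 1), -1)
C = np.zeros(N)
C[0] = 1.0
mc = empirical_mc(A, C)
```

For this well-conditioned reservoir the naive estimator behaves; for generic random reservoirs, the ill-conditioned least-squares problems in the loop are exactly where the numerical artefacts discussed in the paper arise.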
Methodology
The research highlights two primary contributions:
- Identifying Numerical Artefacts: The authors identify and rigorously analyze numerical artefacts that affect the evaluation of memory capacity in ESNs. These artefacts arise from improper handling of Krylov structures, which are essential to understanding how memory is computed in linear networks.
- Proposing Robust Algorithms: By exploiting input mask neutrality, they propose robust algorithms that produce memory curves consistent with theoretical underpinnings, thus providing methods to correct previously flawed memory capacity estimations.
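The authors' algorithms are not reproduced here, but one standard stabilization in the same spirit can be sketched: squared correlations are invariant under any invertible linear transformation of the state trajectory, so one may orthonormalize the collected states (via a QR decomposition) before fitting the readouts. This leaves the mathematical answer unchanged while removing the ill-conditioning from each least-squares step. A sketch, with hypothetical names:

```python
import numpy as np

def stabilized_mc(A, C, T=5000, kmax=20, washout=100, seed=0):
    """MC estimate with orthonormalised regressors: replacing the raw
    state matrix by the Q factor of its QR decomposition preserves the
    column space (hence every r^2), but makes the fits well-conditioned."""
    rng = np.random.default_rng(seed)
    z = rng.uniform(-1.0, 1.0, T)
    X = np.zeros((T, A.shape[0]))
    x = np.zeros(A.shape[0])
    for t in range(T):
        x = A @ x + C * z[t]
        X[t] = x
    X, z = X[washout:], z[washout:]
    Q, _ = np.linalg.qr(X - X.mean(axis=0))   # orthonormal state basis
    mc = 0.0
    for k in range(1, kmax + 1):
        target = z[:-k] - z[:-k].mean()
        states = Q[k:]
        coef, *_ = np.linalg.lstsq(states, target, rcond=None)
        resid = states @ coef - target
        r2 = 1.0 - (resid ** 2).sum() / (target ** 2).sum()
        mc += max(r2, 0.0)                    # clip tiny negative noise
    return mc

N = 10
A = 0.9 * np.diag(np.ones(N - 1), -1)   # delay-line test reservoir
C = np.zeros(N)
C[0] = 1.0
mc_est = stabilized_mc(A, C)            # expected to be close to N - 1
```

This is only one generic conditioning fix; the paper's own correction additionally exploits the neutrality of MC with respect to the input mask.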
Key Findings
- Memory Capacity Neutrality: A central result of the paper is the demonstration that memory capacity is neutral with respect to the input mask. The variability that changes of input mask induce in empirical results is therefore an artefact, and exploiting this neutrality yields more consistent memory capacity evaluations.
- Ill-conditioning Issues: The paper examines the ill-conditioning of Krylov matrices, which is a major source of the gap between empirical approximations and theoretical values. By accounting for the conditioning of these matrices, the authors provide a path to more accurate memory capacity assessment.
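The scale of the conditioning problem is easy to reproduce. The following sketch (illustrative parameters, not taken from the paper) builds the Krylov matrix [C, AC, ..., A^{N-1}C] for random reservoirs rescaled to spectral radius 0.9 and reports its condition number, which blows up rapidly with N:

```python
import numpy as np

# Condition numbers of Krylov matrices for random reservoirs with
# spectral radius 0.9 (illustrative sizes and seed).
rng = np.random.default_rng(1)
conds = []
for N in (5, 10, 20, 40):
    A = rng.standard_normal((N, N))
    A *= 0.9 / max(abs(np.linalg.eigvals(A)))   # enforce echo state property
    C = rng.standard_normal(N)
    K = np.column_stack([np.linalg.matrix_power(A, k) @ C for k in range(N)])
    conds.append(np.linalg.cond(K))
    print(f"N = {N:2d}: cond(K) = {conds[-1]:.2e}")
```

As the powers A^k C align with the dominant eigendirections, the columns become nearly collinear, so any computation that touches the Krylov matrix in this raw form loses precision long before N reaches typical reservoir sizes.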
Implications and Future Work
The implications of these findings are twofold:
- Theoretical Clarity: The insights provided clarify a long-standing misunderstanding in RNN literature regarding the memory capacity of linear networks.
- Practical Implementation: For practitioners in machine learning and neural network deployment, the methods proposed offer a more robust framework for assessing network capacities, potentially impacting how RNNs are trained and optimized.
The research opens avenues for future work in exploring memory capacities under different network configurations and input conditions, extending beyond the linear systems discussed. Furthermore, it prompts a reevaluation of previous work in reservoir computing and ESNs, potentially leading to advancements in the design of neural networks with optimized memory capacities.
Conclusion
This paper significantly contributes to the computational and theoretical understanding of memory in recurrent neural networks. By addressing the computational artefacts in memory evaluation through robust techniques, it not only rectifies the discrepancies observed across empirical studies but also strengthens the confidence in using theoretical predictions for designing and understanding RNNs. The work represents a methodological advancement with lasting implications for both research and applied domains in neural computations.