FastGRNN: A Fast, Accurate, Stable and Tiny Kilobyte Sized Gated Recurrent Neural Network

Published 8 Jan 2019 in cs.LG, cs.AI, cs.NE, and stat.ML | arXiv:1901.02358v1

Abstract: This paper develops the FastRNN and FastGRNN algorithms to address the twin RNN limitations of inaccurate training and inefficient prediction. Previous approaches have improved accuracy at the expense of prediction costs making them infeasible for resource-constrained and real-time applications. Unitary RNNs have increased accuracy somewhat by restricting the range of the state transition matrix's singular values but have also increased the model size as they require a larger number of hidden units to make up for the loss in expressive power. Gated RNNs have obtained state-of-the-art accuracies by adding extra parameters thereby resulting in even larger models. FastRNN addresses these limitations by adding a residual connection that does not constrain the range of the singular values explicitly and has only two extra scalar parameters. FastGRNN then extends the residual connection to a gate by reusing the RNN matrices to match state-of-the-art gated RNN accuracies but with a 2-4x smaller model. Enforcing FastGRNN's matrices to be low-rank, sparse and quantized resulted in accurate models that could be up to 35x smaller than leading gated and unitary RNNs. This allowed FastGRNN to accurately recognize the "Hey Cortana" wakeword with a 1 KB model and to be deployed on severely resource-constrained IoT microcontrollers too tiny to store other RNN models. FastGRNN's code is available at https://github.com/Microsoft/EdgeML/.

Citations (185)

Summary

  • The paper proposes FastGRNN, a compact gated recurrent network built on FastRNN, whose residual connection stabilizes gradients and improves accuracy.
  • It achieves up to 35x reduction in model size and faster training/inference, making it ideal for IoT and memory-limited environments.
  • Empirical results validate robust performance in tasks like speech recognition and language processing with execution speed improvements of 20-135x.

Analysis of FastGRNN: A Compact and Efficient Gated Recurrent Neural Network

The paper presents the FastGRNN architecture to mitigate the dual challenges of inaccuracy and inefficiency typically associated with Recurrent Neural Networks (RNNs), particularly unitary and gated RNNs, in resource-constrained environments. The FastGRNN architecture is built upon the foundational FastRNN model, which introduces a residual connection to address instability issues such as exploding and vanishing gradients inherent in conventional RNNs. FastGRNN extends this approach to a gated recurrent model that reuses its RNN matrices to achieve accuracy comparable to state-of-the-art gated architectures while significantly reducing model complexity and size.

The FastRNN model integrates a residual connection that introduces only two additional scalar parameters to stabilize learning. This modification addresses the exploding and vanishing gradients problem without constraining the hidden-state transition matrix, as is typical in unitary RNNs. The authors assert that FastRNN achieves superior prediction accuracy compared to unitary RNNs such as SpectralRNN while maintaining faster training times due to its more efficient gradient flow and convergence properties. This is supported by a theoretical analysis showing that the gradient's condition number is independent of sequence length, unlike in traditional RNNs, where it can grow exponentially with sequence length.
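Concretely, the FastRNN update blends a standard RNN candidate state with the previous state via two trainable scalars, h_t = alpha * tanh(W x_t + U h_{t-1} + b) + beta * h_{t-1}. The following is a minimal NumPy sketch, not the paper's implementation; the shapes, initializations, and alpha/beta values are purely illustrative:

```python
import numpy as np

def fastrnn_cell(x_t, h_prev, W, U, b, alpha, beta):
    """One FastRNN step: a plain RNN update blended with a residual
    connection through two trainable scalars alpha and beta."""
    h_tilde = np.tanh(W @ x_t + U @ h_prev + b)  # candidate state
    return alpha * h_tilde + beta * h_prev       # residual blend

# Toy unroll over a short sequence (values illustrative).
rng = np.random.default_rng(0)
d_in, d_hid = 4, 8
W = rng.normal(scale=0.1, size=(d_hid, d_in))
U = rng.normal(scale=0.1, size=(d_hid, d_hid))
b = np.zeros(d_hid)
h = np.zeros(d_hid)
for _ in range(10):
    h = fastrnn_cell(rng.normal(size=d_in), h, W, U, b, alpha=0.1, beta=0.9)
print(h.shape)
```

With beta close to 1 and alpha small, each step is a small perturbation of the previous state, which is what keeps the gradient well-conditioned over long sequences.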

The FastGRNN extension goes further, employing a gate mechanism that reuses the existing RNN matrices, thus enhancing expressive power without a substantial increase in model size. FastGRNN reduces model complexity by enforcing low rank, sparsity, and quantization on its matrices. This yields a reduction in model size of up to 35x compared to existing architectures such as GRUs and LSTMs, without sacrificing prediction accuracy. Such compactness positions FastGRNN as an ideal candidate for deployment on Internet of Things (IoT) devices and other memory-limited hardware.
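The key trick is that the gate z_t and the candidate state share the same W and U: z_t = sigmoid(W x_t + U h_{t-1} + b_z), h_tilde = tanh(W x_t + U h_{t-1} + b_h), and h_t = (zeta * (1 - z_t) + nu) * h_tilde + z_t * h_{t-1}, where zeta and nu are trainable scalars. A hedged NumPy sketch of one cell step (parameter values illustrative, not the EdgeML implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fastgrnn_cell(x_t, h_prev, W, U, b_z, b_h, zeta, nu):
    """One FastGRNN step. The gate z and the candidate h_tilde share
    the same W and U, so gating adds only two bias vectors and two
    scalars over a plain RNN cell."""
    pre = W @ x_t + U @ h_prev            # shared pre-activation, computed once
    z = sigmoid(pre + b_z)                # update gate
    h_tilde = np.tanh(pre + b_h)          # candidate state
    return (zeta * (1.0 - z) + nu) * h_tilde + z * h_prev

# Toy unroll (values illustrative).
rng = np.random.default_rng(0)
d_in, d_hid = 4, 8
W = rng.normal(scale=0.1, size=(d_hid, d_in))
U = rng.normal(scale=0.1, size=(d_hid, d_hid))
h = np.zeros(d_hid)
for _ in range(10):
    h = fastgrnn_cell(rng.normal(size=d_in), h, W, U,
                      b_z=np.ones(d_hid), b_h=np.zeros(d_hid),
                      zeta=1.0, nu=0.0)
```

Because the pre-activation `W @ x_t + U @ h_prev` is computed once and reused, a gated update costs roughly the same multiply-accumulates as an ungated one, which is why FastGRNN can match gated accuracies at a fraction of the size.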

Empirically, FastGRNN shows robust performance across several tasks, including speech command recognition and language processing, on datasets such as Google-12 and PTB. It maintains comparable or superior predictive power to advanced RNN frameworks while significantly reducing computational resource requirements. For instance, when deployed on devices like the Arduino MKR1000, FastGRNN executes roughly 20-135x faster than unitary RNNs, confirming its suitability for real-time applications.
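The size reductions behind these deployments come from the three matrix constraints described above: low rank, sparsity, and quantization. The paper learns these constraints jointly during training; the post-hoc SVD-plus-pruning-plus-quantization routine below is only an illustrative approximation of the idea, and every function name and threshold here is an assumption:

```python
import numpy as np

def compress(W, rank, keep_frac, levels=256):
    """Illustrative compression of a trained weight matrix W:
    1) low-rank factorization W ~= W1 @ W2, 2) magnitude pruning,
    3) uniform quantization to `levels` buckets (byte-sized if 256)."""
    # 1. Low rank via truncated SVD: W1 is (m x r), W2 is (r x n).
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    W1 = U[:, :rank] * s[:rank]
    W2 = Vt[:rank, :].copy()
    # 2. Sparsify: zero out all but the largest-magnitude entries.
    for M in (W1, W2):
        thresh = np.quantile(np.abs(M), 1.0 - keep_frac)
        M[np.abs(M) < thresh] = 0.0
    # 3. Quantize: snap remaining weights onto a uniform grid.
    def quantize(M):
        lo, hi = M.min(), M.max()
        step = (hi - lo) / (levels - 1)
        if step == 0.0:
            return M
        return np.round((M - lo) / step) * step + lo
    return quantize(W1), quantize(W2)

W = np.random.default_rng(1).normal(size=(16, 16))
W1, W2 = compress(W, rank=4, keep_frac=0.5)
```

A rank-r factorization alone shrinks an m x n matrix from m*n parameters to r*(m + n); sparsity and 8-bit quantization then multiply the savings, which is how the 1 KB wakeword model becomes possible.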

The paper provides detailed results and comparisons, underscoring FastGRNN's effectiveness in balancing the trade-off between accuracy and resource efficiency. Through this exploration, it contributes a foundational framework for the development of more efficient RNN architectures that do not compromise on performance while meeting the stringent constraints of modern ubiquitous computing environments.

In conclusion, the FastGRNN approach presents compelling improvements in the compactness and efficiency of RNN architectures, with noteworthy implications for AI applications constrained by computational resources. Future work could focus on extending these concepts further, perhaps by integrating more sophisticated gating mechanisms or exploring alternative low-rank approximations, to continue pushing the boundaries of efficiency in neural network deployments in constrained environments.
