A priori determination of Newton iterations for parallelizing non-linear RNNs
For the Newton's-method approach that parallelizes non-linear recurrent neural networks by casting the recurrence over a length-L sequence as a system of L non-linear equations (as in Danieli et al., 2025), determine — before running the algorithm — how many Newton iterations a specified non-linear RNN architecture needs to converge, so that the compute cost can be assessed a priori.
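To make the setup concrete, the sketch below (a minimal illustration, not the authors' implementation; the toy dimensions, the tanh cell, and all variable names are assumptions) casts a simple tanh RNN over a length-L sequence as the stacked system F_t(H) = H_t - f(H_{t-1}, x_t) = 0 and runs Newton's method on it, counting the iterations actually taken. Because the Jacobian is block-bidiagonal (identity on the diagonal, -D_t W on the subdiagonal, with D_t the diagonal of tanh' at step t), each Newton step reduces to a linear recurrence in the update delta; it is solved sequentially here for clarity, whereas the parallel formulation would solve it with a parallel scan.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical): sequence length L, hidden size d.
L, d = 16, 8
W = rng.normal(scale=0.3, size=(d, d))   # recurrent weights
U = rng.normal(scale=0.3, size=(d, d))   # input weights
x = rng.normal(size=(L, d))              # input sequence
h0 = np.zeros(d)                         # initial hidden state

def residual(H):
    """Stacked residual F_t(H) = H_t - tanh(W H_{t-1} + U x_t)."""
    prev = np.vstack([h0, H[:-1]])
    return H - np.tanh(prev @ W.T + x @ U.T)

def newton_solve(H, tol=1e-10, max_iters=100):
    """Newton's method on the stacked L*d system; returns the solution
    and the number of iterations used (the quantity the problem asks
    to predict a priori)."""
    for k in range(max_iters):
        F = residual(H)
        if np.max(np.abs(F)) < tol:
            return H, k                       # converged after k iterations
        prev = np.vstack([h0, H[:-1]])
        D = 1.0 - np.tanh(prev @ W.T + x @ U.T) ** 2   # tanh' at each step
        # Solve J @ delta = -F by forward substitution over the
        # block-bidiagonal Jacobian: delta_t = -F_t + D_t * (W delta_{t-1}).
        delta = np.zeros_like(H)
        delta[0] = -F[0]
        for t in range(1, L):
            delta[t] = -F[t] + D[t] * (W @ delta[t - 1])
        H = H + delta
    return H, max_iters

H_star, iters = newton_solve(np.zeros((L, d)))

# Sanity check against the ordinary sequential recurrence.
H_seq = np.zeros((L, d))
h = h0
for t in range(L):
    h = np.tanh(W @ h + U @ x[t])
    H_seq[t] = h
print(f"Newton iterations: {iters}, max error vs sequential: "
      f"{np.max(np.abs(H_star - H_seq)):.2e}")
```

The triangular (causal) structure of the system guarantees exact convergence in at most L Newton iterations, but in practice far fewer often suffice; characterizing that count from the architecture alone, without running the solver, is precisely the open question stated above.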
References
Moreover, each forward pass requires multiple Newton iterations, and the number of iterations needed for convergence is not known a priori for a given architecture.
— M$^2$RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling
(2603.14360 - Mishra et al., 15 Mar 2026) in Limitations of Non-Linear RNNs — Training Inefficiency: Non-Parallelizability on Sequence Length