Taylor-Lagrange Neural ODE Solver
- TL-NODE is a neural ODE solver that leverages fixed-order Taylor expansion and a Lagrange remainder estimator for efficient integration and training.
- It minimizes computational overhead by reducing the number of adaptive function evaluations while preserving accuracy in both supervised and generative tasks.
- Empirical results show significant speedups in training and evaluation, making TL-NODE suitable for real-time and large-scale applications.
Taylor-Lagrange Neural Ordinary Differential Equations (TL-NODEs) are a class of neural ODE solvers that combine fixed-order Taylor expansions with a data-driven Lagrange remainder estimator to accelerate the integration and training of neural ODEs. TL-NODE addresses the computational bottlenecks of standard NODE training and evaluation, especially the high cost imposed by adaptive-step solvers and repeated neural network evaluations, while maintaining or improving accuracy across supervised and generative modeling tasks (Djeumou et al., 2022).
1. Motivation and Foundational Concepts
A standard neural ordinary differential equation (NODE) parametrizes continuous-time dynamics by a neural network:

$$\dot{x}(t) = f_\theta(x(t), t), \qquad x(t_0) = x_0,$$

where $x(t) \in \mathbb{R}^n$ and $f_\theta$ is a neural network with parameters $\theta$. Solving for $x(T)$ given $x_0$ typically requires numerically integrating $f_\theta$, often with adaptive schemes (e.g., Dormand–Prince “Dopri5”). These schemes provide accuracy but do so at the cost of numerous evaluations of $f_\theta$ per integration interval, leading to high compute and memory cost, especially with gradient-based training methods that require both forward and backward passes through the neural dynamics. This bottleneck becomes acute for large-scale learning or deployment settings where fast inference is critical.
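To make this cost concrete, a toy sketch (weights, sizes, and step counts here are illustrative, not from the paper) that counts function evaluations when a small tanh network is integrated with fixed-step RK4:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "neural network" dynamics f_theta(x): one hidden tanh layer.
W1 = rng.normal(scale=0.5, size=(8, 2))
W2 = rng.normal(scale=0.5, size=(2, 8))

n_evals = 0  # number of function evaluations (NFE)

def f_theta(x):
    global n_evals
    n_evals += 1
    return W2 @ np.tanh(W1 @ x)

def rk4_step(x, dt):
    """One classical RK4 step: 4 evaluations of f_theta."""
    k1 = f_theta(x)
    k2 = f_theta(x + 0.5 * dt * k1)
    k3 = f_theta(x + 0.5 * dt * k2)
    k4 = f_theta(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

x = np.array([1.0, -1.0])
for _ in range(50):          # 50 steps over one horizon
    x = rk4_step(x, 0.02)

print(n_evals)  # 200 evaluations for a single forward solve
```

Adaptive solvers such as Dopri5 evaluate $f_\theta$ even more often when tolerances are tight, and training multiplies this cost across every batch and epoch.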
TL-NODE mitigates this by replacing the adaptive solver with a fixed-order Taylor expansion plus an estimated Lagrange remainder, allowing for a constant and low number of network evaluations per step. The Taylor expansion advances the solution deterministically, while a small auxiliary neural network estimates and corrects for the truncation error, preserving the desired accuracy in only a few function and derivative evaluations per step.
2. Mathematical Formulation
The central update of TL-NODE approximates the flow of the ODE as:

$$x(t + \Delta t) = x(t) + \sum_{\ell=1}^{p-1} \frac{\Delta t^\ell}{\ell!} f^{[\ell]}(x(t)) + R_p(t, \Delta t),$$

where $f^{[\ell]}$ denotes the $\ell$-th total time derivative of the solution, and $R_p$ is the (unknown) Taylor-Lagrange remainder encapsulating the local truncation error (of order $\Delta t^p$).

To estimate the remainder, TL-NODE introduces a second neural network $g(\cdot;\phi)$, which predicts the appropriate “midpoint” state $\hat{x}_{\mathrm{mid}}$, lying between $x(t)$ and $x(t + \Delta t)$, where the $p$-th derivative should be evaluated for optimal local error correction. The corrected update then reads:

$$x(t + \Delta t) \approx x(t) + \sum_{\ell=1}^{p-1} \frac{\Delta t^\ell}{\ell!} f^{[\ell]}(x(t)) + \frac{\Delta t^p}{p!} f^{[p]}(\hat{x}_{\mathrm{mid}}),$$

where $\hat{x}_{\mathrm{mid}} = x(t) + g(x(t), \Delta t; \phi) \odot f^{[1]}(x(t))$ and each $f^{[\ell]}$ is computed via Taylor-mode automatic differentiation.
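As a worked scalar example (illustrative, not from the paper): for $\dot{x} = -x$ every time derivative is available in closed form, $f^{[\ell]}(x) = (-1)^\ell x$, so the effect of the remainder term can be checked directly against the exact flow $x(t+\Delta t) = e^{-\Delta t} x(t)$:

```python
import math

def taylor_step(x, dt, p):
    """Truncated Taylor update through order p-1 (no remainder term)."""
    return sum(dt**l / math.factorial(l) * (-1)**l * x for l in range(p))

def tl_step(x, dt, p, x_mid):
    """Taylor-Lagrange update: truncated sum plus the p-th order
    remainder with the derivative evaluated at a 'midpoint' state."""
    return taylor_step(x, dt, p) + dt**p / math.factorial(p) * (-1)**p * x_mid

x0, dt, p = 1.0, 0.5, 3
exact = math.exp(-dt) * x0

err_truncated = abs(taylor_step(x0, dt, p) - exact)
# The mean-value theorem guarantees some state between x(t) and
# x(t + dt) makes the remainder exact; here we probe an interior state.
err_corrected = abs(tl_step(x0, dt, p, 0.9 * x0) - exact)
print(err_truncated, err_corrected)
```

Even a rough interior state sharply reduces the local error; the learned network $g(\cdot;\phi)$ exists to pick that state well across the state space.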
The joint training objective alternates between fitting the main NODE parameters $\theta$ (using a standard supervised or likelihood loss, with a penalty on large higher-order derivatives for regularization) and fitting the remainder network parameters $\phi$ (by minimizing the squared error of the corrected update against high-accuracy solutions produced by a standard ODE solver).
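The remainder-fitting step can be sketched in the same closed-form scalar setting $\dot{x} = -x$; a single scalar parameter `tau` stands in for the network $g(\cdot;\phi)$, and a grid search stands in for gradient descent (all of this is an illustrative simplification, not the paper's training loop):

```python
import math

def tl_step(x, dt, p, tau):
    # Truncated Taylor sum plus the tau-parametrized remainder term.
    trunc = sum(dt**l / math.factorial(l) * (-1)**l * x for l in range(p))
    x_mid = x + tau * dt * (-x)                # midpoint prediction
    return trunc + dt**p / math.factorial(p) * (-1)**p * x_mid

def remainder_loss(tau, dt=0.5, p=3, xs=(0.5, 1.0, 2.0)):
    # Squared error against the exact ("high-accuracy solver") flow.
    return sum((tl_step(x, dt, p, tau) - math.exp(-dt) * x)**2 for x in xs)

# Gradient-free fit over a grid (stand-in for SGD on phi).
taus = [i / 100 for i in range(101)]
best_tau = min(taus, key=remainder_loss)
print(best_tau, remainder_loss(best_tau))
```

The fitted `tau` places the $p$-th derivative evaluation at the state that reproduces the exact Lagrange remainder for this dynamics.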
3. Training Procedure and Integration Algorithm
The TL-NODE integration pipeline partitions the interval $[t_0, T]$ into $H$ steps of size $\Delta t = (T - t_0)/H$. At each step, the procedure is:
- Compute Taylor coefficients $f^{[1]}, \dots, f^{[p]}$ at the current state $x$ via Taylor-mode automatic differentiation.
- Use $g(\cdot;\phi)$ to predict an in-state midpoint $\hat{x}_{\mathrm{mid}}$.
- Update with the truncated Taylor sum (up to order $p-1$) plus the $p$-th order term (using $f^{[p]}(\hat{x}_{\mathrm{mid}})$).
All operations are differentiable; standard backpropagation suffices, and there is no need for the adjoint-state method typical in standard neural ODEs. Memory usage is limited to storing relevant states and model parameters.
TL-NODE Forward Pass Pseudocode
```
function TL-NODE-Solve(x₀, t₀, T; θ, φ, p, H)
    Δt ← (T − t₀)/H
    x ← x₀
    for i = 0 to H−1 do
        {f^[1], …, f^[p]} ← TaylorModeAD(f_θ, x)
        x̂_mid ← x + g(x, Δt; φ) ⊙ f^[1]
        x ← x + Σ_{ℓ=1}^{p−1} (Δt^ℓ/ℓ!) · f^[ℓ] + (Δt^p/p!) · f^[p](x̂_mid)
    return x
```
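A runnable sketch of this loop for linear dynamics $\dot{x} = Ax$, where the Taylor coefficients have the closed form $f^{[\ell]}(x) = A^\ell x$; the midpoint predictor is replaced by a fixed, untrained stand-in, and the system matrix is illustrative:

```python
import math
import numpy as np

A = np.array([[0.0, 1.0], [-1.0, -0.1]])  # lightly damped oscillator

def taylor_coeffs(x, p):
    # f^[1..p] = A^l x for linear dynamics; this stands in for the
    # TaylorModeAD call in the pseudocode above.
    coeffs, v = [], x
    for _ in range(p):
        v = A @ v
        coeffs.append(v)
    return coeffs

def tl_node_solve(x0, t0, T, p=4, H=10, g_scale=0.5):
    dt = (T - t0) / H
    x = x0
    for _ in range(H):
        f = taylor_coeffs(x, p)
        # Stand-in for the learned midpoint predictor g(x, dt; phi):
        x_mid = x + g_scale * dt * f[0]
        x_next = x + sum(dt**l / math.factorial(l) * f[l - 1]
                         for l in range(1, p))
        x_next = x_next + dt**p / math.factorial(p) * taylor_coeffs(x_mid, p)[-1]
        x = x_next
    return x

def expm_series(M, terms=30):
    # Reference matrix exponential via its power series.
    S, term = np.eye(M.shape[0]), np.eye(M.shape[0])
    for k in range(1, terms + 1):
        term = term @ M / k
        S = S + term
    return S

x0 = np.array([1.0, 0.0])
approx = tl_node_solve(x0, 0.0, 1.0)
exact = expm_series(A) @ x0
print(np.linalg.norm(approx - exact))
```

Each step uses exactly one Taylor-coefficient pass at $x$ plus one $p$-th-derivative evaluation at the midpoint, matching the constant per-step cost described above.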
4. Computational Complexity
In contrast to conventional ODE solvers, TL-NODE uses a fixed and small number of network evaluations: the Taylor coefficients up to order $p$ are obtained in a single Taylor-mode AD pass per step, versus an input-dependent and often large number of function evaluations for adaptive integrators. The cost of that pass grows roughly quadratically in $p$ (or better with optimized Taylor-mode implementations), which is negligible for the small $p$ and practical network sizes used in practice. Memory overhead is similarly reduced; there is no need to store augmented continuous states or solver internals.
In summary, per time step:
| Method | Time Complexity | Memory Overhead |
|---|---|---|
| Standard (Dopri5) | $O(\mathrm{NFE} \cdot C_f)$ | $O(\mathrm{NFE})$ states (reverse-mode) |
| TL-NODE | $O(C_{\mathrm{AD}})$ per step | $O(1)$ beyond parameters |

Here $C_f$ is the cost of one $f_\theta$-evaluation, $C_{\mathrm{AD}}$ is the cost of a single order-$p$ Taylor-mode automatic differentiation pass, and $\mathrm{NFE}$ is the solver's number of function evaluations.
5. Empirical Results
TL-NODE was benchmarked against standard NODE solvers (Dopri5, RK4), fixed-order Taylor methods without the Lagrange correction, and hypersolver schemes on a range of tasks:
Stiff ODE Integration
- System: a stiff linear ODE with widely separated eigenvalues, integrated with one TL-NODE step per interval.
- TL-NODE: accuracy on par with the adaptive reference at a small, fixed evaluation cost.
- Dopri5: comparable error, but at $0.004$ s evaluation time, markedly slower than TL-NODE.
- RK4 and other fixed-step explicit methods: large errors (failed for long horizons).
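The failure mode of explicit fixed-step methods on stiff systems can be reproduced with a small self-contained experiment (the diagonal system below is illustrative, not the paper's benchmark):

```python
import numpy as np

# Stiff diagonal system: eigenvalues -1 and -1000 (stiffness ratio 1000).
lam = np.array([-1.0, -1000.0])

def f(x):
    return lam * x

def rk4_step(x, dt):
    # Classical explicit RK4; stable only when dt * lambda stays small.
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

dt, steps = 0.01, 50              # dt * (-1000) = -10: outside stability
x_rk4 = np.array([1.0, 1.0])
for _ in range(steps):
    x_rk4 = rk4_step(x_rk4, dt)

x_exact = np.exp(lam * dt * steps)  # exact flow of the diagonal system
print(x_rk4[1], x_exact[1])         # fast mode: RK4 explodes, truth decays
```

The slow mode stays accurate while the fast mode diverges; an adaptive solver avoids this only by shrinking its step, which is exactly where the function-evaluation count explodes.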
Learning Stiff Dynamics
- System: a 2D stiff linear ODE.
- TL-NODE: low MSE with a training time of $31.9$ s.
- Vanilla NODE (Dopri5): matched MSE, but a $609.8$ s training time (roughly $19\times$ slower).
- RK4 NODE and T-NODE (Taylor expansion without the Lagrange correction): similar or slightly faster training than TL-NODE, but with higher error.
Image Classification (MNIST)
- Model: 2-layer NODE (100→728 hidden units).
- Results:
| Method | Train Acc | Test Acc | Train Time | Eval Time | NFE |
|---|---|---|---|---|---|
| Vanilla NODE | 99.33% | 97.87% | 42.7 min | 16 ms | 110.6 |
| TL-NODE | 99.96% | 98.23% | 2.55 min | 1.04 ms | 62 |
TL-NODE achieves faster training, faster evaluation, and greater accuracy with lower function-evaluation counts.
Density Estimation (MiniBooNE)
- 43-dim continuous normalizing flow task.
- TL-NODE achieved the best loss of $9.62$ nats ($12.3$ min training, NFE $= 168$), versus $9.74$ nats for the vanilla NODE ($59.7$ min, NFE $= 184$).
6. Limitations and Prospective Developments
Current TL-NODE instantiations operate at a fixed Taylor expansion order $p$ and a fixed integration step size $\Delta t$. For highly stiff or long-horizon systems, a higher $p$ or smaller $\Delta t$ may be necessary to maintain accuracy, reducing the computational gains.
Envisaged future work includes:
- Adaptive order/step selection: dynamic adjustment of $p$ and/or $\Delta t$ based on local error criteria.
- Stiffness-aware extensions: Coupling with implicit Taylor methods or symplectic constraints for better performance on energy-conserving or stiff systems.
- Parallel higher-order AD: Optimizing Taylor-mode AD for larger expansion orders using efficient hardware (e.g., GPUs).
7. Significance
TL-NODE replaces expensive, black-box adaptive solvers with a hybrid of fixed-order Taylor expansion and data-driven Lagrange remainder, achieving up to one order-of-magnitude speedup for training and inference without loss of accuracy. This enables deployment of neural ODEs in real-time and large-scale scenarios across diverse domains, including modeling physical systems, supervised learning, and generative modeling (Djeumou et al., 2022).