
Taylor-Lagrange Neural ODE Solver

Updated 7 January 2026
  • TL-NODE is a neural ODE solver that leverages fixed-order Taylor expansion and a Lagrange remainder estimator for efficient integration and training.
  • It minimizes computational overhead by reducing the number of adaptive function evaluations while preserving accuracy in both supervised and generative tasks.
  • Empirical results show significant speedups in training and evaluation, making TL-NODE suitable for real-time and large-scale applications.

Taylor-Lagrange Neural Ordinary Differential Equations (TL-NODEs) are a class of neural ODE solvers that combine fixed-order Taylor expansions with a data-driven Lagrange remainder estimator to accelerate the integration and training of neural ODEs. TL-NODE addresses the computational bottlenecks of standard NODE training and evaluation, especially the high cost imposed by adaptive-step solvers and repeated neural network evaluations, while maintaining or improving accuracy across supervised and generative modeling tasks (Djeumou et al., 2022).

1. Motivation and Foundational Concepts

A standard neural ordinary differential equation (NODE) parametrizes continuous-time dynamics by a neural network:

$$\frac{dx}{dt} = f(x(t); \theta),$$

where $x(t) \in \mathbb{R}^n$ and $f(\cdot;\theta)$ is a neural network with parameters $\theta$. Solving for $x(T)$ given $x(t_0)$ typically requires numerically integrating $f$, often with adaptive schemes (e.g., Dormand–Prince "Dopri5"). These schemes provide accuracy, but at the cost of numerous evaluations of $f$ per integration interval, leading to high compute and memory cost, especially with gradient-based training methods that require both forward and backward passes through the neural dynamics. This bottleneck becomes acute in large-scale learning or deployment settings where fast inference is critical.
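To make the evaluation-count bottleneck concrete, the following sketch counts how many times an adaptive solver calls the vector field on a stiff linear system standing in for a trained network (the system matrix and tolerances are illustrative choices, not taken from the paper):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Stand-in for a learned vector field f(x; theta): a fixed stiff linear system.
A = np.array([[-1.0, 0.0],
              [0.0, -1000.0]])

def f(t, x):
    return A @ x

x0 = np.array([1.0, 1.0])
sol = solve_ivp(f, (0.0, 1.0), x0, method="RK45", rtol=1e-8, atol=1e-10)

# sol.nfev reports the number of vector-field evaluations; for a stiff
# system with tight tolerances it is large, and in NODE training every
# one of these is a full neural-network forward pass.
print("function evaluations:", sol.nfev)
```

In a NODE, each of those evaluations is a full network forward pass, and reverse-mode training multiplies the cost further.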

TL-NODE mitigates this by replacing the adaptive solver with a fixed-order Taylor expansion plus an estimated Lagrange remainder, allowing for a constant and low number of network evaluations per step. The Taylor expansion advances the solution deterministically, while a small auxiliary neural network estimates and corrects for the truncation error, preserving the desired accuracy in only a few function and derivative evaluations per step.

2. Mathematical Formulation

The central update of TL-NODE approximates the flow of the ODE as:

$$x_{i+1} \approx x_i + \sum_{k=1}^{p} \frac{\Delta t^k}{k!} f^{(k-1)}(x_i) + R_{p+1}(\Delta t),$$

where $f^{(k)}(x) = \frac{d^{k+1}}{dt^{k+1}} x(t)$ is the $k$-th total time derivative of $f$ along the trajectory through $x$ (so $f^{(0)} = f$), and $R_{p+1}$ is the (unknown) Taylor–Lagrange remainder encapsulating the local truncation error, of order $O(\Delta t^{p+1})$.
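The $O(\Delta t^{p+1})$ local error can be checked numerically on the scalar ODE $\dot{x} = x$, where every time derivative of $x$ equals $x$ itself and the order-$p$ Taylor step has a closed form (a toy check, with exact derivatives standing in for Taylor-mode AD):

```python
import math

# Scalar ODE dx/dt = x: the order-p Taylor step is x * sum(dt^k / k!).
def taylor_step(x, dt, p):
    return x * sum(dt**k / math.factorial(k) for k in range(p + 1))

p, x0 = 2, 1.0
err = lambda dt: abs(math.exp(dt) - taylor_step(x0, dt, p))

# If the local error is O(dt^{p+1}), halving dt should shrink the one-step
# error by roughly 2**(p+1) = 8 for p = 2.
ratio = err(0.1) / err(0.05)
print(ratio)  # close to 8
```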

To estimate the remainder, TL-NODE introduces a second neural network $g(x, h; \phi)$, which predicts the appropriate "midpoint" $x(\xi)$ (for some $\xi \in [t, t+h]$) at which the $p$-th derivative should be evaluated for optimal local error correction. The corrected update then reads:

$$x_{i+1} = x_i + \sum_{\ell=1}^{p-1} \frac{\Delta t^\ell}{\ell!} f^{[\ell]}(x_i) + \frac{\Delta t^p}{p!} f^{[p]}(\hat{x}_\text{mid}),$$

where $\hat{x}_\text{mid} = x_i + g(x_i, \Delta t; \phi) \odot f(x_i; \theta)$ and $f^{[\ell]}$ denotes the $\ell$-th total time derivative of the state (so $f^{[1]} = f$), computed via Taylor-mode automatic differentiation.
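The existence of such a corrective midpoint follows from the Lagrange form of the remainder. For $\dot{x} = x$ with $p = 1$ it can be computed in closed form, which illustrates what the network $g$ is trained to predict (a hand-derived toy example, not the paper's learned estimator):

```python
import math

# dx/dt = x has exact flow x(t+dt) = x * e^dt. With p = 1 the
# Taylor-Lagrange form says x(t+dt) = x + dt * f(x_mid) for some
# intermediate state x_mid between x and x(t+dt).
x, dt = 1.0, 0.5
f = lambda x: x

# Solve x + dt * x_mid = x * e^dt for the exact midpoint state; g is
# trained to predict the offset (x_mid - x) as a multiple of f(x).
x_mid = x * (math.exp(dt) - 1.0) / dt
g_value = (x_mid - x) / f(x)           # what g(x, dt) should output here

x_next = x + dt * f(x_mid)             # one-evaluation update, now exact
print(abs(x_next - x * math.exp(dt)))  # ~0
```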

The joint training objective alternates between fitting the main NODE parameters $\theta$ (using a standard supervised or likelihood loss, with a penalty on large higher-order derivatives for regularization) and fitting the remainder-network parameters $\phi$ (by minimizing squared error against high-accuracy solutions produced by a standard ODE solver).

3. Training Procedure and Integration Algorithm

The TL-NODE integration pipeline partitions the interval $[t_0, T]$ into $H$ steps of size $\Delta t = (T - t_0)/H$. At each step, the procedure is:

  1. Compute Taylor coefficients $f^{[1]}, \dotsc, f^{[p]}$ at $x_i$ via Taylor-mode automatic differentiation.
  2. Use $g(x_i, \Delta t; \phi)$ to predict an in-state midpoint $\hat{x}_\text{mid}$.
  3. Update $x_{i+1}$ with the truncated Taylor sum (up to order $p-1$) plus the $p$-th order term evaluated at $\hat{x}_\text{mid}$.

All operations are differentiable; standard backpropagation suffices, and there is no need for the adjoint-state method typical in standard neural ODEs. Memory usage is limited to storing relevant states and model parameters.

TL-NODE Forward Pass Pseudocode

function TL-NODE-Solve(x₀, t₀, T; θ, φ, p, H)
    Δt ← (T−t₀)/H
    x ← x₀
    for i = 0 to H−1 do
        {f^{[1]},…,f^{[p−1]}} ← TaylorModeAD(f_θ, x)      ▷ Taylor coefficients at x
        x̂_mid ← x + g(x, Δt; φ) ⊙ f^{[1]}                 ▷ f^{[1]} = f_θ(x)
        f̂^{[p]} ← TaylorModeAD(f_θ, x̂_mid)[p]            ▷ p-th coefficient at the midpoint
        x ← x + Σ_{ℓ=1}^{p−1} Δt^ℓ/ℓ! · f^{[ℓ]} + Δt^p/p! · f̂^{[p]}
    return x
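As a concrete, simplified instance of this loop, the sketch below runs the fixed-step, fixed-order forward pass on a linear test system $\dot{x} = Ax$, where the Taylor coefficients $f^{[\ell]} = A^\ell x$ are available analytically. The learned remainder network $g$ is omitted, so this is the plain order-$p$ truncated Taylor update rather than the full TL-NODE correction:

```python
import numpy as np
from scipy.linalg import expm

# Fixed-step, order-p Taylor integration of dx/dt = A x. For this linear
# system the Taylor coefficients are analytic: f^{[l]}(x) = A^l x.
# (No learned remainder correction here -- a sketch, not the paper's method.)
def taylor_solve(A, x0, t0, T, p=3, H=100):
    dt = (T - t0) / H
    x = x0.copy()
    for _ in range(H):
        term, fact = x.copy(), 1.0
        update = np.zeros_like(x)
        for l in range(1, p + 1):
            term = A @ term            # f^{[l]} = A^l x
            fact *= l
            update += dt**l / fact * term
        x = x + update
    return x

A = np.array([[-1.0, 0.0], [0.0, -10.0]])
x0 = np.array([1.0, 1.0])
x_T = taylor_solve(A, x0, 0.0, 1.0)
x_ref = expm(A) @ x0                   # exact flow at T = 1 for reference
print(np.max(np.abs(x_T - x_ref)))     # small truncation error
```

In the full method, `TaylorModeAD` replaces the analytic $A^\ell x$ coefficients, and the midpoint correction restores accuracy at lower order $p$.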

4. Computational Complexity

Contrasting with conventional ODE solvers, TL-NODE maintains a fixed and small number of function evaluations ($O(1)$ per step, versus $N_\text{adapt} \gg 1$ for adaptive integrators). The Taylor-mode AD pass per step costs $O(p^2)$ (or $O(p \log p)$ with fast composition) operations, negligible for small $p$ and practical network sizes. Memory overhead is similarly reduced; there is no need to store augmented continuous states or solver internals.

In summary, per time step:

| Method | Time Complexity | Memory Overhead |
|---|---|---|
| Standard (Dopri5) | $O(N_\text{adapt} \cdot C_f)$ | $O(N_\text{adapt})$ (reverse-mode) |
| TL-NODE | $O(C_f + p^2 C_\text{AD})$ | $O(1)$ beyond parameters |

Here $C_f$ is the cost of one $f$-evaluation and $C_\text{AD}$ is the per-coefficient cost of automatic differentiation.

5. Empirical Results

TL-NODE was benchmarked against standard NODE solvers (Dopri5, RK4), fixed-order Taylor methods without the Lagrange correction, and hypersolver schemes on a range of tasks:

Stiff ODE Integration

  • System: $\dot{x} = Ax$ with eigenvalues $-1, -1000$; one step per interval.
  • TL-NODE ($p=1$): error $\sim 10^{-5}$ to $10^{-4}$; evaluation time $3.8 \times 10^{-5}$ s.
  • Dopri5: error $\sim 10^{-13}$ to $10^{-12}$, but $0.004$ s evaluation (roughly $100\times$ slower).
  • RK4/others: error $\approx 1.0$ (failed for long horizons).
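The RK4 failure reported above is a standard stiffness effect: once $\lambda \Delta t$ leaves the method's stability region, the per-step amplification factor exceeds 1 in magnitude and the iterates explode. A minimal check on a scalar stiff mode (the $-1000$ eigenvalue, with an illustrative step size):

```python
import math

# Scalar stiff mode dx/dt = lam * x with lam = -1000. Classical RK4 applied
# to this problem multiplies the state by the stability polynomial
# R(z) = 1 + z + z^2/2 + z^3/6 + z^4/24 with z = lam*h each step; stability
# requires |R(z)| <= 1, which fails badly for h = 0.01 (z = -10).
lam, h, steps = -1000.0, 0.01, 20
z = lam * h
R = 1 + z + z**2/2 + z**3/6 + z**4/24  # RK4 amplification factor

x = 1.0
for _ in range(steps):
    x *= R                             # each RK4 step scales x by R

print(abs(R), abs(x))                  # |R| >> 1, iterates blow up
print(math.exp(lam * h * steps))       # exact solution: essentially 0
```

Adaptive solvers avoid the blow-up by shrinking the step until $|R(z)| \le 1$, which is exactly what drives their evaluation counts up on stiff systems.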

Learning Stiff Dynamics

  • 2D $A$-matrix ODE, $T - t_0 = 0.01$ s.
  • TL-NODE: MSE $\sim 6 \times 10^{-6}$; training time $31.9$ s.
  • Vanilla NODE (Dopri5): matched MSE, but $609.8$ s training time.
  • RK4 NODE and T-NODE (no correction): similar or slightly faster than TL-NODE, but with higher error.

Image Classification (MNIST)

  • Model: 2-layer NODE (100→728 hidden units).
  • Results:
| Method | Train Acc | Test Acc | Train Time | Eval Time | NFE |
|---|---|---|---|---|---|
| Vanilla NODE | 99.33% | 97.87% | 42.7 min | 16 ms | 110.6 |
| TL-NODE | 99.96% | 98.23% | 2.55 min | 1.04 ms | 62 |

TL-NODE achieves roughly $17\times$ faster training and $15\times$ faster evaluation, with greater accuracy and lower function-evaluation counts.

Density Estimation (MiniBooNE)

  • 43-dim continuous normalizing flow task.
  • TL-NODE: best loss $9.62$ nats ($12.3$ min, NFE $= 168$); vanilla NODE: $9.74$ nats ($59.7$ min, NFE $= 184$).

6. Limitations and Prospective Developments

Current TL-NODE instantiations operate at a fixed Taylor expansion order $p$ and fixed integration step $\Delta t$. For highly stiff or long-horizon systems, higher $p$ or smaller $\Delta t$ may be necessary to maintain accuracy, reducing the computational gains.

Envisaged future work includes:

  • Adaptive order/step selection: dynamic adjustment of $p$ and/or $\Delta t$ based on local error criteria.
  • Stiffness-aware extensions: Coupling with implicit Taylor methods or symplectic constraints for better performance on energy-conserving or stiff systems.
  • Parallel higher-order AD: Optimizing Taylor-mode AD for larger expansion orders using efficient hardware (e.g., GPUs).

7. Significance

TL-NODE replaces expensive, black-box adaptive solvers with a hybrid of fixed-order Taylor expansion and data-driven Lagrange remainder, achieving up to one order-of-magnitude speedup for training and inference without loss of accuracy. This enables deployment of neural ODEs in real-time and large-scale scenarios across diverse domains, including modeling physical systems, supervised learning, and generative modeling (Djeumou et al., 2022).

References

  • Djeumou, F., Neary, C., Goubault, E., Putot, S., and Topcu, U. (2022). "Taylor-Lagrange Neural Ordinary Differential Equations: Toward Fast Training and Evaluation of Neural ODEs."
