
xLSTM-PINN: Spectral Enhancement for PDE Solvers

Updated 20 November 2025
  • The paper introduces xLSTM-PINN, a spectral remodeling extension to PINNs that leverages memory-gated, multiscale xLSTM blocks to elevate high-frequency learning.
  • It employs a staged frequency curriculum and adaptive residual reweighting to address spectral bias and improve convergence and extrapolation in solving PDEs.
  • Empirical benchmarks on various PDE problems show significant accuracy gains, enhanced generalization, and superior performance compared to conventional PINNs.

xLSTM-PINN is a spectral remodeling extension of physics-informed neural networks (PINNs) designed to mitigate spectral bias, residual-data imbalance, and poor extrapolation in neural PDE solvers. By introducing memory-gated, multiscale feature extraction via xLSTM blocks, coupled with a staged frequency curriculum and adaptive residual reweighting, xLSTM-PINN systematically elevates the neural tangent kernel (NTK) spectrum for high-frequency learning. The method achieves both theoretically justified and empirically significant improvement in accuracy, convergence, and extrapolation on benchmark PDEs, without modifications to the standard physics loss or automatic differentiation routines (Tao et al., 16 Nov 2025).

1. Architecture: xLSTM Blocks and Gated Memory

xLSTM-PINN replaces the generic multilayer perceptron core of conventional PINNs with a stack of xLSTM blocks. Each block is composed of an internal multiscale, memory-gated recursion (“micro-time” steps) and a light, nonlinear feed-forward mixer.

During each of the $S$ internal micro-steps within a block $\ell$, the state evolves as follows:

  • Hidden state $h_t \in \mathbb{R}^W$
  • Memory cell $c_t \in \mathbb{R}^W$
  • Duty-cycle scalar $n_t \in \mathbb{R}^W$
  • Logarithmic-scale gate accumulator $m_t \in \mathbb{R}^W$
  • Evolving block representation $u_t \in \mathbb{R}^W$

The steps comprise (simplified from Eqs. 3–6):

  1. Compute gates and candidate:

$$[g_i, g_f, g_o, g_z] = W^\ell u_t + U^\ell h_t + b^\ell$$

$$i_t = \exp(g_i), \quad f_t = \sigma(g_f) ~\text{or}~ \exp(g_f), \quad o_t = \sigma(g_o), \quad z_t = \tanh(g_z)$$

  2. Log-space stabilization and normalized gating:

$$m_{t+1} = \max(\log f_t + m_t, \log i_t)$$

[equation missing in source]

  3. Memory state and output update:

[four equations missing in source]
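A minimal sketch of one micro-step for a single channel, using the gate definitions and the stabilizer $m_{t+1}$ given above. The rescaled gates and the cell, normalizer, and output updates follow the standard stabilized xLSTM form, which is an assumption here because this section does not spell them out. The function name `micro_step` and the scalar (per-channel) formulation are illustrative only, and the update of $u_t$ is omitted.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def log_sigmoid(x):
    """Stable log(sigmoid(x)) that avoids overflow for large |x|."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def micro_step(g_i, g_f, g_o, g_z, c, n, m):
    """One stabilized micro-step for a single channel (all arguments are floats)."""
    log_i = g_i                    # i_t = exp(g_i), so log i_t = g_i
    log_f = log_sigmoid(g_f)       # sigmoid variant of the forget gate
    o = sigmoid(g_o)
    z = math.tanh(g_z)

    # Stabilizer from the text: m_{t+1} = max(log f_t + m_t, log i_t)
    m_new = max(log_f + m, log_i)

    # ASSUMPTION (standard xLSTM form): gates rescaled by the stabilizer,
    # so neither exponential can overflow.
    i_hat = math.exp(log_i - m_new)
    f_hat = math.exp(log_f + m - m_new)

    c_new = f_hat * c + i_hat * z          # memory cell
    n_new = f_hat * n + i_hat              # normalizer
    h_new = o * c_new / max(n_new, 1e-12)  # normalized, output-gated state
    return h_new, c_new, n_new, m_new
```

Working in log space means an input pre-activation such as `g_i = 800.0`, for which `math.exp(g_i)` would overflow, still produces finite state.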

Each block aggregates features at each micro-step (different “scales”) through a learnable, LSTM-style gating function:

[equation missing in source]

with […]. After all $S$ steps, a weighted aggregation […] is merged into the layer’s output.

A shallow gated feed-forward mixer then computes

[two equations missing in source]

where […] is a sigmoid gate and […] are […] activations.

Parameter sharing across micro-steps ensures model depth O([…]) with parameter count O([…]), matching baseline MLP-based PINNs but with richer representational capacity (Tao et al., 16 Nov 2025).

2. Spectral-Bias Mitigation via Frequency Curriculum and Residual Reweighting

xLSTM-PINN directly addresses the spectral bias inherent to standard PINN training. This is accomplished with two orthogonal scheduling mechanisms:

2.1 Frequency Curriculum:

During early training, the residual loss is softly low-pass filtered:

[equation missing in source]

Here […], the frequency cutoff, smoothly grows to its final value over a curriculum of […] steps, ensuring the network resolves large-scale structure before high-frequency detail.
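A minimal sketch of a frequency curriculum of this kind. The cosine ramp for the cutoff, the logistic roll-off of the low-pass mask, and all names (`cutoff`, `soft_lowpass_weight`, `filtered_residual_energy`, `sharpness`) are assumptions chosen for illustration; the paper's exact filter and schedule are not reproduced here.

```python
import math

def cutoff(step, warmup_steps, k_start, k_final):
    """Smoothly ramp the frequency cutoff from k_start to k_final.

    ASSUMPTION: a cosine ramp; any smooth monotone schedule would fit the text.
    """
    t = min(max(step / warmup_steps, 0.0), 1.0)
    ramp = 0.5 * (1.0 - math.cos(math.pi * t))
    return k_start + (k_final - k_start) * ramp

def soft_lowpass_weight(k, k_c, sharpness=1.0):
    """Mask near 1 for |k| well below k_c and near 0 well above it.

    ASSUMPTION: a logistic roll-off centered at the cutoff.
    """
    x = sharpness * (k_c - abs(k))
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def filtered_residual_energy(residual_spectrum, k_c):
    """Sum of |R_k|^2 weighted by the soft mask.

    residual_spectrum maps wavenumber k to the residual's spectral amplitude.
    """
    return sum(soft_lowpass_weight(k, k_c) * abs(a) ** 2
               for k, a in residual_spectrum.items())
```

Early in training the mask suppresses high-wavenumber residual content, so gradients are dominated by large-scale structure; as the cutoff grows, the full residual is recovered.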

2.2 Adaptive Residual Reweighting:

Each collocation point’s residual is exponentially reweighted according to its current error:

[two equations missing in source]

with […]–[…]. This adaptively prioritizes harder, typically higher-frequency regions during gradient descent.
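A minimal sketch of exponential residual reweighting. The specific form $w_i \propto \exp(\beta\,|r_i|)$ normalized to unit mean, and the names `residual_weights`, `weighted_residual_loss`, and `beta`, are assumptions for illustration and not necessarily the paper's formula.

```python
import math

def residual_weights(residuals, beta):
    """Per-point weights proportional to exp(beta * |r_i|), normalized to mean 1.

    ASSUMPTION: this exact form is illustrative, not taken from the paper.
    """
    mags = [abs(r) for r in residuals]
    top = max(mags)
    # Subtract the maximum before exponentiating for numerical stability;
    # the common factor cancels in the normalization.
    raw = [math.exp(beta * (a - top)) for a in mags]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]

def weighted_residual_loss(residuals, beta):
    """Mean of w_i * r_i^2, with the weights treated as constants."""
    weights = residual_weights(residuals, beta)
    return sum(w * r * r for w, r in zip(weights, residuals)) / len(residuals)
```

With `beta = 0` this reduces to the plain mean-squared residual; larger `beta` concentrates the loss on the collocation points with the largest current error. In an autodiff framework the weights would typically be detached so that gradients flow only through the residuals.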

Combined with the xLSTM block’s effect on the empirical NTK, where high-frequency eigenvalues […] are amplified by […], these procedures jointly lift the NTK tail and suppress spectral bias (Tao et al., 16 Nov 2025).

3. Optimization Protocols and Hyperparameters

The combined objective is

[equation missing in source]

where the […] parameters balance residual, Dirichlet, Neumann, and initial-condition losses, with […] incorporating […] or Jacobian regularization as needed.
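One way to write an objective with this structure is shown below. The symbol names are assumptions rather than the paper's notation: $\lambda_r, \lambda_D, \lambda_N, \lambda_{IC}$ denote the balancing weights and $\mathcal{R}(\theta)$ the regularization term.

```latex
\mathcal{L}(\theta)
  = \lambda_r \, \mathcal{L}_{\mathrm{res}}(\theta)
  + \lambda_D \, \mathcal{L}_{\mathrm{Dir}}(\theta)
  + \lambda_N \, \mathcal{L}_{\mathrm{Neu}}(\theta)
  + \lambda_{IC} \, \mathcal{L}_{\mathrm{IC}}(\theta)
  + \mathcal{R}(\theta)
```

Terms that do not apply to a given problem (for example the initial-condition loss in a steady-state PDE) simply drop out.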

Empirically validated choices include:

  • Block width […], depth […], micro-steps […]
  • Approximately 30,000 total parameters for parity with baseline PINN
  • Adam optimizer, learning rate […] decaying to […] (cosine schedule)
  • Frequency cutoff schedule […], […]
  • Residual reweight parameter […]–[…]
  • LayerNorm applied to […] in each block
  • Training stabilization: freeze xLSTM gates for the first […] steps ([…], […], […] fixed at […]), gradient clipping at norm […], early stopping by validation residual
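Two of the listed ingredients, the cosine learning-rate decay and gradient clipping by norm, can be sketched in a few lines. The function names and any numeric values used with them are illustrative assumptions; the specific learning rates and clipping threshold are not reproduced here.

```python
import math

def cosine_lr(step, total_steps, lr_max, lr_min):
    """Cosine decay from lr_max at step 0 to lr_min at total_steps."""
    t = min(max(step / total_steps, 0.0), 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))

def clip_by_global_norm(grads, max_norm):
    """Rescale a flat list of gradient values so its L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the original norm.
    """
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm or norm == 0.0:
        return list(grads), norm
    scale = max_norm / norm
    return [g * scale for g in grads], norm
```

Clipping preserves the gradient direction and only shortens the step, which is the usual reason to pair it with exponential gates whose gradients can spike early in training.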

4. Quantitative Benchmarks and Frequency Analysis

xLSTM-PINN and baseline PINN were evaluated under identical sample and parameter budgets (3,000 interior/boundary samples, approximately 30k parameters) on four PDE problems:

PDE                                     MSE    RMSE   MAE    MaxAE
1D Advection–Reaction                   […]    […]    […]    […]
2D Laplace (mixed BCs)                  […]    […]    […]    […]
Steady Heat in Disk (Robin BC)          […]    […]    […]    […]
Anisotropic Poisson–Beam (4th order)    […]    […]    […]    […]
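For reference, the four error metrics in the table header can be computed from pointwise predictions and reference values as follows. This is a generic sketch of the standard definitions; the function name `error_metrics` is illustrative.

```python
import math

def error_metrics(pred, ref):
    """Return MSE, RMSE, MAE, and MaxAE between two equal-length sequences."""
    errs = [p - r for p, r in zip(pred, ref)]
    n = len(errs)
    mse = sum(e * e for e in errs) / n
    return {
        "MSE": mse,                               # mean squared error
        "RMSE": math.sqrt(mse),                   # root mean squared error
        "MAE": sum(abs(e) for e in errs) / n,     # mean absolute error
        "MaxAE": max(abs(e) for e in errs),       # maximum absolute error
    }
```

For any error field, MAE ≤ RMSE ≤ MaxAE, so a large gap between RMSE and MaxAE indicates error concentrated at a few points.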

Frequency-domain diagnostics substantiate the claimed suppression of spectral bias:

  • Endpoint error […] ([…] plane wave fit) is lower at high frequencies; plateau lowered by […]
  • Spectral gain […] exceeds 1.5–3.0 for […]
  • Time to error threshold […] shortened by 30–50%
  • Resolvable bandwidth […] up by […]25%

In field-space, xLSTM-PINN produces sharply localized error, cleaner boundary transitions, and significantly less high-frequency contamination ([…]10% energy in […] vs […]40% for baseline) (Tao et al., 16 Nov 2025).
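A diagnostic of this kind, the share of error energy that sits in a high-wavenumber band, can be sketched for a uniformly sampled 1D error field. The band definition (`k_cut`), the function names, and the use of a plain discrete Fourier transform are assumptions for illustration; the paper's exact band is not reproduced here.

```python
import cmath
import math

def dft(signal):
    """Naive O(n^2) discrete Fourier transform (adequate for short signals)."""
    n = len(signal)
    return [sum(x * cmath.exp(-2j * math.pi * k * j / n)
                for j, x in enumerate(signal))
            for k in range(n)]

def high_band_energy_fraction(error, k_cut):
    """Fraction of spectral energy of `error` at wavenumbers |k| >= k_cut."""
    n = len(error)
    total = 0.0
    high = 0.0
    for k, amp in enumerate(dft(error)):
        wavenumber = min(k, n - k)   # fold negative frequencies onto positive
        energy = abs(amp) ** 2
        total += energy
        if wavenumber >= k_cut:
            high += energy
    return high / total if total > 0.0 else 0.0
```

As a usage example, an error field made of a unit-amplitude mode at wavenumber 2 plus a half-amplitude mode at wavenumber 10 carries one fifth of its energy above `k_cut = 8`.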

5. Extrapolation and Generalization

Extrapolation assessments demonstrate superior robustness:

  • On 1D advection, training on […] and prediction on […] yields […]1% error for xLSTM-PINN up to […], whereas baseline PINN’s error grows exponentially past […].
  • For the 2D Laplace problem with 10% of the boundary data removed (“O-shaped” deficit), xLSTM-PINN reconstructs the missing region with […] error, while the baseline PINN exhibits substantial error.

Memory-gated micro-step recursions approximate an ODE in feature space, imparting greater robustness to off-manifold or out-of-distribution inputs. The cross-scale memory at each layer further enables data-deficient scales to be reconstructed from related features, smoothing the NTK spectrum and reducing overfitting to the observed spectral envelope (Tao et al., 16 Nov 2025).

6. Implications and Extensions

The xLSTM block is modular and can be integrated into any PINN extension, including Fourier-feature PINNs, conservative or stochastic variants, and multi-fidelity setups, without requiring changes to physics loss functions or optimizers. For time-dependent PDEs, the internal micro-step refinement can be extended to both spatial and temporal resolutions. In inverse or multi-fidelity modeling contexts, memory gating can serve a cross-scale autoencoding role, mediating between low- and high-fidelity surrogates.

Architectural spectral engineering—lifting the NTK tail at the representation level—is shown to be as effective as direct loss reweighting strategies for bias mitigation. A plausible implication is that further advances in PDE generalization could arise from hybrid approaches that combine representation- and loss-level spectral control (Tao et al., 16 Nov 2025).
