xLSTM-PINN: Spectral Enhancement for PDE Solvers

Updated 20 November 2025

The paper introduces xLSTM-PINN, a spectral remodeling extension to PINNs that leverages memory-gated, multiscale xLSTM blocks to elevate high-frequency learning.
It employs a staged frequency curriculum and adaptive residual reweighting to address spectral bias and improve convergence and extrapolation in solving PDEs.
Empirical benchmarks on various PDE problems show significant accuracy gains, enhanced generalization, and superior performance compared to conventional PINNs.

xLSTM-PINN is a spectral remodeling extension of physics-informed neural networks (PINNs) designed to mitigate spectral bias, residual-data imbalance, and poor extrapolation in neural PDE solvers. By introducing memory-gated, multiscale feature extraction via xLSTM blocks, coupled with a staged frequency curriculum and adaptive residual reweighting, xLSTM-PINN systematically elevates the neural tangent kernel (NTK) spectrum for high-frequency learning. The method achieves both theoretically justified and empirically significant improvement in accuracy, convergence, and extrapolation on benchmark PDEs, without modifications to the standard physics loss or automatic differentiation routines (Tao et al., 16 Nov 2025).

1. Architecture: xLSTM Blocks and Gated Memory

xLSTM-PINN replaces the generic multilayer perceptron core of conventional PINNs with a stack of xLSTM blocks. Each block is composed of an internal multiscale, memory-gated recursion (“micro-time” steps) and a light, nonlinear feed-forward mixer.

During each of the $S$ internal micro-steps within block $\ell$, the state consists of:

Hidden state $h_t \in \mathbb{R}^W$
Memory cell $c_t \in \mathbb{R}^W$
Duty-cycle state $n_t \in \mathbb{R}^W$
Logarithmic-scale gate accumulator $m_t \in \mathbb{R}^W$
Evolving block representation $u_t \in \mathbb{R}^W$

The steps comprise (simplified from Eqs. 3–6):

Compute gates and candidate:

$[g_i, g_f, g_o, g_z] = W^\ell u_t + U^\ell h_t + b^\ell$

$i_t = \exp(g_i), \quad f_t = \sigma(g_f) ~\text{or}~ \exp(g_f), \quad o_t = \sigma(g_o), \quad z_t = \tanh(g_z)$

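Log-space stabilization and normalized gating (standard stabilized sLSTM form):

$m_{t+1} = \max(\log f_t + m_t, \log i_t)$

$\tilde{i}_t = \exp(\log i_t - m_{t+1}), \quad \tilde{f}_t = \exp(\log f_t + m_t - m_{t+1})$

Memory state and output update (same form):

$c_{t+1} = \tilde{f}_t \odot c_t + \tilde{i}_t \odot z_t$

$n_{t+1} = \tilde{f}_t \odot n_t + \tilde{i}_t$

$\tilde{h}_{t+1} = c_{t+1} \oslash n_{t+1}$

$h_{t+1} = o_t \odot \tilde{h}_{t+1}$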
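The role of the log-space accumulator $m_t$ can be illustrated with a minimal scalar sketch. It assumes the standard stabilized sLSTM recursion from the xLSTM literature (exponential input gate, sigmoid forget gate, normalizer $n_t$); whether the paper's Eqs. 3–6 match it term by term is an assumption, and the function names `micro_step` and `run_micro_steps` are hypothetical.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def log_sigmoid(x):
    # Stable log(sigmoid(x)); avoids overflow for large |x|.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def micro_step(g_i, g_f, g_o, g_z, c, n, m):
    """One scalar micro-step (assumed standard sLSTM form, not the paper's exact equations)."""
    log_i = g_i               # log(exp(g_i)); exp(g_i) itself is never formed
    log_f = log_sigmoid(g_f)  # sigmoid forget gate, kept in log space
    m_new = max(log_f + m, log_i)          # m_{t+1} = max(log f_t + m_t, log i_t)
    i_s = math.exp(log_i - m_new)          # stabilized input gate, at most 1
    f_s = math.exp(log_f + m - m_new)      # stabilized forget gate, at most 1
    c_new = f_s * c + i_s * math.tanh(g_z) # memory cell update
    n_new = f_s * n + i_s                  # normalizer update
    h_new = sigmoid(g_o) * c_new / n_new   # gated, normalized output
    return h_new, c_new, n_new, m_new

def run_micro_steps(pre_activations):
    """Run S micro-steps from an empty state; m starts at -inf so the first input dominates."""
    h, c, n, m = 0.0, 0.0, 0.0, float("-inf")
    for g_i, g_f, g_o, g_z in pre_activations:
        h, c, n, m = micro_step(g_i, g_f, g_o, g_z, c, n, m)
    return h, c, n, m
```

With an input pre-activation such as `g_i = 800.0`, a direct `math.exp(g_i)` overflows, whereas the stabilized step only ever exponentiates non-positive numbers, which is the point of carrying $m_t$.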

Each block aggregates the features from its micro-steps (different “scales”) through a learnable, LSTM-style gating function. After all $S$ steps, the resulting weighted aggregate is merged into the layer’s output.

A shallow gated feed-forward mixer, in which a sigmoid gate modulates the nonlinear activations, then processes this output.

Parameter sharing across micro-steps means the recursion adds effective depth without adding parameters, so the parameter count matches baseline MLP-based PINNs while the representational capacity is richer (Tao et al., 16 Nov 2025).

2. Spectral-Bias Mitigation via Frequency Curriculum and Residual Reweighting

xLSTM-PINN directly addresses the spectral bias inherent to standard PINN training. This is accomplished with two orthogonal scheduling mechanisms:

2.1 Frequency Curriculum:

During early training, the residual loss is softly low-pass filtered. The frequency cutoff grows smoothly to its final value over the course of the curriculum, ensuring the network resolves large-scale structure before high-frequency detail.
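A minimal sketch of such a curriculum follows. The cosine ramp, the tanh-based soft mask, and all names and default values (`cutoff_schedule`, `soft_lowpass_weight`, `k_start`, `k_final`, `sharpness`) are illustrative assumptions rather than the paper's definitions; the input is assumed to be the residual's power per integer wavenumber.

```python
import math

def cutoff_schedule(step, total_steps, k_start, k_final):
    """Cosine ramp of the cutoff from k_start to k_final (assumed schedule shape)."""
    t = min(max(step / total_steps, 0.0), 1.0)
    return k_start + 0.5 * (1.0 - math.cos(math.pi * t)) * (k_final - k_start)

def soft_lowpass_weight(k, k_cut, sharpness=1.0):
    """Smooth mask equal to sigmoid(sharpness * (k_cut - k)), written via tanh
    so it cannot overflow: near 1 well below the cutoff, near 0 well above it."""
    return 0.5 * (1.0 - math.tanh(0.5 * sharpness * (k - k_cut)))

def filtered_residual_loss(residual_power, step, total_steps,
                           k_start=1.0, k_final=32.0, sharpness=1.0):
    """Sum of per-wavenumber residual power, weighted by the current soft mask.
    residual_power[k] is the residual's power at integer wavenumber k."""
    k_cut = cutoff_schedule(step, total_steps, k_start, k_final)
    return sum(soft_lowpass_weight(k, k_cut, sharpness) * p
               for k, p in enumerate(residual_power))
```

Early in training a residual concentrated at high wavenumbers contributes almost nothing to this loss; once the cutoff has grown past it, the same residual is penalized nearly in full.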

2.2 Adaptive Residual Reweighting:

Each collocation point’s residual is exponentially reweighted according to its current error, with the strength of the reweighting set by a tunable parameter. This adaptively prioritizes harder, typically higher-frequency regions during gradient descent.
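One common way to realize exponential reweighting is a softmax over residual magnitudes, sketched below. The specific form $w_i \propto \exp(\beta |r_i|)$, the normalization to mean 1, and the names `residual_weights`, `reweighted_loss`, and `beta` are assumptions for illustration, not the paper's stated formula.

```python
import math

def residual_weights(residuals, beta=1.0):
    """Weights proportional to exp(beta * |r_i|), rescaled to have mean 1.
    Subtracting the largest magnitude before exponentiating avoids overflow."""
    mags = [abs(r) for r in residuals]
    top = max(mags)
    raw = [math.exp(beta * (a - top)) for a in mags]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]

def reweighted_loss(residuals, beta=1.0):
    """Mean of w_i * r_i^2; with beta = 0 this reduces to the plain mean-squared residual."""
    weights = residual_weights(residuals, beta)
    return sum(w * r * r for w, r in zip(weights, residuals)) / len(residuals)
```

In a gradient-based setting the weights would normally be treated as constants (no gradient flowing through them), so that they steer the optimizer toward hard points without changing the stationary points of each individual residual term.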

Combined with the xLSTM block’s effect on the empirical NTK, in which high-frequency eigenvalues are amplified, these procedures jointly lift the NTK tail and suppress spectral bias (Tao et al., 16 Nov 2025).

3. Optimization Protocols and Hyperparameters

The combined objective is a weighted sum of the individual loss terms,

$\mathcal{L} = \lambda_r \mathcal{L}_r + \lambda_D \mathcal{L}_D + \lambda_N \mathcal{L}_N + \lambda_{IC} \mathcal{L}_{IC} + \mathcal{R}$

where the $\lambda$ parameters balance residual, Dirichlet, Neumann, and initial-condition losses, with $\mathcal{R}$ incorporating norm-based or Jacobian regularization as needed.
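As a small illustration of how such a composite objective is assembled, the sketch below forms a weighted sum of already-computed loss terms plus an optional regularizer. The dictionary keys, the function name `composite_loss`, and the example weights are hypothetical and are not values reported in the paper.

```python
def composite_loss(terms, weights, regularizer=0.0):
    """Weighted sum of loss terms plus an additive regularization value.
    terms and weights are dicts sharing the same keys, e.g.
    'residual', 'dirichlet', 'neumann', 'initial'."""
    return sum(weights[name] * value for name, value in terms.items()) + regularizer

# Illustrative values only: boundary and initial terms weighted more heavily.
example_terms = {"residual": 2.0, "dirichlet": 1.0, "neumann": 0.5, "initial": 0.25}
example_weights = {"residual": 1.0, "dirichlet": 10.0, "neumann": 10.0, "initial": 10.0}
example_total = composite_loss(example_terms, example_weights, regularizer=0.5)
```

Because the physics loss enters only as one of the weighted terms, the architecture behind the residual can be swapped without touching this function, which is consistent with the claim that the standard physics loss is left unmodified.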

Empirically validated choices include:

Fixed block width, depth, and number of micro-steps
About 30,000 total parameters, for parity with the baseline PINN
Adam optimizer, with the learning rate decayed on a cosine schedule
A scheduled frequency cutoff for the curriculum
A fixed range for the residual reweighting parameter
LayerNorm applied in each block
Training stabilization: xLSTM gates frozen at fixed values for an initial number of steps, gradient-norm clipping, and early stopping on the validation residual
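The scheduling and stabilization items in this list can be sketched in a framework-agnostic way. The helper names (`cosine_lr`, `clip_by_global_norm`, `gates_trainable`) and every numeric argument used with them are placeholders chosen for illustration, not the paper's settings.

```python
import math

def cosine_lr(step, total_steps, lr_max, lr_min):
    """Cosine decay from lr_max at step 0 to lr_min at total_steps."""
    t = min(max(step / total_steps, 0.0), 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))

def clip_by_global_norm(grads, max_norm):
    """Rescale a flat list of gradients so its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm or norm == 0.0:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]

def gates_trainable(step, freeze_steps):
    """Gate parameters stay frozen for the first freeze_steps updates."""
    return step >= freeze_steps
```

In a training loop, `gates_trainable` would decide whether gate gradients are applied at a given step, `clip_by_global_norm` would be applied to the gradients before the Adam update, and `cosine_lr` would supply the step size.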

4. Quantitative Benchmarks and Frequency Analysis

xLSTM-PINN and the baseline PINN were evaluated under identical sample and parameter budgets (3,000 interior/boundary samples, about 30k parameters) on four PDE problems, with MSE, RMSE, MAE, and MaxAE reported for each:

1D Advection–Reaction
2D Laplace (mixed BCs)
Steady Heat in Disk (Robin BC)
Anisotropic Poisson–Beam (4th order)
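For reference, the four metrics (MSE, RMSE, MAE, MaxAE) can be computed from predicted and exact values at a set of evaluation points as in the short sketch below; the function name `error_metrics` is hypothetical.

```python
import math

def error_metrics(pred, exact):
    """Return MSE, RMSE, MAE and MaxAE between two equal-length sequences."""
    errs = [p - e for p, e in zip(pred, exact)]
    n = len(errs)
    mse = sum(e * e for e in errs) / n
    return {
        "MSE": mse,                                 # mean squared error
        "RMSE": math.sqrt(mse),                     # root mean squared error
        "MAE": sum(abs(e) for e in errs) / n,       # mean absolute error
        "MaxAE": max(abs(e) for e in errs),         # worst-case pointwise error
    }
```

MaxAE is the most sensitive of the four to localized high-frequency error, since a single badly resolved point dominates it, while MSE and MAE average such points away.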

Frequency-domain diagnostics substantiate the claimed suppression of spectral bias:

Endpoint error of the plane-wave fit is lower at high frequencies, and the error plateau is lowered
Spectral gain exceeds 1.5–3.0
Time to reach the error threshold is shortened by 30–50%
Resolvable bandwidth is up by about 25%

In field space, xLSTM-PINN produces sharply localized error, cleaner boundary transitions, and significantly less high-frequency contamination (on the order of 10% of the energy in the high-frequency band, versus on the order of 40% for the baseline) (Tao et al., 16 Nov 2025).
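A diagnostic of this kind, the share of error energy above a chosen wavenumber, can be sketched for a uniformly sampled 1D error profile as follows. The naive DFT, the cutoff argument `k_cut`, and the function names are assumptions for illustration; the paper's exact band definition is not reproduced here.

```python
import cmath
import math

def one_sided_power(signal):
    """Power per wavenumber k = 0..n//2 of a real signal, via a naive O(n^2) DFT.
    Interior bins are doubled to account for their negative-frequency twins."""
    n = len(signal)
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * j / n)
                    for j, x in enumerate(signal))
        weight = 1.0 if k == 0 or 2 * k == n else 2.0
        power.append(weight * abs(coeff) ** 2)
    return power

def high_freq_energy_fraction(error, k_cut):
    """Fraction of total error energy carried by wavenumbers strictly above k_cut."""
    power = one_sided_power(error)
    total = sum(power)
    if total == 0.0:
        return 0.0
    return sum(p for k, p in enumerate(power) if k > k_cut) / total
```

Applied to the pointwise error of two solvers on the same grid, a smaller fraction indicates that less of the remaining error sits in the poorly resolved high-frequency band.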

5. Extrapolation and Generalization

Extrapolation assessments demonstrate superior robustness:

On 1D advection, training on one interval and predicting beyond it yields error on the order of 1% for xLSTM-PINN well into the extrapolation region, whereas the baseline PINN’s error grows exponentially there.
For the 2D Laplace problem with 10% of the boundary data removed (“O-shaped” deficit), xLSTM-PINN reconstructs the missing region with small error, while the baseline PINN exhibits substantial error.

Memory-gated micro-step recursions approximate an ODE in feature space, imparting greater robustness to off-manifold or out-of-distribution inputs. The cross-scale memory at each layer further enables data-deficient scales to be reconstructed from related features, smoothing the NTK spectrum and reducing overfitting to the observed spectral envelope (Tao et al., 16 Nov 2025).

6. Implications and Extensions

The xLSTM block is modular and can be integrated into any PINN extension, including Fourier-feature PINNs, conservative or stochastic variants, and multi-fidelity setups, without requiring changes to physics loss functions or optimizers. For time-dependent PDEs, the internal micro-step refinement can be extended to both spatial and temporal resolutions. In inverse or multi-fidelity modeling contexts, memory gating can serve a cross-scale autoencoding role, mediating between low- and high-fidelity surrogates.

Architectural spectral engineering—lifting the NTK tail at the representation level—is shown to be as effective as direct loss reweighting strategies for bias mitigation. A plausible implication is that further advances in PDE generalization could arise from hybrid approaches that combine representation- and loss-level spectral control (Tao et al., 16 Nov 2025).


References (1)

Tao et al. (2025). Spectral Bias Mitigation via xLSTM-PINN: Memory-Gated Representation Refinement for Physics-Informed Learning.
