
Universal Approximation for FNOs

Updated 10 February 2026
  • FNOs are defined as architectures combining linear measurements in the Fourier domain with nonlinear transformations to approximate operators with arbitrary accuracy.
  • The theorem provides guarantees for approximating both scalar-valued and Banach-space-valued operators on compact subsets of Hilbert spaces.
  • Proof techniques such as finite-dimensional proxies, partitions of unity, and bandwidth truncation establish the theorem and connect it to efficient FNO implementations for solving PDEs.

A universal approximation theorem for Fourier Neural Operators (FNOs) provides rigorous guarantees that FNO architectures can approximate any continuous operator between function spaces to arbitrary accuracy, under suitable conditions. FNOs leverage linear measurements in the Fourier domain combined with nonlinear transformations, mirroring the structure common in neural operator learning. This theorem formalizes the universality of FNOs not only for scalar-valued functionals but also for Banach-space-valued operators on compact subsets of Hilbert spaces, and underpins the efficiency and practical relevance of FNO-based operator learning frameworks.

1. Foundations: Operator Approximation via Linear Measurements and Nonlinearities

Let $H$ be a real Hilbert space (e.g., $H = L^2(\Omega)$ or a Sobolev space), and let $K \subset H$ be a compact subset. In applications, input domains may be $H^n$ for $n$ channels. A general and unifying approximation scheme is as follows:

  • Linear Measurement System: For some $m \in \mathbb{N}$, take a tuple of continuous linear functionals $L = (L_1, \ldots, L_m)$, each $L_j: H^n \to \mathbb{R}$, which by Riesz's theorem admit representations $L_j(x) = \sum_{i=1}^n \varphi_{ji}(x_i)$ with $\varphi_{ji} \in H^*$.
  • Two-Stage Model: Construct an approximator of the form

$$f(x) = \Phi(L_1(x), \ldots, L_m(x)),$$

where $\Phi: \mathbb{R}^m \to Y$ is a continuous nonlinear map ($Y$ a Banach space).

Scalar-Valued Universal Approximation: For any $F \in C(K; \mathbb{R})$ and $\epsilon > 0$, there exist $m$, $L_j$, and $\Phi$ such that

$$\sup_{x \in K} |F(x) - \Phi(L_1(x), \ldots, L_m(x))| < \epsilon.$$

Banach-Valued Universal Approximation (Finite-Rank): For $F \in C(K; Y)$ and any $\epsilon > 0$, there exist $m$, $L_j$, vectors $y_1, \ldots, y_r \in Y$, and scalar nonlinearities $\zeta_j: \mathbb{R} \to \mathbb{R}$ such that

$$G(x) = \sum_{j=1}^{r} y_j\, \zeta_j(L_j(x)), \qquad \sup_{x \in K} \|F(x) - G(x)\|_Y < \epsilon.$$

These structures abstract the "measure, apply nonlinearity, combine" pattern prominent in operator learning (Krylov et al., 3 Feb 2026).
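The "measure, apply nonlinearity, combine" pattern can be sketched numerically. The following is a minimal illustration, not taken from the cited papers: grid values stand in for elements of $H = L^2(0,1)$, and both the measurement profiles and the untrained two-layer network playing the role of $\Phi$ are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Discretize Omega = [0, 1): a function u in H = L^2(0, 1) is represented
# by its values on n_grid equispaced points.
n_grid = 256
xs = np.linspace(0.0, 1.0, n_grid, endpoint=False)
dx = 1.0 / n_grid

# Linear measurement system: m continuous functionals
# L_j(u) = <phi_j, u>_{L^2}, with (hypothetical) cosine profiles phi_j.
m = 8
phis = np.stack([np.cos(2 * np.pi * j * xs) for j in range(m)])

def measurements(u):
    """L(u) = (L_1(u), ..., L_m(u)) via discretized L^2 inner products."""
    return phis @ u * dx

# Nonlinear stage: an untrained two-layer ReLU network standing in for
# the continuous map Phi: R^m -> R from the theorem.
W1, b1 = rng.normal(size=(32, m)), rng.normal(size=32)
W2 = rng.normal(size=32)

def Phi(z):
    return W2 @ np.maximum(W1 @ z + b1, 0.0)

def f(u):
    """Two-stage approximator f(u) = Phi(L_1(u), ..., L_m(u))."""
    return Phi(measurements(u))

val = f(np.sin(2 * np.pi * xs))  # a scalar prediction for one input function
```

The measurement stage is linear in the input function, so all nonlinearity is concentrated in the finite-dimensional map, exactly as in the two-stage model above.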

2. Universality of FNOs: Theoretical Guarantees

FNOs are defined over Sobolev or Hölder spaces (e.g., $H^s(\mathbb{T}^d; \mathbb{R}^{d_a})$) and map between infinite-dimensional function spaces. The FNO architecture consists of:

  • Lifting: a pointwise map $R: H^s(\mathbb{T}^d; \mathbb{R}^{d_a}) \to H^s(\mathbb{T}^d; \mathbb{R}^{d_v})$ embeds the input in a higher-dimensional channel space.
  • Fourier Layers: At each depth $\ell$, apply

$$L_\ell(v)(x) = \sigma\left(W_\ell v(x) + b_\ell(x) + \mathcal{F}^{-1}\left[P_\ell(k)\, \mathcal{F}(v)(k)\right](x)\right),$$

where $W_\ell$ and $P_\ell$ are learned, and $\sigma$ is a Lipschitz, nonpolynomial activation.

  • Projection: $Q$ maps back to the output space.
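As a concrete sketch, a single Fourier layer of the form above fits in a few lines of NumPy. All sizes, the random weights, and the ReLU activation are illustrative choices, not taken from the cited constructions:

```python
import numpy as np

rng = np.random.default_rng(1)

# One FNO layer on the 1-D torus, acting on v with n grid points, d_v channels.
n, d_v, n_modes = 128, 4, 8   # illustrative sizes

W = rng.normal(size=(d_v, d_v)) / np.sqrt(d_v)   # local linear part W_l
b = rng.normal(size=(d_v,))                      # bias (constant in x)
# Learned Fourier multipliers P(k): one d_v x d_v matrix per retained mode.
P = rng.normal(size=(n_modes, d_v, d_v)) + 1j * rng.normal(size=(n_modes, d_v, d_v))

def fourier_layer(v):
    """sigma(W v(x) + b + F^{-1}[P(k) F(v)(k)](x)) with mode truncation."""
    v_hat = np.fft.rfft(v, axis=0)               # (n//2 + 1, d_v) coefficients
    out_hat = np.zeros_like(v_hat)
    # Bandwidth truncation: keep only the first n_modes frequencies.
    out_hat[:n_modes] = np.einsum('kij,kj->ki', P, v_hat[:n_modes])
    nonlocal_part = np.fft.irfft(out_hat, n=n, axis=0)
    return np.maximum(v @ W.T + b + nonlocal_part, 0.0)   # ReLU activation

v = rng.normal(size=(n, d_v))
out = fourier_layer(v)
```

Because the multiplier acts frequency by frequency, the layer commutes with circular shifts of the grid, reflecting the translation equivariance of the convolutional (nonlocal) part.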

The universal approximation theorem for FNOs states (Kovachki et al., 2021, Yao et al., 16 Dec 2025, Lanthaler et al., 2023, Krylov et al., 3 Feb 2026): for any continuous operator $G: H^s \to H^{s'}$ (or Banach-valued $G: X \to Y$), any compact $K \subset H^s$, and every $\epsilon > 0$, there exists a finite-size FNO $\mathcal{N}$, characterized by mode cutoff $N$, layer depth $L$, and width $d_v$, such that

$$\sup_{a \in K} \|G(a) - \mathcal{N}(a)\|_{H^{s'}} < \epsilon.$$

This result holds even if the nonlocal part (Fourier multipliers) is restricted to finitely many modes (Krylov et al., 3 Feb 2026, Lanthaler et al., 2023).

For differentiable operators $G \in C^1$, universal approximation extends to joint approximation of $G$ and its derivative $DG$ in operator norm, both uniformly on compact sets (with activations in a suitable smooth class) (Yao et al., 16 Dec 2025).

3. Proof Techniques and Structural Insights

The main proof approaches combine functional analytic and neural network approximation principles:

  • Finite-Dimensional Proxy: Uniform continuity on the compact set $K$ enables reduction to finite-dimensional subspaces spanned by sample points, via finite $\delta$-nets and projection.
  • Partition of Unity: Construct continuous partitions on finite-dimensional projections to interpolate the target operator.
  • Bandwidth Truncation: Project input and output to finite Fourier (or basis) modes, yielding effective reduction to neural networks on finite-dimensional Euclidean spaces.
  • Approximation via Classical Theorems: Apply the Stone-Weierstrass theorem and classical neural network universality for finite-dimensional maps to build the nonlinear stage.
  • Lifting and Extension: Map approximators back to the original infinite-dimensional context via Riesz extensions of linear measurement functionals.
  • Specialization to FNOs: Truncate to $m$ Fourier modes, apply nonlinear transformations per channel, then aggregate, realizing the universality hypotheses within the FNO structure (Krylov et al., 3 Feb 2026, Kovachki et al., 2021, Lanthaler et al., 2023).
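The bandwidth-truncation step is easy to observe numerically: for a smooth periodic function, projecting onto the first $N$ Fourier modes drives the sup-norm error down rapidly as $N$ grows. The test function below is an arbitrary illustration:

```python
import numpy as np

# Bandwidth truncation: project a smooth periodic function onto its
# first N Fourier modes and watch the sup-norm error shrink with N.
n = 512
xs = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
u = np.exp(np.sin(xs))   # smooth, periodic test function (illustrative)

def truncate(u, N):
    """Keep Fourier modes k = 0, ..., N; zero out the rest."""
    u_hat = np.fft.rfft(u)
    u_hat[N + 1:] = 0.0
    return np.fft.irfft(u_hat, n=len(u))

errors = [np.max(np.abs(u - truncate(u, N))) for N in (2, 4, 8, 16)]
```

For an analytic function like this one, the truncation error decays geometrically in $N$, which is why finite-mode proxies suffice in the proofs.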

For Banach-valued operators, finite-rank decompositions with scalar nonlinearities and vector-valued weights yield the required operator structure (Krylov et al., 3 Feb 2026). The techniques generalize from Sobolev to Hölder and Lebesgue scales, and extend to both periodic and general Lipschitz domains via appropriate basis functions or nonlocal operators (Lanthaler et al., 2023).

4. Efficiency, Practical Implications, and Complexity

Universality does not guarantee efficiency, but explicit complexity bounds have been established for prototypical PDE operators:

  • PDE Operators (Darcy, Navier–Stokes): Fourier-Galerkin discretization enables convergent finite-mode representations, and FNO constructions match or surpass classical pseudo-spectral solvers in complexity.
    • For elliptic PDEs: To achieve error $\epsilon$, network size scales sublinearly or sublogarithmically in $1/\epsilon$ under regularity assumptions (e.g., size $\sim \epsilon^{-d/k} \log(1/\epsilon)$ for $k > d$) (Kovachki et al., 2021).
    • For time-dependent PDEs (Navier–Stokes): With second-order time discretizations, nearly linear or sublinear scaling in $1/\epsilon$ is attained for large smoothness $r$ (e.g., size $\lesssim \epsilon^{-(1 + d/r)} \log(1/\epsilon)$) (Kovachki et al., 2021).
  • Micromechanics: For cell problems in homogenization, explicit FNO architectures can approximate the solution operator to arbitrary accuracy with complexity matching FFT-based solvers, independent of materials symmetry or phase geometry, subject only to material contrast constraints (Nguyen et al., 16 Jul 2025).
  • Channels vs Modes: Universality can be achieved by holding the number of Fourier modes fixed (even zero, i.e., using only domain-average nonlocality) and letting channel width grow, suggesting width is the key capacity parameter rather than mode count (Lanthaler et al., 2023).

Practical Constraints: FFT-based FNOs require periodic or regular domains; general domains may need learned continuation or adapted basis functions (Kovachki et al., 2021, Lanthaler et al., 2023). The fixed sine/cosine trunk basis distinguishes FNOs from DeepONets, which can in principle learn their basis.

5. Universality Beyond Approximation: Derivative-Informed and Weighted Sobolev Settings

Derivative-informed FNOs (DIFNOs) guarantee not only function approximation but also uniform approximation of Fréchet derivatives for $C^1$ operators, both on compact sets and in $L^2_\mu$-weighted Sobolev spaces with unbounded input support (Yao et al., 16 Dec 2025). Specifically, for any $\epsilon > 0$, there exists an FNO $\mathcal{N}$ such that

$$\sup_{a \in K} \big\{ \|G(a) - \mathcal{N}(a)\|_Y,\ \|DG(a) - D\mathcal{N}(a)\|_{HS} \big\} < \epsilon,$$

or, in weighted global norms,

$$\|G - \mathcal{N}\|^2_{L^2_\mu} + \|DG - D\mathcal{N}\|^2_{L^2_\mu} < \epsilon^2,$$

where $\mu$ is a probability measure on $X = H^s(\mathbb{T}^d; \mathbb{R}^{d_a})$.

Implications: These results guarantee that FNO-based surrogates for PDE-constrained optimization can be made arbitrarily close, both in solution and in derivatives, ensuring meaningful surrogacy for sensitivity-driven optimization or inverse problems (Yao et al., 16 Dec 2025).
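The requirement that a surrogate match both $G$ and $DG$ can be illustrated on a toy pointwise operator whose Fréchet derivative has a closed form, checked against a finite-difference probe. The operator $G(a) = a^3$ below is an arbitrary stand-in, not from the cited work:

```python
import numpy as np

# Derivative matching in miniature: for the pointwise (Nemytskii)
# operator G(a) = a^3 on a grid, the Frechet derivative acts as
# DG(a)[h] = 3 a^2 h.  A derivative-informed surrogate would be trained
# against both maps; here we verify the analytic derivative directly.
n = 64
rng = np.random.default_rng(2)
a = rng.normal(size=n)   # sampled input function (grid values)
h = rng.normal(size=n)   # perturbation direction

def G(a):
    return a ** 3

def DG(a, h):
    return 3 * a ** 2 * h

# Central finite difference of t -> G(a + t h) at t = 0; error is O(eps^2).
eps = 1e-6
fd = (G(a + eps * h) - G(a - eps * h)) / (2 * eps)
err = np.max(np.abs(fd - DG(a, h)))
```

The same consistency check, with $G$ replaced by a trained FNO and $DG$ by its linearization, is what makes derivative-informed surrogates trustworthy for sensitivity-driven optimization.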

6. Minimality, Nonlocality, and Channel-Mode Tradeoffs

Recent work demonstrates that universality of operator approximation is not contingent on access to a fully resolved Fourier spectrum. Even a single nonlocal operation—such as domain averaging (zero Fourier mode)—combined with sufficient nonlinearity and width, suffices for universal operator approximation on compact sets (Lanthaler et al., 2023). The minimal "averaging neural operator" (ANO) architecture achieves universality, implying that width is the primary determinant of expressivity:

Architecture    | Nonlocal ingredient         | Expressive capacity
FNO (full)      | Full band of Fourier modes  | Universal
ANO (minimal)   | Domain average (mode 0)     | Universal (given sufficient channel width)

A plausible implication is that, under parameter constraints, there exists a U-shaped tradeoff: at fixed parameter budget, optimal performance is achieved at an intermediate point balancing mode count and channel width, depending on the complexity of the problem (Lanthaler et al., 2023).
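A minimal ANO layer, whose only nonlocal ingredient is the per-channel domain average, can be sketched as follows (sizes, weights, and the ReLU activation are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# Averaging neural operator (ANO) layer: the sole nonlocal operation is
# the domain average (the zeroth Fourier mode); all remaining capacity
# sits in the channel width d_v.
n, d_v = 128, 64   # illustrative sizes

W_loc = rng.normal(size=(d_v, d_v)) / np.sqrt(d_v)   # pointwise linear map
W_avg = rng.normal(size=(d_v, d_v)) / np.sqrt(d_v)   # acts on the mean
b = rng.normal(size=(d_v,))

def ano_layer(v):
    """sigma(W_loc v(x) + W_avg mean(v) + b): mode-0 nonlocality only."""
    mean = v.mean(axis=0)   # per-channel domain average
    return np.maximum(v @ W_loc.T + mean @ W_avg.T + b, 0.0)

v = rng.normal(size=(n, d_v))
out = ano_layer(v)
```

Since the domain average is invariant under circular shifts of the grid, the layer remains translation-equivariant despite discarding every nonzero Fourier mode.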

7. Limitations, Open Problems, and Broader Context

Current theorems provide existential guarantees of universal approximation but do not, in the general case, furnish quantitative rates for channel width, mode count, or depth as a function of error tolerance (Lanthaler et al., 2023); sharp rates of this kind remain a notable open question.

Practical limitations include computational efficiency on general geometries (where FFT is not directly available) and the fixed nature of the Fourier basis (which can be suboptimal for specific problem classes) (Kovachki et al., 2021).

In summary, the universal approximation theorem for FNOs rigorously characterizes the ability of FNO architectures to approximate broad classes of continuous and differentiable operators between infinite-dimensional spaces, with precise connections to linear measurement, functional analysis, and neural network theory. These results underpin the foundational role of FNOs in scientific machine learning, operator learning, and computational PDEs (Krylov et al., 3 Feb 2026, Kovachki et al., 2021, Yao et al., 16 Dec 2025, Lanthaler et al., 2023, Nguyen et al., 16 Jul 2025).
