Universal Approximation for FNOs
- FNOs are defined as architectures combining linear measurements in the Fourier domain with nonlinear transformations to approximate operators with arbitrary accuracy.
- The theorem provides guarantees for approximating both scalar-valued and Banach-space-valued operators on compact subsets of Hilbert spaces.
- Proof techniques such as finite-dimensional proxies, partitions of unity, and bandwidth truncation underpin the universality results and inform efficient FNO implementations for solving PDEs.
A universal approximation theorem for Fourier Neural Operators (FNOs) provides rigorous guarantees that FNO architectures can approximate any continuous operator between function spaces to arbitrary accuracy, under suitable conditions. FNOs leverage linear measurements in the Fourier domain combined with nonlinear transformations, mirroring the structure common in neural operator learning. This theorem formalizes the universality of FNOs not only for scalar-valued functionals but also for Banach-space-valued operators on compact subsets of Hilbert spaces, and underpins the efficiency and practical relevance of FNO-based operator learning frameworks.
1. Foundations: Operator Approximation via Linear Measurements and Nonlinearities
Let $\mathcal{H}$ be a real Hilbert space (e.g., $L^2(D)$ or a Sobolev space) and $K \subset \mathcal{H}$ a compact subset. In applications, input domains may be $L^2(D; \mathbb{R}^{d_a})$ for $d_a$ channels. A general and unifying approximation scheme is as follows:
- Linear Measurement System: For some $m \in \mathbb{N}$, take a tuple of continuous linear functionals $\ell = (\ell_1, \dots, \ell_m)$, each $\ell_j \in \mathcal{H}^{*}$, which by Riesz's theorem admit representations $\ell_j(u) = \langle u, v_j \rangle_{\mathcal{H}}$ with $v_j \in \mathcal{H}$.
- Two-Stage Model: Construct an approximator of the form
$$G_{\mathrm{approx}}(u) = \varphi\big(\ell_1(u), \dots, \ell_m(u)\big),$$
where $\varphi: \mathbb{R}^m \to Y$ is a continuous nonlinear map ($Y$ a Banach space).
Scalar-Valued Universal Approximation: For any continuous $F: K \to \mathbb{R}$ and any $\varepsilon > 0$, there exist $m \in \mathbb{N}$, functionals $\ell_1, \dots, \ell_m \in \mathcal{H}^{*}$, and a continuous $\varphi: \mathbb{R}^m \to \mathbb{R}$ such that
$$\sup_{u \in K} \big| F(u) - \varphi\big(\ell_1(u), \dots, \ell_m(u)\big) \big| < \varepsilon.$$
Banach-Valued Universal Approximation (Finite-Rank): For continuous $G: K \to Y$ and any $\varepsilon > 0$, there exist $m, n \in \mathbb{N}$, functionals $\ell_1, \dots, \ell_m$, vectors $y_1, \dots, y_n \in Y$, and scalar nonlinearities $\varphi_1, \dots, \varphi_n$ so that
$$\sup_{u \in K} \Big\| G(u) - \sum_{k=1}^{n} \varphi_k\big(\ell_1(u), \dots, \ell_m(u)\big)\, y_k \Big\|_{Y} < \varepsilon.$$
These structures abstract the "measure, apply nonlinearity, combine" pattern prominent in operator learning (Krylov et al., 3 Feb 2026).
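The "measure, apply nonlinearity, combine" pattern can be made concrete with a small NumPy sketch. The Fourier measurement system, the target functional $F(u) = \int u^2$, and the closed-form nonlinearity below are illustrative choices for a toy example, not constructions taken from the cited works:

```python
import numpy as np

def linear_measurements(u, basis):
    """Continuous linear functionals l_j(u) = <u, v_j> via Riesz representers v_j.
    u: function samples on a uniform grid, shape (n,); basis: representers, shape (m, n)."""
    n = u.shape[0]
    return basis @ u / n  # discretized L2(0,1) inner products

def two_stage_model(u, basis, phi):
    """Measure, apply nonlinearity, combine: F(u) ~ phi(l_1(u), ..., l_m(u))."""
    return phi(linear_measurements(u, basis))

# Toy target: F(u) = integral of u^2, with low-frequency Fourier measurements.
n = 256
x = np.linspace(0.0, 1.0, n, endpoint=False)
# Riesz representers: the constant mode plus normalized sines/cosines up to mode 4.
basis = np.stack([np.ones(n)]
                 + [np.sqrt(2) * np.cos(2 * np.pi * k * x) for k in range(1, 5)]
                 + [np.sqrt(2) * np.sin(2 * np.pi * k * x) for k in range(1, 5)])
# For band-limited inputs, Parseval gives F(u) = sum_j l_j(u)^2 exactly.
phi = lambda ell: np.sum(ell ** 2)

u = np.cos(2 * np.pi * x) + 0.5 * np.sin(2 * np.pi * 3 * x)
approx = two_stage_model(u, basis, phi)
exact = np.mean(u ** 2)  # discretized integral of u^2
assert abs(approx - exact) < 1e-9
```

Here the nonlinearity happens to be known in closed form; in operator learning, $\varphi$ is instead realized by a trained neural network.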
2. Universality of FNOs: Theoretical Guarantees
FNOs are defined over Sobolev or Hölder spaces (e.g., $H^s(\mathbb{T}^d)$) and map between infinite-dimensional function spaces. The FNO architecture consists of:
- Lifting: a pointwise map $P: \mathbb{R}^{d_a} \to \mathbb{R}^{d_v}$ lifts the input to a higher-dimensional channel embedding.
- Fourier Layers: At each depth $t$, apply
$$v_{t+1}(x) = \sigma\Big( W v_t(x) + \mathcal{F}^{-1}\big( R \cdot \mathcal{F} v_t \big)(x) \Big),$$
where the pointwise weights $W$ and the Fourier multipliers $R$ (supported on finitely many modes) are learned, and $\sigma$ is a Lipschitz, nonpolynomial activation.
- Projection: a pointwise map $Q: \mathbb{R}^{d_v} \to \mathbb{R}^{d_u}$ maps back to the output space.
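A single Fourier layer of this form can be sketched on a 1-D periodic grid with NumPy. The layer shapes, mode cutoff, and random parameters below are schematic, assuming real-valued channels and an untrained layer; this is an illustration of the structure, not a reference implementation:

```python
import numpy as np

def fourier_layer(v, W, R, k_max, sigma=np.tanh):
    """One FNO layer on a 1-D periodic grid:
    v_{t+1}(x) = sigma( W v_t(x) + IFFT( R * FFT(v_t) )(x) ),
    with the spectral multiplier R restricted to modes |k| <= k_max.
    v: (n, c) grid values; W: (c, c); R: (k_max + 1, c, c) complex."""
    n, c = v.shape
    v_hat = np.fft.rfft(v, axis=0)               # (n//2 + 1, c) Fourier coefficients
    out_hat = np.zeros_like(v_hat)
    # Mix channels mode-by-mode on the retained low-frequency band only.
    out_hat[: k_max + 1] = np.einsum("kij,kj->ki", R, v_hat[: k_max + 1])
    nonlocal_part = np.fft.irfft(out_hat, n=n, axis=0)
    return sigma(v @ W.T + nonlocal_part)        # pointwise linear + nonlocal, then activation

rng = np.random.default_rng(0)
n, c, k_max = 64, 4, 8
v = rng.standard_normal((n, c))
W = rng.standard_normal((c, c)) / np.sqrt(c)
R = (rng.standard_normal((k_max + 1, c, c))
     + 1j * rng.standard_normal((k_max + 1, c, c))) / c
out = fourier_layer(v, W, R, k_max)
assert out.shape == (n, c)
```

Note that the discrete nonlocal part costs $O(n \log n)$ via the FFT, which is the source of the efficiency claims discussed below.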
The universal approximation theorem for FNOs states (Kovachki et al., 2021, Yao et al., 16 Dec 2025, Lanthaler et al., 2023, Krylov et al., 3 Feb 2026): for any continuous operator $G$ between such function spaces (scalar- or Banach-valued), any compact set $K$ of inputs, and every $\varepsilon > 0$, there exists a finite-size FNO $\mathcal{N}$—characterized by a mode cutoff $k_{\max}$, layer depth $L$, and channel width $d_v$—such that
$$\sup_{u \in K} \big\| G(u) - \mathcal{N}(u) \big\| \le \varepsilon.$$
This result holds even if the nonlocal part (Fourier multipliers) is restricted to finitely many modes (Krylov et al., 3 Feb 2026, Lanthaler et al., 2023).
For continuously differentiable operators $G$, the universal approximation extends to joint approximation of $G$ and its Fréchet derivative $DG$ in operator norm, both uniformly on compact sets (with activations in a suitable smooth class) (Yao et al., 16 Dec 2025).
3. Proof Techniques and Structural Insights
The main proof approaches combine functional analytic and neural network approximation principles:
- Finite-Dimensional Proxy: Uniform continuity on the compact set $K$ enables reduction to finite-dimensional subspaces spanned by sample points, via finite $\varepsilon$-nets and projection.
- Partition of Unity: Construct continuous partitions on finite-dimensional projections to interpolate the target operator.
- Bandwidth Truncation: Project input and output to finite Fourier (or basis) modes, yielding effective reduction to neural networks on finite-dimensional Euclidean spaces.
- Approximation via Classical Theorems: Apply the Stone-Weierstrass theorem and classical neural network universality for finite-dimensional maps to build the nonlinear stage.
- Lifting and Extension: Map approximators back to the original infinite-dimensional context via Riesz extensions of linear measurement functionals.
- Specialization to FNOs: Truncate to Fourier modes, apply nonlinear transformations per channel, then aggregate—realizing the universality hypotheses in FNO structure (Krylov et al., 3 Feb 2026, Kovachki et al., 2021, Lanthaler et al., 2023).
For Banach-valued operators, finite-rank decompositions with scalar nonlinearities and vector-valued weights yield the required operator structure (Krylov et al., 3 Feb 2026). The techniques generalize from Sobolev to Hölder and Lebesgue scales, and extend to both periodic and general Lipschitz domains via appropriate basis functions or nonlocal operators (Lanthaler et al., 2023).
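Bandwidth truncation, the workhorse of these reductions, is easy to demonstrate numerically: projecting a smooth periodic function onto its lowest Fourier modes incurs an error that decays rapidly with the cutoff, so the operator can be approximated through a finite-dimensional proxy. The smooth test input below is an arbitrary illustrative choice:

```python
import numpy as np

def bandlimit(u, k_max):
    """Project a periodic grid function onto its lowest k_max Fourier modes."""
    u_hat = np.fft.rfft(u)
    u_hat[k_max + 1:] = 0.0            # discard all modes above the cutoff
    return np.fft.irfft(u_hat, n=u.shape[0])

n = 1024
x = np.linspace(0.0, 1.0, n, endpoint=False)
u = np.exp(np.sin(2 * np.pi * x))      # analytic periodic input

# Sup-norm truncation error for increasing mode cutoffs.
errors = [np.max(np.abs(u - bandlimit(u, k))) for k in (2, 4, 8, 16)]
# For analytic inputs the error decays (near-exponentially) in the cutoff.
assert errors == sorted(errors, reverse=True)
assert errors[-1] < 1e-8
```

For merely Sobolev-regular inputs the decay is algebraic rather than exponential, which is where the regularity assumptions in the complexity bounds of the next section enter.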
4. Efficiency, Practical Implications, and Complexity
Universality does not guarantee efficiency, but explicit complexity bounds have been established for prototypical PDE operators:
- PDE Operators (Darcy, Navier–Stokes): Fourier-Galerkin discretization enables convergent finite-mode representations, and FNO constructions match or surpass classical pseudo-spectral solvers in complexity.
- For elliptic PDEs: To achieve error $\varepsilon$, the required network size scales sublinearly—and under stronger regularity, only polylogarithmically—in $1/\varepsilon$ (Kovachki et al., 2021).
- For time-dependent PDEs (Navier–Stokes): With second-order time discretizations, nearly linear or sublinear scaling of network size in $1/\varepsilon$ is attained for sufficiently smooth solutions (Kovachki et al., 2021).
- Micromechanics: For cell problems in homogenization, explicit FNO architectures can approximate the solution operator to arbitrary accuracy with complexity matching FFT-based solvers, independent of materials symmetry or phase geometry, subject only to material contrast constraints (Nguyen et al., 16 Jul 2025).
- Channels vs Modes: Universality can be achieved by holding the number of Fourier modes fixed (even zero, i.e., using only domain-average nonlocality) and letting channel width grow, suggesting width is the key capacity parameter rather than mode count (Lanthaler et al., 2023).
Practical Constraints: FFT-based FNOs require periodic or regular domains; general domains may need learned continuation or adapted basis functions (Kovachki et al., 2021, Lanthaler et al., 2023). The fixed sine/cosine basis distinguishes FNOs from DeepONets, whose trunk networks can in principle learn the basis.
5. Universality Beyond Approximation: Derivative-Informed and Weighted Sobolev Settings
Derivative-informed FNOs (DIFNOs) guarantee not only function approximation but also uniform approximation of Fréchet derivatives for $C^1$ operators, both on compact sets and in $\mu$-weighted Sobolev spaces with unbounded input support (Yao et al., 16 Dec 2025). Specifically, for any $\varepsilon > 0$, there exists an FNO $\mathcal{N}$ such that
$$\sup_{u \in K} \Big( \big\| G(u) - \mathcal{N}(u) \big\| + \big\| DG(u) - D\mathcal{N}(u) \big\|_{\mathrm{op}} \Big) \le \varepsilon,$$
or, in weighted global norms,
$$\| G - \mathcal{N} \|_{W^{1,p}_{\mu}} \le \varepsilon,$$
where $\mu$ is a probability measure on the input space.
Implications: These results guarantee that FNO-based surrogates for PDE-constrained optimization can be made arbitrarily close, both in solution and in derivatives, ensuring meaningful surrogacy for sensitivity-driven optimization or inverse problems (Yao et al., 16 Dec 2025).
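As a toy illustration of what derivative accuracy means operationally, a candidate Fréchet derivative can be checked against directional finite differences, the same consistency test one would apply to a derivative-informed surrogate. The Nemytskii operator $u \mapsto \sin(u)$ and its derivative below are stand-ins, not examples from the cited paper:

```python
import numpy as np

def G(u):
    """Toy smooth operator: pointwise (Nemytskii) map u -> sin(u)."""
    return np.sin(u)

def DG(u, h):
    """Frechet derivative of G at u applied to a direction h: cos(u) * h."""
    return np.cos(u) * h

rng = np.random.default_rng(1)
u = rng.standard_normal(128)   # discretized input function
h = rng.standard_normal(128)   # perturbation direction

# Central finite differences along h converge to DG(u)h at rate O(t^2),
# confirming that DG is the correct directional derivative.
for t in (1e-2, 1e-3):
    fd = (G(u + t * h) - G(u - t * h)) / (2 * t)
    assert np.max(np.abs(fd - DG(u, h))) < t ** 2 * np.max(np.abs(h)) ** 3
```

A DIFNO-type guarantee ensures the surrogate's derivative passes the analogous test against the true operator's sensitivities, which is exactly what gradient-based outer loops consume.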
6. Minimality, Nonlocality, and Channel-Mode Tradeoffs
Recent work demonstrates that universality of operator approximation is not contingent on access to a fully resolved Fourier spectrum. Even a single nonlocal operation—such as domain averaging (zero Fourier mode)—combined with sufficient nonlinearity and width, suffices for universal operator approximation on compact sets (Lanthaler et al., 2023). The minimal "averaging neural operator" (ANO) architecture achieves universality, implying that width is the primary determinant of expressivity:
| Architecture | Nonlocal Ingredient | Expressive Capacity |
|---|---|---|
| FNO (full) | Full band Fourier modes | Universal |
| ANO (minimal) | Domain-average (mode 0) | Universal (with large enough channel width) |
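A minimal ANO-style layer, whose only nonlocal ingredient is the domain average (the zeroth Fourier mode), can be sketched as follows; the layer form and parameter shapes are schematic, assuming a 1-D grid and untrained random weights:

```python
import numpy as np

def ano_layer(v, W, A, b, sigma=np.tanh):
    """One 'averaging neural operator' layer: pointwise channel mixing plus
    a single nonlocal term built from the domain average of each channel.
    v: (n, c) grid values; W, A: (c, c) channel-mixing weights; b: (c,) bias."""
    mean = v.mean(axis=0, keepdims=True)   # domain average = mode-0 content only
    return sigma(v @ W.T + mean @ A.T + b)

rng = np.random.default_rng(2)
n, c = 64, 8
v = rng.standard_normal((n, c))
W = rng.standard_normal((c, c)) / np.sqrt(c)
A = rng.standard_normal((c, c)) / np.sqrt(c)
b = rng.standard_normal(c)

out = ano_layer(v, W, A, b)
assert out.shape == (n, c)
```

Universality for this architecture is then carried entirely by depth and, above all, channel width $c$, consistent with the table above.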
A plausible implication is that, at a fixed parameter budget, there is a U-shaped tradeoff: optimal performance is achieved at an intermediate point balancing mode count against channel width, depending on the complexity of the problem (Lanthaler et al., 2023).
7. Limitations, Open Problems, and Broader Context
Current theorems provide existential guarantees on universal approximation, but do not generally furnish quantitative rates for channel width, modes, or depth as a function of error tolerance in the general case (Lanthaler et al., 2023). Notable open questions include:
- Rigorous generalization error bounds under stochastic input distributions (Kovachki et al., 2021).
- Extension to non-periodic or unstructured domains via learned continuation or restriction (Kovachki et al., 2021).
- Adaptivity in mode selection, including data-driven cutoff choices (Kovachki et al., 2021).
- Universality for other nonlocal neural operator architectures (e.g., graph kernels, wavelets) (Lanthaler et al., 2023).
- Model-selection theory for optimal tradeoffs of channel width, depth, and nonlocality (Lanthaler et al., 2023).
Practical limitations include computational efficiency on general geometries (where FFT is not directly available) and the fixed nature of the Fourier basis (which can be suboptimal for specific problem classes) (Kovachki et al., 2021).
In summary, the universal approximation theorem for FNOs rigorously characterizes the ability of FNO architectures to approximate broad classes of continuous and differentiable operators between infinite-dimensional spaces, with precise connections to linear measurement, functional analysis, and neural network theory. These results underpin the foundational role of FNOs in scientific machine learning, operator learning, and computational PDEs (Krylov et al., 3 Feb 2026, Kovachki et al., 2021, Yao et al., 16 Dec 2025, Lanthaler et al., 2023, Nguyen et al., 16 Jul 2025).