
Nonlinear Approximation Characteristics

Updated 10 January 2026
  • Nonlinear approximation is a method for approximating functions using adaptive, data-driven selections from flexible dictionaries that yield superior accuracy compared to linear methods.
  • It leverages compositions and deep architectures, such as ReLU networks, to achieve exponential or rate-doubled convergence under certain smoothness assumptions.
  • Applications include efficient high-dimensional PDE solvers, stable data assimilation, and compressive sensing where stability and optimal error decay are crucial.

Nonlinear approximation characteristics encompass the theory, methodologies, and performance metrics associated with approximating functions or operators via mappings that are nonlinear in their parameters, selection, or construction. In contrast to classical (linear) approximation, which utilizes fixed subspaces or bases, nonlinear approximation exploits the flexibility of adaptive or data-driven choices, compositions, and other nonlinear mechanisms to achieve superior accuracy for a wider range of target functions and under broader model assumptions.

1. Definitions and Metrics in Nonlinear Approximation

A central framework is best $N$-term nonlinear approximation, where, for a function $f$ and a "dictionary" $\mathcal{D}$ of admissible functions, the goal is to approximate $f$ as a linear combination of $N$ elements optimally chosen from $\mathcal{D}$:

$$\varepsilon_{L,f}(N) := \min_{\{g_n\}\subset\mathbb{R},\ \{T_n\}\subset\mathcal{D}} \left\| f(x) - \sum_{n=1}^{N} g_n T_n(x) \right\|$$

The metric of interest is how $\varepsilon_{L,f}(N)$ decays as $N$ increases, reflecting the approximation efficiency. Typical choices of $\mathcal{D}$ include bases or frames (e.g., wavelets, splines, kernels, neural network parameterizations), and the "nonlinear" aspect refers to optimizing the selection of the $N$ terms for each target $f$, rather than committing to a fixed collection (Shen et al., 2019).
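A minimal sketch of this optimization in a discrete setting: orthogonal matching pursuit greedily selects $N$ atoms from a finite sampled dictionary and refits the coefficients $g_n$ at each step. This is an illustrative stand-in for the exact minimizer (which is generally intractable); the shifted-Gaussian dictionary below is a made-up example.

```python
import numpy as np

def omp_n_term(f_vals, dictionary, n_terms):
    """Greedy (orthogonal matching pursuit) N-term approximation of a
    sampled function from a finite dictionary of sampled atoms.

    f_vals     : (m,) samples of the target f on a grid
    dictionary : (m, K) columns are sampled dictionary elements T_k
    n_terms    : number N of atoms to select
    Returns the selected column indices and the coefficients g_n.
    """
    residual = f_vals.copy()
    selected = []
    coeffs = np.zeros(0)
    for _ in range(n_terms):
        # pick the atom most correlated with the current residual
        scores = np.abs(dictionary.T @ residual)
        scores[selected] = -np.inf            # don't reselect an atom
        selected.append(int(np.argmax(scores)))
        # re-fit all coefficients jointly on the selected atoms
        A = dictionary[:, selected]
        coeffs, *_ = np.linalg.lstsq(A, f_vals, rcond=None)
        residual = f_vals - A @ coeffs
    return selected, coeffs

# toy example: a two-atom target over a shifted-Gaussian dictionary
x = np.linspace(0, 1, 200)
atoms = np.stack([np.exp(-((x - c) / 0.05) ** 2)
                  for c in np.linspace(0, 1, 50)], axis=1)
f = 2.0 * atoms[:, 10] - 1.5 * atoms[:, 37]
idx, g = omp_n_term(f, atoms, n_terms=2)
```

Because the two atoms barely overlap, the greedy selection recovers the exact support and coefficients here; for coherent dictionaries greedy selection is only near-optimal.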

For multivariate and tensor-product settings, similar constructs arise (e.g., best $N$-term tensor product approximations (Bazarkhanov et al., 2014)).

2. Fundamental Theorem Classes and Rate Results

2.1 Classical and Kernel-Based Best $N$-Term Rates

For many canonical function spaces (e.g., Sobolev, Triebel–Lizorkin, Besov), best $N$-term approximation with suitably regular kernel families, wavelets, or splines satisfies (Hamm et al., 2016):

$$\varepsilon_f(N) \lesssim N^{-s/d}$$

where $\varepsilon_f(N)$ is the error of the best approximation by $N$ kernel terms, $s$ is the smoothness parameter, and $d$ the ambient dimension.
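This kind of decay can be observed numerically with a Haar wavelet dictionary: keeping the $N$ largest-magnitude coefficients (the best $N$-term choice in $L_2$) adapts automatically to a singularity. This is an illustrative sketch, not the kernel construction of the cited work.

```python
import numpy as np

def haar_analysis(signal):
    """Full orthonormal Haar wavelet transform of a length-2^J array."""
    coeffs = []
    approx = signal.astype(float)
    while len(approx) > 1:
        even, odd = approx[0::2], approx[1::2]
        coeffs.append((even - odd) / np.sqrt(2))   # detail coefficients
        approx = (even + odd) / np.sqrt(2)         # coarser approximation
    coeffs.append(approx)
    return coeffs

def haar_synthesis(coeffs):
    """Invert haar_analysis."""
    approx = coeffs[-1]
    for detail in reversed(coeffs[:-1]):
        out = np.empty(2 * len(approx))
        out[0::2] = (approx + detail) / np.sqrt(2)
        out[1::2] = (approx - detail) / np.sqrt(2)
        approx = out
    return approx

def best_n_term(signal, n):
    """Keep the n largest-magnitude Haar coefficients, zero the rest."""
    coeffs = haar_analysis(signal)
    flat = np.concatenate(coeffs)
    keep = np.argsort(np.abs(flat))[-n:]
    mask = np.zeros_like(flat)
    mask[keep] = 1.0
    flat = flat * mask
    out, pos = [], 0                 # unflatten back into the pyramid
    for c in coeffs:
        out.append(flat[pos:pos + len(c)])
        pos += len(c)
    return haar_synthesis(out)

x = np.linspace(0, 1, 1024, endpoint=False)
f = np.sqrt(np.abs(x - 0.5))         # kink at x = 0.5
err64 = np.max(np.abs(f - best_n_term(f, 64)))
err256 = np.max(np.abs(f - best_n_term(f, 256)))
```

The retained coefficients cluster near the kink, which is exactly the adaptivity that a fixed (linear) truncation cannot provide.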

2.2 Composition and Deep Learning Regimes

When dictionary elements are compositions (notably neural networks built from $L$ composed blocks), depth can dramatically accelerate best $N$-term rates:

  • Depth-1 (shallow): error $O(N^{-\alpha/d})$ for Hölder functions of order $\alpha$ on $[0,1]^d$
  • Depth-2: $O(N^{-2\alpha/d})$; the exponent doubles
  • Depth-3 and higher: the doubled rate $O(N^{-2\alpha/d})$ is already optimal, so extra layers give no further gain (Shen et al., 2019).

For one-dimensional target functions with only continuity (not smoothness), the doubling of the rate at depth 2 still occurs.
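The mechanism behind depth-induced gains can be seen in the classic tent-map example (in the spirit of, though not identical to, the cited construction): composing a two-piece, ReLU-representable hat function $k$ times produces $2^k$ linear pieces from only $O(k)$ parameters, a combinatorial gain no shallow network of comparable size can match.

```python
import numpy as np

def hat(x):
    # tent map on [0,1]: 2x for x <= 1/2, 2 - 2x otherwise;
    # expressible with two ReLU units
    return 2 * np.minimum(x, 1 - x)

def composed_hat(x, k):
    """k-fold composition: 2^k linear pieces from O(k) parameters."""
    y = np.asarray(x, dtype=float)
    for _ in range(k):
        y = hat(y)
    return y

x = np.linspace(0, 1, 4097)          # dyadic grid hits all breakpoints
y = composed_hat(x, 5)

# count linear pieces via sign changes of the discrete slope
slopes = np.sign(np.diff(y))
pieces = 1 + np.count_nonzero(np.diff(slopes))
```

With $k = 5$ compositions the graph has $2^5 = 32$ linear pieces, i.e. the piece count grows exponentially in depth while the parameter count grows only linearly.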

2.3 Superiority of Deep ReLU Networks

Certain function classes exhibit exponential (Shannon-type) rates for deep ReLU networks; Takagi-type and self-similar functions can be approximated with error decaying exponentially in the number of network parameters, while best spline or wavelet dictionaries deteriorate to polynomial rates (Daubechies et al., 2019).

2.4 Nonlinear Tensor Product Approximation

For functions in a periodic mixed-smoothness class in $d$ variables, the best $m$-term tensor product approximation error decays polynomially in $m$, with an exponent governed by the mixed smoothness and the dimension. Upper and lower bounds match up to logarithmic factors; constructive greedy algorithms attain similar exponents (Bazarkhanov et al., 2014).
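For $d = 2$ the discrete analogue is transparent: by the Eckart–Young theorem, the best rank-$m$ ($m$-term tensor product) approximation of a sampled bivariate function in the Frobenius ($L_2$) sense is the truncated SVD. A sketch, with a made-up Cauchy-type test function whose singular values decay rapidly:

```python
import numpy as np

# sample a smooth, full-rank bivariate function on a grid
x = np.linspace(0, 1, 128)
F = 1.0 / (1.0 + np.add.outer(x, x))   # f(s, t) = 1 / (1 + s + t)

U, s, Vt = np.linalg.svd(F, full_matrices=False)

def rank_m(m):
    """Best rank-m (m-term tensor product) approximation, Eckart-Young."""
    return (U[:, :m] * s[:m]) @ Vt[:m]

# Frobenius errors drop rapidly as the number of tensor terms grows
errs = [np.linalg.norm(F - rank_m(m)) for m in (1, 2, 4, 8)]
```

The fast error decay here reflects the analyticity of the sampled function; for genuinely rough functions the decay exponent is dictated by the mixed smoothness, as in the continuous theory above.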

2.5 Restricted and Weighted Approximation

The introduction of general measures $\nu$ for restricted $N$-term approximation leads to a full characterization of approximation spaces in terms of weighted Lorentz sequence spaces and the upper/lower Temlyakov property, unifying Jackson and Bernstein inequalities with approximation embeddings (Hernández et al., 2011).

3. Methodological Frameworks

3.1 Compositional Dictionaries

Composition of shallow networks or blocks leads to improved expressivity. The optimal rate-doubling ("L=2") and rate saturation ("L=3") phenomena are both tied to the combinatorial growth of the function landscape under composition and tiling: e.g., in $d$ dimensions, tiling the domain requires on the order of $N^d$ cubes of side $1/N$, each attaining $O(N^{-\alpha})$ local error for Hölder functions of order $\alpha$ (Shen et al., 2019).

3.2 Kernel and Approximation Families

Regular families of decaying or growing kernels—encompassing Gaussians, multiquadrics, cardinal functions—can be systematically analyzed for their $N$-term rates by verifying translation, dilation, and Poisson summation properties (hypotheses (A1)–(A6)) (Hamm et al., 2016). These kernels enable nonlinear spaces that match the performance of best wavelet expansions.

3.3 Greedy and Library-Based Schemes

For high-dimensional parametric PDEs and analytic function classes with anisotropy, adaptive library-based piecewise Taylor approximations subdivide the parameter space and select local low-dimensional spaces for each cell. The resulting complexity depends only logarithmically or subexponentially on the error tolerance, breaking or mitigating the curse of dimensionality according to the anisotropy decay (Guignard et al., 2022).
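A one-dimensional caricature of the library idea, with local least-squares polynomial fits standing in for Taylor expansions and interval bisection standing in for adaptive parameter-space subdivision (all concrete choices here are illustrative, not those of the cited work):

```python
import numpy as np

def adaptive_taylor(f, a, b, tol, degree=2, max_depth=12):
    """Recursively bisect [a, b]; keep a local centered polynomial
    ("Taylor-like") fit on each cell once it meets the tolerance."""
    xs = np.linspace(a, b, 17)
    mid_pt = 0.5 * (a + b)
    c = np.polyfit(xs - mid_pt, f(xs), degree)    # centered local fit
    err = np.max(np.abs(np.polyval(c, xs - mid_pt) - f(xs)))
    if err <= tol or max_depth == 0:
        return [(a, b, c)]
    return (adaptive_taylor(f, a, mid_pt, tol, degree, max_depth - 1)
            + adaptive_taylor(f, mid_pt, b, tol, degree, max_depth - 1))

def evaluate(cells, x):
    """Evaluate the piecewise surrogate at a point x."""
    for a, b, c in cells:
        if a <= x <= b:
            return np.polyval(c, x - 0.5 * (a + b))
    raise ValueError("x outside partition")

f = lambda x: np.sin(1.0 / (x + 0.2))   # sharper variation near x = 0
cells = adaptive_taylor(f, 0.0, 1.0, tol=1e-3)
widths = [b - a for a, b, _ in cells]
approx_val = evaluate(cells, 0.9)
```

The partition refines automatically where the target varies fastest (near $x = 0$), so the cell widths grow from left to right; this adaptivity to local regularity is the essence of the library approach.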

3.4 Choquet and Nonlinear Integral Operators

Nonlinear extension of classical constructive schemes via the Choquet integral leads to Bernstein–Choquet and Picard–Choquet operators. These exhibit improved rates for certain function classes (monotone/concave or exponentials), outperforming classical linear positive operators in those regimes (Gal, 2014).
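For intuition, the discrete Choquet integral with respect to a capacity (a monotone set function) replaces the linear weighted sum; already a convex distortion of the uniform measure yields a genuinely nonlinear functional. A minimal sketch of the standard discrete formula (not the cited Bernstein–Choquet operators themselves):

```python
import numpy as np

def choquet_integral(values, capacity):
    """Discrete Choquet integral of a nonnegative vector w.r.t. a capacity.

    capacity(S) maps a frozenset of indices to [0, 1], monotone, with
    capacity(empty set) = 0 and capacity(all indices) = 1.
    """
    v = np.asarray(values, dtype=float)
    order = np.argsort(v)                      # sort values increasingly
    total, prev = 0.0, 0.0
    for k in range(len(v)):
        # level set: indices whose value is >= the k-th smallest value
        upper = frozenset(order[k:].tolist())
        total += (v[order[k]] - prev) * capacity(upper)
        prev = v[order[k]]
    return total

# convex distortion of the uniform measure: mu(S) = (|S|/n)^2
n = 4
cap = lambda S: (len(S) / n) ** 2
vals = [1.0, 3.0, 2.0, 4.0]
ch = choquet_integral(vals, cap)
```

With the undistorted uniform capacity $|S|/n$ the Choquet integral reduces to the arithmetic mean (2.5 here); the convex distortion above shrinks the value to 1.875, illustrating the nonlinearity.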

3.5 Algorithmic and Computational Aspects

Many nonlinear approximation methods, particularly those involving nonconvex selection or parameter search (e.g., kernel parameter grids, nonnegative least squares for rational/exponential approximations (Vabishchevich, 2023)), employ iterative or greedy algorithms. Effective discretization, active-set NNLS, and QR-based stabilization are standard techniques; provable convergence properties may be lacking in fully nonlinear parameter regimes.
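A sketch of the grid-plus-NNLS idea for sums of exponentials: fix a grid of candidate decay rates, then solve the nonnegative least-squares problem for the weights. For simplicity this uses accelerated projected gradient rather than the active-set NNLS mentioned above; the target $1/(1+t)$ and the rate grid are made-up illustrative choices.

```python
import numpy as np

def nnls_apgd(A, b, iters=5000):
    """Nonnegative least squares min ||Ax - b||, x >= 0, solved by
    accelerated projected gradient (FISTA) -- a simple stand-in for
    an active-set NNLS solver."""
    L = np.linalg.norm(A, 2) ** 2       # Lipschitz constant of gradient
    x = z = np.zeros(A.shape[1])
    t = 1.0
    for _ in range(iters):
        x_new = np.maximum(0.0, z - (A.T @ (A @ z - b)) / L)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x

# approximate f(t) = 1/(1+t) on [0, 5] by a nonnegative exponential sum
t = np.linspace(0, 5, 200)
rates = np.linspace(0.05, 5.0, 40)      # fixed grid of decay rates
A = np.exp(-np.outer(t, rates))         # columns: e^{-r t} atoms
b = 1.0 / (1.0 + t)
x = nnls_apgd(A, b)
resid = np.max(np.abs(A @ x - b))
```

The target is completely monotone ($1/(1+t) = \int_0^\infty e^{-st}e^{-s}\,ds$), so a nonnegative combination over a rate grid fits it well; the nonnegativity constraint is what keeps the fit stable despite the severe ill-conditioning of the exponential dictionary.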

4. Stability, Manifold Widths, and Optimality

Realistic nonlinear approximation must account for numerical stability, most prominently captured by the notion of stable manifold widths $\delta^*_n(K)_X$. These widths are intimately connected to the entropy numbers $\varepsilon_n(K)$ measuring the compactness of the model class $K$. Fundamental consequences (Cohen et al., 2020):

  • For Hilbert spaces, stable widths and entropy numbers are equivalent up to constants.
  • In Banach spaces, enforcing $\gamma$-Lipschitz continuity of the encoder/decoder pair bounds the achievable approximation rates by the entropy numbers—precluding "faster-than-entropy" rates.
  • For unit Lipschitz-bounded function classes (e.g., the unit ball of Lip 1 on $[0,1]$), enforcing stability forces $O(n^{-1})$ error decay, even though unstable approximations (deep ReLU nets with arbitrary parameterization) can obtain $O(n^{-2})$.

5. Specialized and Emerging Regimes

5.1 Quadratic and Algebraic Manifolds

The quadratic formula–based degree-2 nonlinear approximation constructs closed-form smooth coefficient manifolds to represent single-variable functions as roots of degree-2 polynomials with a learned index function for branch selection. This yields global exponential convergence across discontinuities (unlike linear/rational schemes), as the algebraic variety encodes jumps sharply and enables effective edge-preserving denoising (He et al., 6 Dec 2025).
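The algebraic core of this idea can be sketched directly: a function that jumps between two smooth branches $f_1, f_2$ is everywhere a root of $y^2 - (f_1+f_2)\,y + f_1 f_2 = 0$, whose coefficients are globally smooth; a one-bit index function selects the branch. (Illustrative only; the cited method learns the coefficient manifold and index function rather than assuming known branches.)

```python
import numpy as np

# two smooth branches and a target that jumps between them at x = 0.5
x = np.linspace(0, 1, 400)
f1 = np.sin(2 * np.pi * x)
f2 = np.cos(2 * np.pi * x) + 2.0          # stays strictly above f1
target = np.where(x < 0.5, f1, f2)        # discontinuous at x = 0.5

# quadratic coefficients are SMOOTH even though the target jumps:
# y^2 - c1*y + c0 has roots exactly f1 and f2
c1 = f1 + f2                              # smooth sum of branches
c0 = f1 * f2                              # smooth product of branches

# recover the branches with the quadratic formula + 1-bit index function
disc = np.sqrt(np.maximum(c1**2 - 4 * c0, 0.0))   # = |f2 - f1|
root_plus = 0.5 * (c1 + disc)             # upper branch (= f2 here)
root_minus = 0.5 * (c1 - disc)            # lower branch (= f1 here)
index = (x >= 0.5)                        # which branch to take where
recovered = np.where(index, root_plus, root_minus)

jump_size = np.max(np.abs(np.diff(target)))       # ~1: genuine jump
coeff_step = np.max(np.abs(np.diff(c1)))          # small: smooth coeffs
```

The jump lives entirely in the one-bit index; the coefficient functions $c_1, c_0$ are smooth, which is why smooth approximants of the coefficients can represent the discontinuity sharply.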

5.2 Piecewise-Affine and Cut-Based Schemes

Multi-dimensional nonlinear functions can be efficiently approximated by iteratively partitioning the domain using hinging hyperplanes and fitting local affine surrogates (PWA). Adaptive cut-selection, continuity enforcement, and region complexity increase only as needed, attaining accuracy with many fewer regions compared to mesh-recursive baselines (Gharavi et al., 2024).
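A simplified sketch of the partitioning loop, with axis-aligned median cuts standing in for general hinge selection and a refine-only-where-needed rule (all concrete choices here are illustrative):

```python
import numpy as np

def fit_affine(X, y):
    """Least-squares affine fit y ~ X @ w + b.
    Returns the stacked coefficients [w; b] and the max residual."""
    A = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = np.max(np.abs(A @ coef - y))
    return coef, resid

def pwa_partition(X, y, tol, depth=8):
    """Recursively split along the widest axis until each region's
    affine surrogate meets the tolerance (an axis-aligned
    simplification of adaptive hinge-cut selection)."""
    coef, resid = fit_affine(X, y)
    if resid <= tol or depth == 0 or len(X) < 8:
        return [coef]
    axis = int(np.argmax(X.max(axis=0) - X.min(axis=0)))
    cut = np.median(X[:, axis])
    left = X[:, axis] <= cut
    return (pwa_partition(X[left], y[left], tol, depth - 1)
            + pwa_partition(X[~left], y[~left], tol, depth - 1))

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(2000, 2))
y = np.abs(X[:, 0]) + 0.5 * X[:, 1] ** 2      # nonlinear target
regions = pwa_partition(X, y, tol=0.05)
```

Region complexity grows only where the target is genuinely nonlinear; general hinge cuts (rather than axis-aligned ones) reduce the region count further by aligning cuts with the target's ridges.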

5.3 Recurrent and Sequence Models

Nonlinear RNN approximation is fundamentally limited by a Bernstein-type inverse theorem: stably approximable sequence-to-sequence maps must have exponentially decaying memory kernels, generalizing the "curse of memory" from linear to nonlinear architectures. Overcoming this requires Hurwitz-parameterized recurrent matrices to stably represent slow memory decay (Wang et al., 2023).
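The linear case of the memory-kernel obstruction is easy to see numerically: for the scalar recurrence $h_{t+1} = \lambda h_t + x_t$ with $|\lambda| < 1$, the impulse response is exactly $\lambda^t$, i.e. memory decays exponentially at rate $\log\lambda$. A toy check (illustrating the linear phenomenon only, not the nonlinear theorem of the cited work):

```python
import numpy as np

def impulse_response(lam, T):
    """Memory kernel of the scalar recurrence h_{t+1} = lam*h_t + x_t:
    the response at time t to a unit input at time 0 is lam**t."""
    h = 0.0
    kernel = []
    for t in range(T):
        x = 1.0 if t == 0 else 0.0
        h = lam * h + x
        kernel.append(h)
    return np.array(kernel)

k = impulse_response(0.9, 60)
# log|kernel| is linear in t: exponential decay with slope log(lam)
slope = np.polyfit(np.arange(60), np.log(k), 1)[0]
```

Any stable approximation with such a recurrence therefore cannot represent targets whose memory decays slower than exponentially, which is the obstruction the Hurwitz reparameterization is designed to relax.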

6. Practical Implications and Applications

  • For Hölder or Sobolev targets on $[0,1]^d$, compositional deep networks with moderate width and depth achieve the best-known nonlinear rates, and extra depth offers no further gain (Shen et al., 2019).
  • In kernel regimes, nonlinear $N$-term kernel spaces achieve wavelet-optimal rates, and cardinal interpolation yields powerful greedy truncated or adaptive sampling-based approximations.
  • Library-based partitioning enables scalable surrogates for high-dimensional parametric models (PDEs, uncertainty quantification), with complexity scaling dictated by analytic anisotropy parameters (Guignard et al., 2022).
  • For models requiring stability (data assimilation, numerical PDEs, compressed sensing), achievable rates must be benchmarked via entropy or stable manifold widths, not by the raw performance of unconstrained parametrizations (Cohen et al., 2020).

7. Open Problems and Outlook

Open questions include the development of optimal or near-minimal algorithms for coefficient construction in nonlinear/algebraic manifold representations, understanding the precise role of Lipschitz stability across architectures, effective index/function encoding in high-dimensional or multi-valued contexts, and rigorous convergence guarantees for adaptive piecewise or greedy parameter selection schemes. The theory continues to evolve with advances in neural and kernel architectures, high-dimensional surrogate modeling, and algorithmic stability under data and parameter perturbations.


