Non-Linearity Score Term Analysis

Updated 11 January 2026

A non-linearity score term is a quantitative metric that measures how far a model deviates from linear behavior, using methods such as optimal transport and Walsh-Hadamard analysis.
In deep neural networks it is used to evaluate layer-wise expressivity by comparing each layer's actual mapping to its best affine approximation; the resulting scores correlate with performance and group architectures into clusters.
The metric also informs cryptographic security analysis and statistical model selection, where quantifying non-linear effects must be balanced against model complexity.

A non-linearity score term is a quantitative metric, function, or statistic that measures the degree or structure of non-linearity in a mathematical object, statistical model, function, or system. Such terms quantify how much a system deviates from linear behavior, a property that matters in statistical estimation, cryptography, deep learning, statistical physics, econometrics, information theory, and signal processing. Depending on the context, non-linearity score terms are constructed and interpreted through different theoretical frameworks, including optimal transport, function approximation theory, statistical hypothesis testing, and combinatorial or harmonic analysis. The sections below survey prominent forms and uses of non-linearity score terms across representative domains.

1. Non-Linearity Scores in Deep Neural Networks

Deep learning research has introduced non-linearity score terms to analyze expressivity and architectural behavior in neural networks. In "From Alexnet to Transformers: Measuring the Non-linearity of Deep Neural Networks with Affine Optimal Transport" (Bouniot et al., 2023), the affinity score $\rho_{\mathrm{aff}}(X, Y)$ formalizes the deviation of a layer's input-output pair from the space of affine maps. For input $X$ and activation output $Y$ (usually after pooling), the score is

$\rho_\mathrm{aff}(X, Y) = 1 - \frac{W_2(T_{\mathrm{aff}}(X), Y)}{\sqrt{2}\,\mathrm{Tr}[\Sigma(Y)]^{1/2}}$

where $W_2(\cdot,\cdot)$ is the 2-Wasserstein (quadratic optimal transport) distance, $T_{\mathrm{aff}}$ is the best affine map (in the OT sense) matching the Gaussian approximations of $X$ and $Y$, and $\Sigma(Y)$ is the covariance matrix of $Y$. The score satisfies $\rho_\mathrm{aff} = 1$ for exactly affine transformations and decreases as a layer becomes more non-linear.
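The formula above can be sketched for the one-dimensional case, where both ingredients have closed forms. This is an illustrative simplification, not the paper's implementation: it assumes scalar samples of equal size, uses the 1-D Gaussian transport map $T_{\mathrm{aff}}(x) = \mu_Y + (\sigma_Y/\sigma_X)(x - \mu_X)$, and computes the empirical $W_2$ by pairing sorted values. The function name `affinity_score_1d` is hypothetical.

```python
import math
import statistics


def affinity_score_1d(xs, ys):
    """1-D sketch of rho_aff = 1 - W2(T_aff(X), Y) / sqrt(2 * Var(Y))."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal size >= 2")
    mu_x, mu_y = statistics.fmean(xs), statistics.fmean(ys)
    sd_x, sd_y = statistics.pstdev(xs), statistics.pstdev(ys)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("both samples need non-zero variance")
    # Assumption: 1-D optimal transport map between the Gaussian fits of X and Y.
    mapped = [mu_y + (sd_y / sd_x) * (x - mu_x) for x in xs]
    # In 1-D, W2 between equal-size empirical samples pairs the sorted values.
    w2_squared = statistics.fmean(
        (t - y) ** 2 for t, y in zip(sorted(mapped), sorted(ys))
    )
    return 1.0 - math.sqrt(w2_squared) / math.sqrt(2.0 * sd_y ** 2)


# Usage: an affine map scores (numerically) 1, a ReLU-like map scores lower.
inputs = [i / 50 - 1 for i in range(101)]
rho_affine = affinity_score_1d(inputs, [2 * x + 3 for x in inputs])
rho_relu = affinity_score_1d(inputs, [max(x, 0.0) for x in inputs])
```

In higher dimensions the Gaussian transport map involves matrix square roots of the covariances and the empirical $W_2$ requires an optimal transport solver, so this sketch only conveys the structure of the score.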

By stacking $\rho_{\mathrm{aff}}$ scores per layer, one obtains a layer-wise non-linearity signature—a vector which serves as a fingerprint of architectural and training choices. The paper demonstrates that statistical aggregates (mean, standard deviation, etc.) of these scores correlate with performance and cluster DNN architectures into functionally meaningful classes.

2. Non-Linearity Scores in Boolean Function Analysis

Non-linearity score terms are foundational in theoretical computer science and cryptography. The classical non-linearity of a Boolean function $f: \{0,1\}^n \to \{0,1\}$ is its minimum Hamming distance to the set of affine functions: $\mathrm{NL}(f) = 2^{n-1} - \frac12 \max_{s \in \{0,1\}^n} |W_f(s)|$ where $W_f(s)$ is the (unnormalized) Walsh-Hadamard transform. This score quantifies how well $f$ can be approximated linearly; high non-linearity is desirable in cryptographic primitives. Estimation of $\mathrm{NL}(f)$ for large $n$ is computationally intensive, spurring both classical (randomized) and quantum algorithms for approximate non-linearity estimation with complexity bounds expressed in terms of the allowed estimation error (Bera et al., 2021).
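For small $n$ the definition can be evaluated exactly with a fast Walsh-Hadamard transform. The sketch below is a minimal illustration under stated assumptions: the function is given as a truth table of length $2^n$ indexed by the integer encoding of the input bits, and the helper names `walsh_hadamard`, `nonlinearity`, and `bit` are hypothetical.

```python
def walsh_hadamard(truth_table):
    """Unnormalized Walsh-Hadamard spectrum W_f(s) of a 0/1 truth table."""
    size = len(truth_table)
    if size == 0 or size & (size - 1):
        raise ValueError("truth table length must be a power of two")
    # Map bits to signs: 0 -> +1, 1 -> -1.
    spectrum = [1 - 2 * b for b in truth_table]
    half = 1
    while half < size:
        # In-place butterfly over blocks of width 2 * half.
        for start in range(0, size, 2 * half):
            for i in range(start, start + half):
                a, b = spectrum[i], spectrum[i + half]
                spectrum[i], spectrum[i + half] = a + b, a - b
        half *= 2
    return spectrum


def nonlinearity(truth_table):
    """NL(f) = 2^(n-1) - max_s |W_f(s)| / 2."""
    peak = max(abs(w) for w in walsh_hadamard(truth_table))
    return len(truth_table) // 2 - peak // 2


def bit(x, i):
    """Return bit i of the integer x."""
    return (x >> i) & 1


# Usage: x0 XOR x1 is affine; x0*x1 XOR x2*x3 is a bent function on 4 variables.
xor_table = [bit(x, 0) ^ bit(x, 1) for x in range(4)]
bent_table = [(bit(x, 0) & bit(x, 1)) ^ (bit(x, 2) & bit(x, 3)) for x in range(16)]
nl_xor = nonlinearity(xor_table)
nl_bent = nonlinearity(bent_table)
```

The transform costs on the order of $n 2^n$ operations, which is why exact evaluation becomes impractical for large $n$ and approximate estimation is of interest.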

Extensions to multidimensional non-linearity parameters $(N_f^{(r)}, H_f^{(r)})$ (Semaev, 2019) generalize the single-parameter view. They measure the degree to which $f$'s behavior departs from all $r$-bit linear projections by evaluating induced empirical distributions over $\mathbb{F}_2^r$ and quantifying sparsity and entropy in that space, which provides a theoretical foundation for analyzing cryptanalytic resilience.

3. Non-Linearity Score Terms in Parametric and Statistical Models

Many statistical methods introduce non-linearity score terms as components of score vectors or test statistics. In generalized regression models with non-linear predictors, the score vector (gradient of the log-likelihood) typically contains non-linearity terms that reflect derivatives through non-linear mappings. For example, in nonlinear simplex regression (Espinheira et al., 2018), the score component for mean parameters involves

$U_\beta = \widetilde X^\top S U T (y - \mu)$

where $\widetilde X$ is the Jacobian matrix of the non-linear predictor with respect to coefficients, embedding the non-linear dependence.

Similarly, in nonlinear network autoregression (Armillotta et al., 2022), the nonlinearity score appears as the subvector of the quasi-score (profile score under the null hypothesis of linearity), specifically: $S_{NT}^{(2)}(\tilde\theta) = \sum_{t=1}^T h_t(\tilde\gamma)' D_t^{-1} (Y_t - \lambda_t(\tilde\beta))$ which serves as the focal statistic in score-based hypothesis testing for the presence of non-linearity.

4. Non-Linearity Score Terms in Stochastic Process and Diffusion Models

In generative modeling and stochastic differential equations, score functions (gradients of log-densities) encode probabilistic flows. For nonlinear SDEs, the score is not simply a linear function but decomposes into a Gaussian-like term and a genuine nonlinear correction—an explicit non-linearity score term. In the Malliavin calculus framework (Mirafzali et al., 21 Mar 2025), the decomposition reads: $\nabla \log p_t(x) = A(x, t) + B(x, t)$ where $A(x, t)$ is the affine (typically Gaussian) term and $B(x, t)$ represents the non-linear correction arising from second-order variations. $B(x, t)$ vanishes for linear SDEs, but its presence is indispensable for modeling non-Gaussian marginals and behavior in nonlinear systems.
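As a hedged worked example of the linear case (a standard textbook calculation, not taken from the cited paper), consider the Ornstein-Uhlenbeck process $dX_t = -\theta X_t\,dt + \sigma\,dW_t$ with $\theta > 0$ and deterministic start $X_0 = x_0$. Its marginal is Gaussian, $X_t \sim \mathcal{N}(m_t, v_t)$, with $m_t = x_0 e^{-\theta t}$ and $v_t = \frac{\sigma^2}{2\theta}\left(1 - e^{-2\theta t}\right)$, so the score is

$\nabla \log p_t(x) = -\frac{x - m_t}{v_t}$

which is affine in $x$. In the notation above this corresponds to $A(x, t) = -(x - m_t)/v_t$ and $B(x, t) \equiv 0$; a non-zero $B$ can only appear once the drift or diffusion coefficient is non-linear.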

Score-based sampling and error analysis in nonlinear Lévy--Fokker--Planck equations require computation and error control of analogous (possibly nonlocal) non-linearity score terms to capture the influence of jumps and discontinuities (Huang et al., 2024).

5. Non-Linearity Score in Symbolic Regression and Implementation Complexity

Recent work exploits non-linearity score terms as objective components in function and model selection, especially for hardware efficiency and interpretability. In KAN-based autoencoders, the non-linearity score $S[f]$ is defined as the minimal number of piecewise linear segments needed to approximate a function $f$ within a fixed mean-squared error threshold (Perre et al., 4 Jan 2026): $S[f] = \min \left\{ N : E(N) < \varepsilon \right\}$ where $E(N)$ is the total squared error of the piecewise-linear fit. Higher $S[f]$ implies greater non-linearity; penalizing $S[f]$ during symbolic regression shapes the search toward functionally simpler, lower-complexity representations subject to accuracy constraints, directly impacting energy consumption and hardware implementation cost.
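The definition of $S[f]$ can be sketched as a simple search over the number of segments. The source does not specify how the piecewise-linear fit is constructed, so the sketch below makes explicit assumptions: knots are placed uniformly on an interval $[a, b]$, the fit interpolates $f$ at the knots, and $E(N)$ is taken as the mean squared error on a sample grid. The names `pwl_mse` and `nonlinearity_score` and all default parameter values are hypothetical.

```python
import math


def pwl_mse(f, a, b, n_segments, n_samples=1001):
    """Mean squared error of a uniform-knot piecewise-linear interpolant of f."""
    knots = [a + (b - a) * k / n_segments for k in range(n_segments + 1)]
    values = [f(knot) for knot in knots]
    total = 0.0
    for j in range(n_samples):
        x = a + (b - a) * j / (n_samples - 1)
        # Index of the segment containing x (the right endpoint maps to the last one).
        seg = min(int((x - a) / (b - a) * n_segments), n_segments - 1)
        t = (x - knots[seg]) / (knots[seg + 1] - knots[seg])
        approx = values[seg] + t * (values[seg + 1] - values[seg])
        total += (f(x) - approx) ** 2
    return total / n_samples


def nonlinearity_score(f, a, b, eps=1e-4, max_segments=256):
    """Smallest N with E(N) < eps, or None if max_segments is not enough."""
    for n_segments in range(1, max_segments + 1):
        if pwl_mse(f, a, b, n_segments) < eps:
            return n_segments
    return None


# Usage: a line needs one segment, |x| needs two, a sine period needs more.
score_line = nonlinearity_score(lambda x: 3 * x - 1, -1.0, 1.0)
score_abs = nonlinearity_score(abs, -1.0, 1.0)
score_sine = nonlinearity_score(math.sin, 0.0, 2 * math.pi)
```

A fit with optimized knot positions would generally need fewer segments than this uniform-knot version, so the values here are upper bounds on the score under that alternative.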

6. Non-Linearity Score Terms in Physical and Data-Driven Networks

Non-linear scoring rules are common in network science, where node-ranking metrics often involve non-linear iterative couplings. The fitness–complexity metric (Wu et al., 2016) for bipartite country-product networks defines country fitness $F_i$ and product complexity $Q_\alpha$ as the fixed point of the coupled recurrences $\tilde{F}_i^{(n)} = \sum_\alpha \mathcal{M}_{i\alpha} Q_\alpha^{(n-1)},\quad \tilde{Q}_\alpha^{(n)} = \frac{1}{\sum_i \mathcal{M}_{i\alpha} \frac{1}{F_i^{(n-1)}}}$, where $\mathcal{M}$ is the country-product export matrix and the tilded quantities are intermediate values that are normalized after each step (in the standard formulation, by their mean) to give $F_i^{(n)}$ and $Q_\alpha^{(n)}$. The harmonic-mean-like operation over exporter fitnesses acts as an implicit non-linearity score term: it controls convergence and sensitivity and encodes structural information about nestedness and extremal roles.
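The coupled recurrences can be iterated directly, as in the following sketch. It rests on assumptions not stated in the text: the matrix is binary, every country exports at least one product and every product has at least one exporter, scores start at 1, and after each step the intermediate values are divided by their mean (the normalization used in the standard fitness–complexity formulation). The function name `fitness_complexity` and the iteration count are hypothetical.

```python
def fitness_complexity(matrix, n_iter=50):
    """Iterate the fitness-complexity map on a binary country x product matrix."""
    n_countries, n_products = len(matrix), len(matrix[0])
    fitness = [1.0] * n_countries
    complexity = [1.0] * n_products
    for _ in range(n_iter):
        # Fitness: sum of the complexities of the exported products.
        raw_f = [sum(m * q for m, q in zip(row, complexity)) for row in matrix]
        # Complexity: inverse of the summed inverse fitnesses of the exporters.
        raw_q = [
            1.0 / sum(matrix[i][a] / fitness[i] for i in range(n_countries))
            for a in range(n_products)
        ]
        # Assumed normalization step: divide each vector by its mean.
        mean_f = sum(raw_f) / n_countries
        mean_q = sum(raw_q) / n_products
        fitness = [f / mean_f for f in raw_f]
        complexity = [q / mean_q for q in raw_q]
    return fitness, complexity


# Usage: a perfectly nested 3 x 3 matrix; country 0 exports everything.
nested = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
]
fitness, complexity = fitness_complexity(nested)
```

In this nested example the most diversified country ends with the highest fitness, and the product exported only by that country ends with the highest complexity, because a single low-fitness exporter dominates the inverse sum and pulls a product's complexity down.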

7. Interpretation, Calibration, and Use Across Domains

A non-linearity score term's behavior and interpretation are context-dependent:

In DNNs, layerwise non-linearity scores guide architecture optimization and interpretability.
In cryptography, non-linearity scores directly determine resistance to linear and correlation attacks.
In regression and stochastic modeling, non-linearity scores enable detection or quantification of non-linear effects and corrections.
In symbolic model selection, non-linearity scores balance accuracy with implementation cost.
In physical systems, non-linearity scores may be used to probe departures from classical behavior or to test for new physics (e.g., King non-linearity in isotope shift experiments) (Assi et al., 2 Dec 2025).

Calibration methods (e.g., normalization, root-finding for parameter fits) depend on the underlying geometry of the problem and tolerance to errors. Integrative frameworks often combine non-linearity score terms with other objective components to achieve balanced model selection or hypothesis testing.

In sum, non-linearity score terms are indispensable quantitative tools for analyzing, designing, interpreting, and implementing mathematical, statistical, and algorithmic structures in domains where linear approximations fail to capture complexity or where explicit quantification of non-linear effects is necessary for theoretical, practical, or security reasons.