Regression-Based Normalisation Techniques
- Regression-based normalisation leverages regression models to parameterize data transformations, adapting scaling and related corrections to each feature's predictive utility.
- Key variants like pNML, Adaptive Scaling, and SUQUAN optimize normalization by linking model parameters with scaling, quantile mapping, and uncertainty adjustments.
- These techniques enhance prediction reliability and fairness, with applications spanning signal processing, calibration of prediction intervals, and socio-technical systems.
Regression-based normalisation refers to a family of techniques in which regression modeling is leveraged to parameterize or directly define normalisation transformations in data preprocessing, prediction uncertainty quantification, fairness enhancement, signal trend extraction, and calibration. This paradigm connects the normalisation process to the regression task, often yielding adaptive, data-driven corrections suited to the specific structure or inferential objectives of the problem. The following sections detail foundational methods, algorithmic frameworks, theory, and empirical evidence across the current landscape of regression-based normalisation.
1. Foundations of Regression-Based Normalisation
Regression-based normalisation techniques contrast with classical, distribution-independent scaling (such as standardization or min–max scaling) by connecting the transformation of covariates, residuals, or targets to the underlying regression model or problem-specific objectives. Prototypical methods include:
- Predictive Normalized Maximum Likelihood (pNML): In linear regression, pNML constructs a normalized predictive density by adapting model parameters for each hypothetical target value and renormalizing, with explicit connection to the geometry of the design matrix and the location of the queried test feature (Bibas et al., 2019).
- Adaptive Scaling (AS): Feature scaling factors are proportional to OLS regression coefficients, effectively weighting features according to their predictive utility rather than crude moment estimates (Li et al., 2017).
- Supervised Quantile Normalisation (SUQUAN): Rather than fixing a target empirical quantile distribution, the quantile function itself is optimized jointly with the regression parameters, recasting normalisation as a low-rank matrix regression (Morvan et al., 2017).
- Regression-based fairness normalisation (FaiReg): In the context of group fairness, target labels are shifted and scaled groupwise to match a global distribution, penalising predictions that co-vary with “unfair” group differences (Amin et al., 2022).
Regression-based normalisation concepts also appear in spectral signal processing (e.g., continuum normalization with deep regression models (Różański et al., 2021)) and in adaptive calibration of prediction intervals by learning residual distributions conditional on features (Colombo, 2024).
2. Algorithmic Formulations and Variants
A spectrum of regression-based normalisation approaches can be classified according to their application domain and mechanism of action. The following table summarizes the principal algorithmic families, their targets, and conceptual mechanisms:
| Method Family | Target of Normalisation | Mechanism |
|---|---|---|
| pNML (Bibas et al., 2019) | Predictive distribution p(y \| x) | Per-label refitting and renormalisation |
| Adaptive Scaling (Li et al., 2017) | Features | Regression-driven per-feature scaling |
| SUQUAN (Morvan et al., 2017) | Quantile normalisation function | Joint quantile+model optimization |
| FaiReg (Amin et al., 2022) | Groupwise labels | Groupwise shifting & scaling to global |
| SUPPNet (Różański et al., 2021) | Spectral continuum | Deep regression + smoothing spline |
| Excitation norm. (Glushchenko et al., 2021) | Regressor excitation | Saturated, order-of-magnitude mapping |
| Flow-based conformal (Colombo, 2024) | Conformity score \|y − f(x)\| | Learned input-conditional monotone transform |
pNML and Local Learnability
In the pNML framework, for a test point $x$ and a hypothesized label $y$, the fitted parameter is determined as

$$\hat{\theta}(x, y) = \arg\min_{\theta} \left[ \sum_{i=1}^{N} (y_i - \theta^\top x_i)^2 + (y - \theta^\top x)^2 \right],$$

inducing a normalised predictive density $q(y \mid x) \propto p_{\hat{\theta}(x,y)}(y \mid x)$ with variance inflated locally according to the geometry of $x$ relative to the training data. The regret $\Gamma(x) = \log \int p_{\hat{\theta}(x,y')}(y' \mid x)\, dy'$ quantifies local learnability: it is low when $x$ is well aligned with the principal data subspace, and high (with variance inflated) otherwise. This adaptive inflation achieves a form of automatic, data-driven regularization that extends to over-parameterized models, where the number of parameters exceeds the number of training samples (Bibas et al., 2019).
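As a concrete illustration, the pNML mechanism for under-parameterized linear regression can be sketched via the variance-inflation factor $1 + x^\top (X^\top X)^{-1} x$, whose log gives the regret in the Gaussian case. The toy data, dimensions, and test points below are illustrative assumptions, not from the cited work:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: n samples, d features, linear ground truth plus noise.
n, d = 50, 3
X = rng.normal(size=(n, d))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + 0.1 * rng.normal(size=n)

XtX = X.T @ X  # design-matrix geometry drives the pNML inflation

def pnml_inflation(x):
    """Variance-inflation factor 1 + x^T (X^T X)^{-1} x for a test point x;
    its log is the pNML regret in the under-parameterized Gaussian case."""
    return 1.0 + x @ np.linalg.solve(XtX, x)

# A query near the bulk of the training data is "easy" (low regret) ...
x_in = X.mean(axis=0)
# ... while a query far from the data subspace is "hard" (high regret).
x_out = np.array([10.0, -10.0, 10.0])

regret_in = np.log(pnml_inflation(x_in))
regret_out = np.log(pnml_inflation(x_out))
```

The sketch makes the "local learnability" reading concrete: the further the query lies from the span of the training data, the wider the pNML predictive density becomes.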
Adaptive Scaling of Features
Adaptive Scaling (AS) entails:
- OLS regression on mean-centered training data to obtain coefficient estimates $\hat{\beta}$.
- Scaling each feature $j$ by $|\hat{\beta}_j|^{\gamma}$, with $\gamma$ the “prior-weight exponent”, generalizing to the AS, GAS, and ASH variants for univariate and high-dimensional cases (Li et al., 2017).
By weighting features according to their estimated predictive impact (as opposed to only their variance), AS tailors subsequent penalized regression or classification to the empirical feature–response dependency structure.
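A minimal sketch of the AS recipe on toy data, assuming $\gamma = 1$ (the dataset and names below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: two informative features and one pure-noise feature.
n = 200
X = rng.normal(size=(n, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + 0.05 * rng.normal(size=n)

# Step 1: OLS on mean-centred data estimates per-feature predictive utility.
Xc = X - X.mean(axis=0)
yc = y - y.mean()
beta_ols, *_ = np.linalg.lstsq(Xc, yc, rcond=None)

# Step 2: rescale each feature by |beta_j|^gamma; gamma is the
# "prior-weight exponent" (gamma = 0 recovers the unscaled data).
gamma = 1.0
scale = np.abs(beta_ols) ** gamma
X_as = Xc * scale  # input for a downstream scale-sensitive model
```

After scaling, the noise feature is shrunk toward zero while the strong predictor is amplified, so a subsequent penalized or distance-based model sees the empirical feature–response structure directly.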
Supervised Quantile Normalisation (SUQUAN)
SUQUAN jointly optimizes the target quantile vector $f$ (constrained to be monotonic, zero-sum, and norm-bounded) and regression weights $w$ via

$$\min_{w,\, f} \; \sum_{i=1}^{n} \ell\big(y_i,\, \langle w, \Pi_i f \rangle\big),$$

where $\Pi_i$ is the permutation matrix encoding the ranks of sample $i$; this is equivalent to a rank-1 matrix regression under permutation-matrix embeddings. Algorithms alternate convex steps in $f$ and $w$, using isotonic regression and block-coordinate descent (Morvan et al., 2017).
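The alternating scheme can be sketched in simplified form: each sample's entries are replaced by the values of a monotone target vector according to their within-sample ranks, and least-squares steps in the weights and the target vector alternate, with a pool-adjacent-violators projection standing in for the full isotonic-regression step. Toy data and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def pav(v):
    """Pool-adjacent-violators: L2 projection onto non-decreasing vectors."""
    means, counts = [], []
    for x in v:
        means.append(float(x))
        counts.append(1)
        while len(means) > 1 and means[-2] > means[-1]:
            m2, c2 = means.pop(), counts.pop()
            m1, c1 = means.pop(), counts.pop()
            means.append((m1 * c1 + m2 * c2) / (c1 + c2))
            counts.append(c1 + c2)
    return np.concatenate([np.full(c, m) for m, c in zip(means, counts)])

# Toy data: the response depends on each sample only through its ranks,
# via an unknown monotone quantile vector f_true and weights w_true.
n, p = 100, 20
X = rng.normal(size=(n, p))
ranks = X.argsort(axis=1).argsort(axis=1)   # within-sample rank of each entry
f_true = np.sort(rng.normal(size=p))
w_true = rng.normal(size=p)
y = f_true[ranks] @ w_true + 0.01 * rng.normal(size=n)

# Alternating minimization of sum_i (y_i - <w, Pi_i f>)^2.
f = np.linspace(-1.0, 1.0, p)               # initial monotone quantile vector
rows = np.arange(n)[:, None]
for _ in range(20):
    Z = f[ranks]                            # Pi_i f for every sample
    w, *_ = np.linalg.lstsq(Z, y, rcond=None)   # convex step in w
    A = np.zeros((n, p))
    A[rows, ranks] = w                      # <w, Pi_i f> is linear in f
    f_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
    f = pav(f_ls)                           # crude isotonic projection

Z = f[ranks]
w, *_ = np.linalg.lstsq(Z, y, rcond=None)
mse = np.mean((Z @ w - y) ** 2)
```

The crude projection after the unconstrained solve is a simplification of the constrained isotonic step in the paper, but it preserves the key structure: the learned normalisation target stays monotone while co-adapting with the predictor.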
Regression-Based Normalisation for Fairness
FaiReg normalises groupwise label distributions to the global mean and variance:

$$\tilde{y}_i = \frac{y_i - \mu_{g_i}}{\sigma_{g_i}}\,\sigma + \mu,$$

where $g_i$ is the group membership of sample $i$, $\mu_{g_i}$ and $\sigma_{g_i}$ are the groupwise mean and standard deviation, and $\mu$ and $\sigma$ are the global mean and standard deviation. The regression model is then trained to predict these “fair” targets. This penalizes correlation of predictions with group identity and ensures approximate statistical parity (Amin et al., 2022).
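The groupwise shift-and-scale step can be sketched directly (a minimal illustration, not the authors' implementation):

```python
import numpy as np

def fair_targets(y, groups):
    """Shift and scale each group's labels to the global mean and std."""
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    mu, sigma = y.mean(), y.std()
    y_fair = np.empty_like(y)
    for g in np.unique(groups):
        m = groups == g
        y_fair[m] = (y[m] - y[m].mean()) / y[m].std() * sigma + mu
    return y_fair

# Two groups with very different label distributions.
y = np.array([1.0, 2.0, 3.0, 7.0, 8.0, 9.0])
g = np.array([0, 0, 0, 1, 1, 1])
y_fair = fair_targets(y, g)  # both groups now share the global mean and std
```

A regressor trained on `y_fair` instead of `y` can no longer exploit the raw between-group offset, which is exactly the mechanism behind the statistical-parity guarantee described above.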
3. Theoretical Insights and Guarantees
Regression-based normalisation methods offer explicit control over bias–variance tradeoffs, conditioning, and inferential guarantees:
- Adaptive Scaling: Scaling features by OLS coefficients reweights Lasso or other penalties, reducing bias on strong predictors (“group-lasso-like” penalties), but may increase estimator variance, particularly if the exponent $\gamma$ emphasizes noisy coefficients. The optimal $\gamma$ varies by task and data (Li et al., 2017).
- Shrinkage and Normalisation Choices: For lasso/ridge with binary features, scaling by the feature variance or standard deviation eliminates class-balance bias but increases variance. Weighted penalties in the elastic net provide a compromise. Cross-validation over the scaling exponent is empirically recommended (Larsson et al., 7 Jan 2025).
- pNML Minimax Regret: Regret precisely quantifies how much a “genie” with access to the true label could outperform the learner at the test point. Learnability—and the benefit of pNML normalisation—depends strictly on the alignment of the test feature with the principal eigen-space of the empirical correlation matrix (Bibas et al., 2019).
- Fairness Constraints: FaiReg minimises MSE plus a covariance penalty between predictions and the unfair groupwise component of the labels. Group means and variances are drawn together, but loss of informative group differences may reduce accuracy if such differences are real (Amin et al., 2022).
- Excitation Normalisation in Identification: Scalar normalization with saturated, order-of-magnitude mapping (rather than amplitude scaling) allows a fixed adaptation gain, decoupling convergence bounds from regressor amplitude and improving robustness across excitation levels (Glushchenko et al., 2021).
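For the binary-feature shrinkage setting above, cross-validating the scaling exponent can be sketched with a closed-form ridge fit; the exponent grid, penalty, and toy data are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Two equally predictive binary features with very different class balances.
n = 400
X = (rng.random((n, 2)) < np.array([0.5, 0.05])).astype(float)
y = X @ np.array([1.0, 1.0]) + 0.3 * rng.normal(size=n)

def ridge_cv_error(nu, lam=1.0, folds=5):
    """Held-out MSE of ridge after scaling each feature by std_j^(-nu):
    nu = 0 leaves features raw, nu = 1 is standard-deviation scaling."""
    s = X.std(axis=0) ** (-nu)
    Xs = (X - X.mean(axis=0)) * s
    yc = y - y.mean()
    fold_id = np.arange(n) % folds
    errs = []
    for k in range(folds):
        tr, te = fold_id != k, fold_id == k
        A = Xs[tr].T @ Xs[tr] + lam * np.eye(Xs.shape[1])
        w = np.linalg.solve(A, Xs[tr].T @ yc[tr])
        errs.append(np.mean((yc[te] - Xs[te] @ w) ** 2))
    return float(np.mean(errs))

# Cross-validate over the scaling exponent.
nus = [0.0, 0.5, 1.0]
best_nu = min(nus, key=ridge_cv_error)
```

The exponent trades class-balance bias against variance exactly as described in the shrinkage bullet; which value wins depends on the imbalance and noise level, which is why cross-validation is the pragmatic recommendation.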
4. Applications and Empirical Performance
Feature and Target Scaling in Machine Learning
Regression-based scaling techniques (AS, GAS, ASH) generally outperform uniform scaling in variable selection, shrinkage, and scale-sensitive models (e.g., Lasso, neural nets, K-NN), as shown in synthetic and UCI default data experiments. However, in extreme variance settings, error can increase, and tree-based or Naive Bayes models (scale-invariant) show no benefit (Li et al., 2017).
Spectral and Astronomical Data
SUPPNet employs deep regression to estimate the pseudo-continuum in high-resolution stellar spectra, followed by spline-based smoothing. This yields automatic normalization with RMS error competitive with manual expert fits, and generalizes to other trend-removal contexts by retraining the regression/spline post-processor (Różański et al., 2021).
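The continuum-normalisation idea can be illustrated with a crude stand-in for the deep regressor: a low-order polynomial pseudo-continuum fitted to a synthetic spectrum and divided out (all quantities below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic spectrum: smooth continuum times narrow absorption lines + noise.
wave = np.linspace(0.0, 1.0, 500)
continuum = 2.0 + 0.5 * wave - 0.8 * wave**2
lines = np.ones_like(wave)
for center in (0.2, 0.5, 0.8):
    lines -= 0.4 * np.exp(-(((wave - center) / 0.005) ** 2))
flux = continuum * lines + 0.01 * rng.normal(size=wave.size)

# Crude pseudo-continuum: a low-order polynomial fit (the narrow lines barely
# perturb it); SUPPNet instead uses a deep regressor plus a smoothing spline.
coeffs = np.polyfit(wave, flux, deg=2)
pseudo = np.polyval(coeffs, wave)
normalized = flux / pseudo  # continuum-normalized spectrum, ~1 between lines
```

The same divide-out-the-regressed-trend pattern carries over to other trend-removal contexts; SUPPNet's contribution is replacing the rigid polynomial with a learned, spline-smoothed estimate.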
Fairness in Socio-technical Systems
FaiReg demonstrably eliminates groupwise prediction bias in personality/interview score regression by normalizing target distributions prior to learning. This achieves statistical parity and substantial reduction in prediction-group correlation, with only marginal loss in overall mean absolute accuracy, outperforming both weighted rebalancing and adversarial de-biasing (Amin et al., 2022).
Prediction Interval Calibration
Normalizing-flow–based calibration learns an input-dependent (A,X)-monotonic transformation of conformity scores, yielding locally adaptive, valid prediction intervals. The associated maximum-likelihood training objective guarantees minimization of the empirical conditional validity gap as the flow improves (Colombo, 2024).
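A simplified sketch of locally adaptive conformal calibration, with a known input-dependent scale standing in for the learned flow (the data-generating process and the oracle scale are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)

def gen(n):
    """Heteroscedastic toy data: noise scale grows with |x|."""
    x = rng.uniform(-1.0, 1.0, n)
    y = x + (0.1 + 0.5 * np.abs(x)) * rng.normal(size=n)
    return x, y

x_cal, y_cal = gen(1000)   # calibration split
x_te, y_te = gen(1000)     # held-out evaluation split

f = lambda x: x                          # point predictor (assumed given)
sigma = lambda x: 0.1 + 0.5 * np.abs(x)  # oracle scale standing in for the
                                         # learned monotone flow transform

# Normalized conformity scores on the calibration split.
alpha = 0.1
scores = np.abs(y_cal - f(x_cal)) / sigma(x_cal)
level = np.ceil((1 - alpha) * (len(scores) + 1)) / len(scores)
q = np.quantile(scores, level)

# Locally adaptive intervals f(x) +/- q * sigma(x); check marginal coverage.
coverage = np.mean(np.abs(y_te - f(x_te)) <= q * sigma(x_te))
```

The intervals widen where the noise is large and shrink where it is small, while the single calibrated quantile `q` preserves marginal validity; the flow-based method generalizes the fixed scale here to a fully learned, input-conditional monotone transformation.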
System Identification and Adaptive Control
Regressor excitation normalization in DREM ensures adaptive parameter identification with excitation-independent error bounds and eliminates the need for excitation-dependent gain scheduling. Standard amplitude normalization cannot guarantee these uniform properties (Glushchenko et al., 2021).
5. Limitations, Open Issues, and Extensions
Core challenges for regression-based normalisation include:
- Model Misspecification: Methods that use OLS-based scaling or label normalization (e.g., AS, FaiReg) may suffer if linearity or Gaussianity assumptions are violated, or if predictive structure is highly nonlinear or non-additive (Li et al., 2017, Amin et al., 2022).
- Variance Inflation: Bias reduction by aggressive regression-based normalisation (as in Lasso with large scaling exponents) almost inevitably increases estimator variance in imbalanced or high-dimensional settings (Larsson et al., 7 Jan 2025).
- Convexity and Optimality: Joint optimization problems (as in SUQUAN) are non-convex due to the rank constraint, limiting guarantees to stationary points; convex relaxations or alternative criteria may be warranted (Morvan et al., 2017).
- Residual Bias and Alignment: For fairness objectives, label normalization cannot correct feature bias or confounding; groupwise differences beyond mean–variance are not addressed (Amin et al., 2022).
- Transferability: Regression-based normalisation requires task- or distribution-specific design; e.g., adaptive scaling factors or trained flows may need retraining under shift or domain adaptation scenarios (Li et al., 2017, Colombo, 2024).
- Hyperparameter Selection: Tuning exponents (such as $\gamma$), model complexity in deep regressors, or smoothing factors in spline post-processing remains a data-dependent process, with cross-validation as the pragmatic approach (Li et al., 2017, Larsson et al., 7 Jan 2025).
6. Outlook and Future Directions
Research continues into more sophisticated, context-aware regression-based normalisation strategies. Opportunities include:
- Nonlinear and Kernel Extensions: Expanding beyond linear OLS coefficients to kernel or deep models for scaling transformations in highly nonlinear tasks (Li et al., 2017).
- Higher-Moment Group Normalisation: In group-fairness contexts, normalizations based on moments beyond mean/variance, or adversarially learned adjustments, are under study (Amin et al., 2022).
- Data-driven calibration architectures: Flow-based adaptation and other flexible parametric transformations are expected to see broader deployments in uncertainty quantification, highlighting a modular, plug-in approach to calibration and normalisation (Colombo, 2024).
- Noise-robust and finite-time identification: Adapting excitation-normalisation mappings to noise-robust or hybrid estimation schemes in adaptive systems remains an open avenue (Glushchenko et al., 2021).
- Complex interactions and mixed-type features: Theoretical and practical guidelines for scaling in the presence of arbitrary interactions or mixed data types are being refined, with weighted penalties representing a promising direction (Larsson et al., 7 Jan 2025).
In summary, regression-based normalisation unifies model-centric, data-driven correction with task-specific, theory-grounded regularization and calibration. Its variants demonstrate improved empirical performance, interpretability, and fairness across a range of contemporary regression, signal-processing, and machine learning challenges.