Adversarial Robustness in Regression Models

Updated 28 January 2026

Adversarial robustness in regression is the study of models under worst-case, targeted perturbations to inputs and responses, emphasizing accuracy-robustness trade-offs and certified guarantees.
It leverages minimax theory, convex regularization, and game-theoretic frameworks to balance standard risk with adversarial disruptions in parameter recovery and prediction quality.
Practical approaches include adversarial min–max training, early stopping, and robust online algorithms that maintain performance in high-dimensional, overparameterized settings.

Adversarial robustness in regression refers to the theoretical, algorithmic, and empirical study of regression estimators and prediction models under worst-case, targeted perturbations to inputs, responses, or the training process. Unlike classical robustness that addresses random, possibly heavy-tailed or outlier contamination, adversarial robustness focuses on targeted, often norm-bounded or adaptive attacks calibrated to maximally disrupt prediction quality, recovery of ground-truth parameters, or fairness. The resulting theory spans minimax trade-offs, convex and nonconvex regularization, certified guarantees, PAC learnability, game-theoretic equilibrium, streaming algorithms, Bayesian inference, and modern deep neural network behavior.

1. Formal Definitions and Risk Frameworks

Let $(x, y) \in \mathbb{R}^d \times \mathbb{R}$ denote the feature and response; a regression model is specified by a predictor $f(x)$ or parameter vector $w$ in linear models. The standard (population) risk is

$R(f) = \mathbb{E}_{(x,y)\sim P}[\ell(f(x), y)],$

with $\ell$ typically squared loss $\ell_{\rm LS}(u,v) = \tfrac12(u-v)^2$ . The $\epsilon$ -adversarial risk under norm $\|\cdot\|$ is

$R_\epsilon(f) = \mathbb{E}_{(x,y)} \Big[ \sup_{\|\Delta\|\le \epsilon} \ell(f(x+\Delta), y) \Big].$

A central object of study is the trade-off curve (Pareto frontier) between $R(f)$ and $R_\epsilon(f)$ as a function of adversarial strength $\epsilon$ , model class, and structural constraints. The effect of adversarial training, min-max estimators, and adaptive online procedures is fundamental. In addition, threats may target inputs, responses, particular samples (poisoning), or fairness metrics.

2. Fundamental Accuracy–Robustness Trade-offs

Adversarial robustness in regression is governed by a suite of minimax lower bounds. A general abstract result (Bahmani, 2024) states: $R(f) + R_\epsilon(f) \geq \max\Big\{ \mathbb{E}[\sup_{\|\Delta\|\leq \epsilon} B(f(X), f(X+\Delta))],\, \mathbb{E}[B(Y,Y')] \Big\}$ for suitable $B$ . In least-squares, with $B(u,v) = \frac16 (u-v)^2$ , this yields: $R(f) + R_\epsilon(f) \geq \frac16 \max\left\{ L_\epsilon(f),\;\mathbb{E}[(Y-Y')^2] \right\}$ where

$L_\epsilon(f) = \mathbb{E}\left[\sup_{\|\Delta\| \leq \epsilon} |f(X+\Delta) - f(X)|^2\right]$

is the mean local smoothness. Thus, any estimator attaining near-optimal standard risk must be “locally flat” (small $L_\epsilon(f)$ ) in the adversarial norm, else its adversarial risk grows.

This trade-off is controlled by a Poincaré-type constant, encoding the alignment between the data distribution $\Sigma$ and the adversarial geometry: $\lambda_* = \sup_{\theta \neq 0} \frac{\|\theta\|_{\Sigma}^2}{\|\theta\|_*^2},$ where $\|\cdot\|_\Sigma$ is the covariance-induced norm, $\|\cdot\|_*$ is the dual to the adversary’s norm. For high-dimensional $\ell_\infty$ -attacks, $\lambda_*\sim 1/d$ , making the trade-off especially stringent: non-trivial robustness is possible only for very small radii $\epsilon \ll d^{-1/2}$ or low signal-to-noise ratio (Bahmani, 2024, Dohmatob et al., 2023).

This accuracy–robustness boundary manifests as explicit thresholds. For instance, for polynomial ridge regression, the necessary condition for robustness without sacrificing accuracy is

$\epsilon \ll \sqrt{\lambda_* / \mathrm{SNR}_p}$

where $\mathrm{SNR}_p$ denotes a signal-to-noise ratio for the regression problem (Bahmani, 2024).

3. Minimax, Algorithmic, and Statistical Characterizations

Linear Models and Explicit Solutions

For linear regression with $\ell_2$ -norm attacks and $x \sim N(0,\Sigma)$ , adversarial risk decomposes as (Xing et al., 2020, Dohmatob et al., 2023): $R_0(\theta,\delta) = \|\theta-\theta_0\|_\Sigma^2 + 2\delta c_0 \|\theta\|\|\theta-\theta_0\|_\Sigma + \delta^2\|\theta\|^2$ with $c_0 = \mathbb{E}|N(0,1)|$ . The minimizer is a ridge-regularized shrinkage: $\theta^*(\delta) = (\Sigma + \lambda^*(\delta) I)^{-1} \Sigma \theta_0,$ where $\lambda^*(\delta)$ solves an implicit equation depending on $\delta$ . This explicit link shows that adversarial robustness in linear regression naturally induces $\ell_2$ or $\ell_1$ regularization, depending on the attack norm (Ribeiro et al., 2022, Xie et al., 2024).

High-dimensional and Sparse Regimes

In sparse high-dimensional linear regression ( $p>n, s = \|\beta^*\|_0 \ll p$ ), adversarial training under $\ell_\infty$ perturbations admits a convex dual form: $\min_\beta \frac{1}{n} \sum_{i=1}^n \Bigl(\big|x_i^T\beta-y_i\big| + \varepsilon \|\beta\|_1\Big)^2$ and under restricted eigenvalue conditions, matches the classical $O(s \log p / n)$ minimax rates up to log factors (Xie et al., 2024, Ribeiro et al., 2022).

Group-structured adversarial training further narrows bounds when the signal exhibits group-sparsity. The group penalty,

$\sum_{l=1}^L w_l^{-1} \|\beta^l\|_2,$

yields reduced upper bounds whenever nonzeros cluster in few groups (Xie et al., 2024).

Overparameterization and Nonparametric Settings

Adversarial training in overparameterized regimes ( $p>n$ ) causes sharp phase-transitions: for disturbances below a threshold $\epsilon < \bar\epsilon$ , the solution jumps discontinuously to the minimum-norm interpolator (Ribeiro et al., 2022). In nonparametric regression, perfect interpolation can drastically worsen adversarial robustness, and even mild adversarial threats destroy the standard minimax rate if enforced (Peng et al., 22 Jan 2026).

4. Algorithms for Adversarially Robust Regression

Min–Max and Regularized Estimators

Adversarial min–max training:

$\min_\beta \frac{1}{n} \sum_{i=1}^n \max_{\|\delta_i\|_p \leq \varepsilon} (x_i+\delta_i)^T\beta - y_i)^2$

admits a dual convex form linking adversarial robustness to robust regression and parameter shrinkage (Ribeiro et al., 2022, Xie et al., 2024).

Group adversarial training: convex objectives with group penalties enable adaptive control over both sparsity and group structure (Xie et al., 2024).
Two-stage robustification: (1) estimate mean and covariance (by OLS, LASSO, thresholded estimators), (2) apply a risk-minimizing shrinkage determined by adversarial radius, yielding consistency and sharp minimax-optimal rates in both dense and sparse regimes (Xing et al., 2020).

Early Stopping and GD+ Approaches

Gradient descent with early stopping can be near-minimax optimal for $\ell_2$ attacks but can be arbitrarily suboptimal for general Mahalanobis attacks. Feature-rescaling (GD+) schemes, using preconditioning by the square root of the adversary’s norm, restore close-to-optimality (Scetbon et al., 2023).
Robust convex estimators: A two-stage approach based on the surrogate

$\big(\|w-\widehat{w}_0\|_{\Sigma} + r\|w\|_*\big)^2$

achieves within a factor 1.11 of minimax risk for all norms, requiring only standard convex optimization tools.

Robust Streaming and Online Algorithms

Online importance sampling (leveraged row sampling) supports adversarially robust streaming regression: at each arrival, the algorithm tosses fresh random bits to select rows via leverage scores; this approach is empirically and theoretically robust to adaptive, streaming adversaries, unlike fixed-projection sketching (Braverman et al., 2021).
Spectral reweighting (SCRAM) and strong convex SDP relaxations provide distribution-free, online, and high-dimensional robust regression under Huber contamination, achieving the optimal dependence on the contamination fraction $\sqrt{\varepsilon}$ (Chen et al., 2020).

Fairness-aware and Causally Robust Methods

Adversarially robust fair regression: Minimax frameworks incorporating mean-squared and group fairness under point or rank-one poisonings admit explicit solution strategies, maintaining both low accuracy loss and fairness under attack (Jin et al., 2022).
Causal feature estimation via adversarial instrumental variable regression separates causal from non-causal (spurious) directions, and causal inoculation (CAFE) regularizes deep regression defenses by aligning features with their causal components (Kim et al., 2023).

Certified and PAC learning Perspectives

PAC-Bayesian robust Bayesian regression: Generalized robust posteriors admit closed-form adversarial risk and nonvacuous generalization bounds (Sabanayagam et al., 20 Feb 2025).
Certified randomized smoothing: Regression analogs of randomized smoothing provide explicit probabilistic certificates for bounded perturbations, based on concentration inequalities for Gaussian-perturbed average predictors (Rekavandi et al., 2024).
Robust PAC learnability: Classes of bounded real-valued predictors have adversarially robust PAC learners if and only if the fat-shattering dimension is finite (Attias et al., 2022).

5. Empirical Phenomena and Phase Transitions

Phase-Transition Analysis

The accuracy–robustness trade-off often exhibits a phase-transition:

For sufficiently small $\epsilon$ , robustification may be achieved at no cost in standard risk (“free-lunch” regime) (Dohmatob et al., 2023, Bahmani, 2024).
Beyond critical thresholds, standard risk must deteriorate for additional gains in adversarial risk.

Overparametrized models (deep nets, minimum-norm solutions) are particularly fragile: strong interpolation may preserve generalization performance under classical risk but can destroy robustness to even minor adversarial input shifting. As the attack radius or the “influence” of interpolated points increases, adversarial risk can diverge logarithmically with sample size (Peng et al., 22 Jan 2026).

Practical Behavior

Key empirical takeaways from the literature:

Min-max trained and group-penalized estimators empirically maintain minimax-optimal rates under attack, whereas classical estimators fail (Xie et al., 2024).
Online leverage-based coreset methods preserve streaming regression accuracy under adversarial data adaptation, outperforming sketch-based methods, which can exhibit catastrophic failures once the adversary exploits the fixed sketch structure (Braverman et al., 2021).
In deep networks, adversarial training and causal-feature regularization (CAFE) can provide gains in robustness with moderate cost to standard accuracy (Kim et al., 2023).

Algorithmic approach	Certified against adaptive attacks?	Minimax-rate optimality
Importance-sampling streaming	Yes	Yes
Sketch-based streaming	No	No
Early stopping (GD)	Only for $\ell_2$	Yes ( $\ell_2$ only)
Two-stage convex minimization	Yes (all norms)	$\sim$ Yes (all norms)

6. Extensions: Structured, Fair, and Causal Robustness

Adversarial robustness increasingly incorporates additional desiderata beyond minimax risk:

Fairness constraints under adversarial poisonings can be enforced via explicit minimax formulations combining prediction and group error gaps, admitting tractable global or saddle-point solutions (Jin et al., 2022).
In multilearner or federated settings (multiple learners share data or are simultaneously attacked), the game-theoretic approach yields equilibrium models with higher robustness than standard regularization (Tong et al., 2018).
For models exploiting latent group- or causal-structure in features, customized penalties or IV-type regularizations can improve worst-case error bounds (Xie et al., 2024, Kim et al., 2023).

7. Open Problems and Future Directions

Despite significant advances, several challenges remain:

Precise characterization of the accuracy–robustness boundary for classes of nonlinear, nonconvex, or interpolating estimators and deep neural networks (Peng et al., 22 Jan 2026, Bahmani, 2024).
Universal, data-agnostic algorithms reaching minimax adversarial risk without incurring additional sample complexity.
Robust regression under arbitrary (possibly unbounded or structured) perturbation sets, e.g., certified randomized smoothing for general norms, or distributionally-robust objectives for OWL or group-structured regularizers.
Integrating fairness, interpretability, and robust statistical estimation seamlessly into adversarially robust regression frameworks.

These directions represent a convergence of robust statistics, optimization, learning theory, and adversarial ML, as adversarial robustness in regression continues to be a vibrant and rapidly advancing field.