Improved Concentration for Mean Estimators via Shrinkage

Published 14 Dec 2025 in math.ST | (2512.12750v2)

Abstract: We study a class of robust mean estimators $\widehat\mu$ obtained by adaptively shrinking the weights of sample points far from a base estimator $\widehat\kappa$. Given a data-dependent scaling factor $\widehat\alpha$ and a weighting function $w:[0, \infty) \to [0,1]$, we let $\widehat\mu = \widehat\kappa + \frac{1}{n}\sum_{i=1}^n(X_i - \widehat\kappa)w(\widehat\alpha|X_i-\widehat\kappa|)$. We prove that, under mild assumptions on $w$, these estimators achieve stronger concentration bounds than the base estimate $\widehat\kappa$, including sub-Gaussian guarantees. This framework unifies and extends several existing approaches to robust mean estimation in $\mathbb{R}$. Through numerical experiments, we show that our shrinking approach translates to faster concentration, even for small sample sizes.

Summary

  • The paper introduces a unified shrinkage framework that adaptively downweights outlier observations to achieve optimal high-probability concentration bounds.
  • It generalizes robust estimators like the trimmed mean, Winsorized mean, and median-of-means with provable non-asymptotic guarantees.
  • Empirical results demonstrate that shrinkage-based estimators enhance finite-sample accuracy while maintaining robustness under heavy-tailed and contaminated models.


Introduction and Motivation

This paper rigorously investigates robust mean estimation through an adaptive shrinkage framework, providing a general unification of a wide spectrum of robust mean estimators. The central problem is the estimation of the mean $\mu = \mathbb{E} X_1$ from i.i.d. observations $X_{1:n}$ sampled from a possibly heavy-tailed or contaminated distribution $P$. While the empirical mean is asymptotically efficient and minimax in classical light-tailed regimes, it is suboptimal and highly sensitive to distributional outliers. Consequently, robust alternatives such as the trimmed mean, Winsorized mean, and median-of-means (MoM) have been extensively studied, but each presents limitations in robustness, finite-sample properties, or variance control.

The paper's key contribution is a general shrinkage approach which, by adaptively downweighting points far from a central estimate $\widehat{\kappa}$ via a non-increasing weight function $w$ and a data-dependent scale $\widehat{\alpha}$, delivers mean estimators that achieve optimal non-asymptotic high-probability concentration bounds under minimal moment assumptions, even in the presence of arbitrary adversarial contamination.

Shrinkage Framework: Definition and Scope

The proposed shrinkage estimator is defined as

$$\widehat{\mu} = \widehat{\kappa} + \frac{1}{n} \sum_{i=1}^n (X_i - \widehat{\kappa})\, w(\widehat{\alpha}\,|X_i - \widehat{\kappa}|),$$

where $w: [0, \infty) \to [0, 1]$ is non-increasing and $\widehat{\alpha}$ is chosen so that the total weight mass satisfies $\sum_{i=1}^n w(\widehat{\alpha} |X_i - \widehat{\kappa}|) = n - \eta$ for a user-specified $\eta$ (interpreted as a shrinkage or trimming level). This construction subsumes and generalizes classic robust mean estimators:

  • Trimmed mean: $w(t) = \mathbf{1}_{t < 1}$
  • Winsorized mean: $w(t) = 1 \wedge t^{-1}$
  • Influence-function-based estimators [e.g., Lee-Valiant]: $w(t) = (1 - t^p)_+$
  • Median-of-means and related constructions with suitable bucketing.
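The construction above can be sketched in a few lines; this is a hypothetical illustration, not the authors' code. Function names, the bisection scheme for $\widehat{\alpha}$, and the tolerance choices are all assumptions. Since $w$ is non-increasing, the total weight $\sum_i w(\alpha |X_i - \widehat{\kappa}|)$ is non-increasing in $\alpha$, so the calibration equation can be solved by bisection:

```python
import numpy as np

def shrinkage_mean(x, kappa, w, eta, max_iter=200):
    """Sketch of the shrinkage estimator:
    mu_hat = kappa + (1/n) * sum_i (x_i - kappa) * w(alpha * |x_i - kappa|),
    with alpha tuned so that sum_i w(alpha * |x_i - kappa|) ~= n - eta."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = np.abs(x - kappa)

    def total_weight(alpha):
        return np.sum(w(alpha * d))

    # w non-increasing => total_weight is non-increasing in alpha:
    # bracket the root of total_weight(alpha) = n - eta, then bisect.
    lo, hi = 0.0, 1.0
    while total_weight(hi) > n - eta and hi < 1e16:
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if total_weight(mid) > n - eta:
            lo = mid
        else:
            hi = mid
    alpha = hi  # upper endpoint: guarantees at least eta weight mass is removed

    return kappa + np.mean((x - kappa) * w(alpha * d))

# Two special cases from the list above, written as weight functions:
trim_w = lambda t: (t < 1.0).astype(float)                          # trimmed mean
winsor_w = lambda t: np.minimum(1.0, 1.0 / np.maximum(t, 1e-300))   # Winsorized mean
```

For example, with the sample median as base and the trimming weight, `shrinkage_mean(x, np.median(x), trim_w, eta)` zeroes the weights of (approximately) the `eta` points farthest from the median before averaging.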

Remarkably, the framework accommodates arbitrary base estimators $\widehat{\kappa}$ (median, quantile, empirical mean, etc.), requiring only that their behavior under contamination be quantifiable, and places no substantial restrictions on $w$ beyond basic regularity (Assumptions 2–3).

Main Theoretical Results

The paper delivers non-asymptotic, high-probability concentration inequalities for the shrinkage estimator under a fully adversarial contamination model. The main result states:

Let $X_{1:n}^\varepsilon$ be an $\varepsilon$-contaminated sample and $\widehat{\kappa}$ a base estimator. Under minimal regularity, with probability at least $1 - 4\delta$,

$$|\widehat{\mu}^\varepsilon - \mu| \leq C \left[ \inf_{1 < p \leq 2} \nu_p \left( \frac{1}{n} \ln \frac{1}{\delta} \right)^{1 - 1/p} + \inf_{p > 1} \nu_p\, \varepsilon^{1 - 1/p} \right],$$

where $\nu_p$ denotes the $p$th centered moment of $P$, and $C$ depends on $w$, $\widehat{\kappa}$, and $\varepsilon$. This rate matches known minimax lower bounds and achieves sub-Gaussian deviations in regimes where $\nu_2 < \infty$ [devroye2016sub].

Importantly, even when the base estimator $\widehat{\kappa}$ is poor (e.g., much less robust or imprecise), shrinkage can yield substantial improvement, established via explicit bounds quantifying the dependence on the concentration quality of $\widehat{\kappa}$ (see Theorem 2 and related corollaries). In particular, if $\widehat{\kappa}$ is chosen as a robust quantile (the median), the optimal breakdown point and robustness properties are inherited.
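For intuition, taking $p = 2$ in both infima (valid whenever $\nu_2 < \infty$, and possibly weaker than the infimum) recovers the familiar sub-Gaussian-plus-contamination rate:

```latex
% p = 2 specialization of the main bound above:
|\widehat{\mu}^\varepsilon - \mu|
  \;\le\; C\left[\, \nu_2 \sqrt{\frac{1}{n}\ln\frac{1}{\delta}}
  \;+\; \nu_2 \sqrt{\varepsilon} \,\right]
```

The first term is the optimal sub-Gaussian deviation under a finite second moment; the second is the unavoidable price of $\varepsilon$-contamination.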

Practical Implications and Empirical Analysis

Practical use of the shrinkage estimator is demonstrated through extensive simulations. Empirically, the shrinkage versions of the median, MoM, trimmed mean, and Winsorized mean retain their robustness and often outperform their traditional counterparts under both symmetric and asymmetric heavy-tailed or contaminated distributions. Notable findings include:

  • Shrinkage improves the finite-sample error of base estimators (empirical mean, median, MoM) in high-confidence intervals, even for small $n$.
  • Shrinkage versions based on the median achieve uniformly competitive performance across all scenarios.
  • The effect of sample splitting (required for theoretical independence of $\widehat{\kappa}$ and the main sample) is minor; marginal performance gains are possible even with a very small split allocated to the base estimator, supporting resource-efficient implementations.

    Figure 1: Empirical $1-\delta$ confidence-interval errors for various mean estimators and distributions. Shrinkage with the sample median as base achieves high accuracy uniformly, outperforming MoM and closely tracking trimmed-mean performance.

Furthermore, the authors verify that robust choices of $w$ (e.g., rapid decay for large $t$) are critical for resistance to contamination. Shrinkage estimators with slower-decaying weights (e.g., the Winsorized mean) are more susceptible to outliers, supporting the theory's emphasis on $w$'s decay and threshold properties.
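A minimal simulation in the spirit of these experiments can be run in a few lines. The distribution, contamination scheme, and trimming level $\eta$ below are illustrative choices, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)
mu_true, n, eta = 0.0, 200, 10

# Heavy-tailed sample (Student-t, 2.5 dof, so nu_2 is finite)
# with 5 adversarially placed outliers.
x = rng.standard_t(df=2.5, size=n)
x[:5] = 1e4

emp_mean = x.mean()
med = np.median(x)

# Shrinkage with trimming weight w(t) = 1_{t<1} and the median as base:
# equivalent to zeroing the weights of the eta points farthest from the median.
d = np.abs(x - med)
thr = np.partition(d, n - eta - 1)[n - eta - 1]   # (n - eta)-th smallest distance
shrunk = med + np.sum((x - med)[d <= thr]) / n

print(f"empirical mean error: {abs(emp_mean - mu_true):.2f}")
print(f"median error:         {abs(med - mu_true):.3f}")
print(f"shrinkage error:      {abs(shrunk - mu_true):.3f}")
```

The planted outliers drag the empirical mean far from $\mu$, while the shrinkage estimate stays close, consistent with the reported experiments.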

Theoretical Properties: Equivariance, Asymptotic Efficiency, Breakdown

The shrinkage class preserves or improves major properties desirable in robust statistical estimation:

  • Affine Equivariance: If $\widehat{\kappa}$ is affine equivariant, so is $\widehat{\mu}$.
  • Asymptotic Normality and Efficiency: The shrinkage estimator converges in distribution to the normal law at the same rate as the empirical mean under minimal moment assumptions.
  • Breakdown Point: For robust choices (e.g., the median as base and suitable $w$), the breakdown point is maximized (up to $1/2$), matching the strongest possible in classical robust statistics.
  • Computational Tractability: Since $w$ and $\widehat{\alpha}$ are simple to compute adaptively, the framework remains suited to practical moderate- to large-scale applications.
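The affine-equivariance claim is easy to check numerically for the median-based trimming instance. The `trim_shrink` helper below is a hypothetical sketch, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(1)

def trim_shrink(x, eta):
    """Shrinkage mean with the sample median as base and trimming weight
    w(t) = 1_{t<1}: drops the eta points farthest from the median."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    med = np.median(x)
    d = np.abs(x - med)
    thr = np.partition(d, n - eta - 1)[n - eta - 1]
    return med + np.sum((x - med)[d <= thr]) / n

x = rng.lognormal(size=101)   # odd n, so the median is a sample point
a, b = -3.0, 7.0

# Affine equivariance: estimating on a*X + b should equal a*mu_hat + b,
# because the median is affine equivariant and the weights depend only
# on |a|-scaled distances, which preserves the trimmed set.
lhs = trim_shrink(a * x + b, eta=5)
rhs = a * trim_shrink(x, eta=5) + b
print(abs(lhs - rhs))  # ~0 up to floating-point error
```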

Broader Implications and Future Directions

The theoretical guarantees within this shrinkage framework strengthen the understanding and practical adoption of robust mean estimation in both classical and modern large-scale regimes. Potential future directions include:

  • Extension to multivariate mean estimation, possibly leveraging notions of depth and equivariant trimming.
  • Data-driven adaptive selection of $w$ and $\eta$ for optimal trade-offs between robustness and efficiency.
  • Application and analysis in stochastic optimization, bandits, and machine learning settings involving heavy-tailed data.
  • Investigation into computational enhancements and tuning methods, especially for high-dimensional and low-resource settings.

Conclusion

The paper establishes a unified, adaptive approach to robust mean estimation via shrinkage, achieving optimal, high-probability concentration rates under minimal conditions and encompassing many popular estimators as special cases. The theoretical and empirical analysis confirms that adaptive shrinkage fundamentally enhances robustness and statistical efficiency, provides optimal control of adversarial contamination, and admits efficiency and breakdown properties crucial for practical applications.

For robust statistics, stochastic optimization, high-dimensional inference, and adversarial machine learning, the shrinkage framework and analysis presented here constitute a versatile and foundational advance.


Reference:

"Improved Concentration for Mean Estimators via Shrinkage" (2512.12750)
