
Minimax Lower Bounds: Theory & Applications

Updated 30 January 2026
  • Minimax lower bounds are rigorous benchmarks that quantify the worst-case risk in estimation problems using tools like Fano's and Le Cam's inequalities.
  • They are applied to assess optimality in various settings including parametric, nonparametric, high-dimensional, and distributed estimation frameworks.
  • Modern methodologies extend these bounds to irregular and non-smooth models by incorporating divergence measures and generalized loss functions.

A minimax lower bound characterizes the fundamental difficulty of a statistical decision or estimation problem by quantifying the smallest possible worst-case risk over a class of procedures and underlying models. Rigorous minimax lower bounds establish the best achievable rate (and sometimes the sharp constant) for any estimator or algorithm, over prescribed parameter spaces and loss functions, and are a cornerstone of information-theoretic and statistical complexity results. They serve as benchmarks for assessing and designing optimal methods in parametric, nonparametric, high-dimensional, and interactive settings, under a wide range of constraints and loss regimes.

1. General Framework and Foundational Results

The minimax risk for a statistical problem with parameter space $\Theta$, data-generating measures $\{P_\theta : \theta \in \Theta\}$, estimator $\widehat\theta$, and loss $L(\theta, \widehat\theta)$ is

$$R^* = \inf_{\widehat\theta} \sup_{\theta \in \Theta} \mathbb{E}_{\theta}\left[ L(\theta, \widehat\theta) \right].$$

Classical lower bounds, such as Fano's, Le Cam's, and Assouad's, reduce minimax estimation to hypothesis testing or packing arguments, using tools including $f$-divergences (KL, $\chi^2$, TV), metric entropy, and information-theoretic covering or packing numbers (Guntuboyina, 2010; Chen et al., 2014). For instance, Guntuboyina's $f$-divergence framework extends and unifies many well-known results:

  • Fano’s inequality provides a lower bound via mutual information (KL-divergence) and entropy covering.
  • Le Cam's two-point and multiple-point methods relate estimation risk to testing hard pairs, or finite packings, of alternatives.
  • Pinsker's and Assouad's lemmas invoke total variation or Hamming-based constructions for functional or combinatorial parameter spaces.

A canonical recipe involves constructing a finite parameter subset with minimal separation $\eta$ under the loss, bounding the Bayes risk using covering numbers and informativeness measures (e.g., $\chi^2$ or KL), and translating this into a minimax lower bound proportional to $\ell(\eta/2)$, where $\ell$ is the marginal loss (Guntuboyina, 2010).
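As a minimal, hedged illustration of the two-point version of this recipe (assuming a Gaussian location model with $n$ i.i.d. observations and squared-error loss; the function and parameter choices below are illustrative, not a construction from the cited papers), the bound can be evaluated numerically:

```python
import numpy as np

def le_cam_two_point_bound(n, sigma, delta):
    """Le Cam two-point lower bound for estimating a Gaussian mean.

    Hypotheses: theta_0 = 0 and theta_1 = delta, data X_1..X_n ~ N(theta, sigma^2).
    Using Pinsker's inequality TV <= sqrt(KL/2), the squared-error minimax risk
    over {theta_0, theta_1} is at least (delta/2)^2 * (1 - TV) / 2.
    """
    kl = n * delta**2 / (2 * sigma**2)      # KL(P_1^n || P_0^n) for product Gaussians
    tv = min(1.0, np.sqrt(kl / 2))          # Pinsker bound on total variation
    return (delta / 2) ** 2 * (1 - tv) / 2

# Optimizing the separation delta ~ sigma / sqrt(n) recovers the parametric rate sigma^2 / n.
n, sigma = 1000, 1.0
deltas = np.linspace(1e-3, 0.2, 400)
best = max(le_cam_two_point_bound(n, sigma, d) for d in deltas)
print(best, sigma**2 / n)   # the bound is a constant multiple of sigma^2 / n
```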

These approaches apply broadly to parametric, nonparametric, and high-dimensional models, allowing tight minimax lower bounds even under nonstandard loss or structural regimes (Chen et al., 2014, Ramdas et al., 2016).

2. Local, Asymptotic, and Non-asymptotic Minimax Theory

The sharp local asymptotic minimax (LAM) theorem of Hájek and Le Cam provides the exact asymptotic lower bound for smooth (differentiable) functionals under regular experiments:

$$\liminf_{n \rightarrow \infty} \inf_{\widehat\psi_n} \sup_{\|\theta - \theta_0\| \le c n^{-1/2}} n\, \mathbb{E}_{\theta}\left[ (\widehat\psi_n - \psi(\theta))^2 \right] \geq \nabla\psi(\theta_0)^T \mathcal{I}(\theta_0)^{-1} \nabla\psi(\theta_0),$$

where $\mathcal{I}(\theta)$ is the Fisher information (Takatsu et al., 2024).
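The right-hand side is straightforward to evaluate once the Fisher information and the gradient of the functional are available; the sketch below (an illustrative model and functional, $\psi(\mu,\sigma)=\mu/\sigma$ in a Gaussian experiment, not taken from the cited work) computes that LAM constant:

```python
import numpy as np

# Example: X_i ~ N(mu, s^2) with theta = (mu, s); functional psi(theta) = mu / s.
# Per-observation Fisher information for (mu, s) is diag(1/s^2, 2/s^2).
def lam_constant(theta0):
    mu, s = theta0
    fisher = np.diag([1.0 / s**2, 2.0 / s**2])
    grad_psi = np.array([1.0 / s, -mu / s**2])
    # grad_psi^T  I(theta0)^{-1}  grad_psi
    return grad_psi @ np.linalg.solve(fisher, grad_psi)

# LAM lower bound on the rescaled local worst-case MSE for mu/s: 1 + mu^2 / (2 s^2).
print(lam_constant((1.0, 2.0)))   # 1 + 1/(2*4) = 1.125
```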

However, this framework is limited to differentiable functionals and locally regular models. Recent advances have broadened minimax lower bounds to non-differentiable functionals, irregular models, and nonsmooth losses by relying on generalized mixture inequalities built from Hellinger or $\chi^2$ divergences. These bounds eschew explicit differentiability and remain valid, and often sharp, in highly nonregular or boundary-influenced regimes (Takatsu et al., 2024).

Explicitly, the Hellinger-mixture lower bound states that, for any prior $Q$ and any measurable estimator $T$,

$$\inf_T \sup_{\theta \in \Theta} \mathbb{E}_{P_\theta} \|T(X) - \psi(\theta)\|^2 \geq \sup_{h \in \mathbb{R}^d} \left[ \sqrt{A_{Q,\psi}(h)} - \sqrt{B_{Q,\psi}(h)} \right]_+^2,$$

with $A_{Q,\psi}(h)$ and $B_{Q,\psi}(h)$ as prior-difference/denominator and prior-average/numerator terms relative to the Hellinger divergence. This bound fully recovers the sharp asymptotic constants in the regular case, while remaining valid for irregular, nonparametric, or directionally differentiable settings.

This generalized approach recovers the classical LAM bound, van Trees (Bayesian Cramér–Rao), Chapman–Robbins, and Hammersley–Chapman–Robbins bounds as special cases by appropriate choices of prior and divergence (Takatsu et al., 2024).
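To illustrate why such divergence-based two-point bounds survive irregularity, consider the Hammersley–Chapman–Robbins inequality (for unbiased estimators) applied to the non-regular model $\mathrm{Uniform}[0,\theta]$, where the Fisher information does not exist. The sketch below, with assumed helper names and parameter values, numerically optimizes the $\chi^2$-based bound and recovers the $1/n^2$ rate characteristic of this model:

```python
import numpy as np

def hcr_bound_uniform(theta, n):
    """Hammersley-Chapman-Robbins bound for unbiased estimation of theta
    from X_1..X_n ~ Uniform[0, theta].

    Var(T) >= sup_h h^2 / chi^2(P_{theta-h}^n || P_theta^n),
    where chi^2 = (theta / (theta - h))^n - 1 for 0 < h < theta
    (the shift must go downward so that P_{theta-h} << P_theta).
    """
    hs = np.linspace(1e-6, 0.5 * theta, 2000)
    chi2 = (theta / (theta - hs)) ** n - 1
    return np.max(hs**2 / chi2)

theta = 1.0
for n in [10, 100, 1000]:
    # The bound scales like theta^2 / n^2, the correct rate for this irregular model.
    print(n, hcr_bound_uniform(theta, n), theta**2 / n**2)
```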

3. Extensions: Loss Functions, Structural Constraints, Communication, and Privacy

Minimax lower bounds have been sharply extended to general loss functions beyond $L_2$, models under sparsity or low-rank constraints, structured matrix/tensor factorization, and settings with distributed data, privacy, or communication bottlenecks.

Non-quadratic Loss and Functionals: Via Efroimovich's entropy-based inequalities, the van Trees inequality is generalized to $L_q$ losses for general $q \ge 1$ (Chen et al., 2024). Under regularity, for any estimator $\widehat\theta$,

$$\mathbb{E}_\pi \|\widehat\theta - \theta\|_q^q \geq \frac{\sqrt{2 \pi e}}{C_{ME}(q)^q} \Big( |I_X(\theta)|^{1/d} + J(\pi)^{1/d} \Big)^{-q/2},$$

where $J(\pi)$ is the Fisher information of the prior and $C_{ME}(q)$ is the constant arising from the maximum-entropy distribution under the $q$-th moment constraint.
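For orientation, the familiar scalar quadratic-loss special case is the classical van Trees inequality $\mathbb{E}_\pi(\widehat\theta-\theta)^2 \ge 1/(\mathbb{E}_\pi I_X(\theta) + J(\pi))$. The sketch below (a conjugate Gaussian example with assumed parameter values, not from the cited paper) checks that it is attained exactly by the posterior mean:

```python
import numpy as np

# Scalar van Trees check (illustrative): X_1..X_n ~ N(theta, sigma^2), prior theta ~ N(0, tau^2).
n, sigma, tau = 50, 1.0, 2.0

fisher_data = n / sigma**2          # I_X(theta): Fisher information of the whole sample
fisher_prior = 1.0 / tau**2         # J(pi) for a N(0, tau^2) prior
van_trees = 1.0 / (fisher_data + fisher_prior)

# Exact Bayes risk of the posterior mean in this conjugate model (posterior variance):
bayes_risk = 1.0 / (n / sigma**2 + 1.0 / tau**2)

print(van_trees, bayes_risk)        # identical: the bound is sharp here
```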

High-dimensional Constraints: Modern lower bound techniques capture the phase transitions under sparsity ($\ell_0$), low-rank constraints, or Kronecker/tensor structure. For example, the minimax risk for $\ell_0$-sparse linear regression is

$$\Omega\left( \frac{\sigma^2\, k\,\log(d/k)}{n} \right),$$

and for learning a Kronecker-structured dictionary,

$$R^*(N) \gtrsim \min\left\{ p, \frac{r^2}{K}, \frac{1}{NK\,\mathrm{SNR}\, \sum_k m_k p_k} \right\},$$

where the dimension-sum $\sum_k m_k p_k$ replaces the full parameter count $mp$, yielding potentially exponential savings (Shakeri et al., 2016).
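The sparsity rate displayed above is typically derived from a local Fano argument. The sketch below pairs a generic Fano helper with order-of-magnitude stand-ins: the packing size, separation, and KL radius are assumed to be of the right order for $k$-sparse regression, not the exact constructions of the cited papers.

```python
import numpy as np

def fano_lower_bound(sep, log_M, kl_max):
    """Local Fano bound: given M hypotheses whose pairwise squared-loss distance
    is at least `sep` and whose pairwise KL diameter (which upper-bounds the
    mutual information) is at most kl_max, the minimax risk is at least
    (sep / 4) * (1 - (kl_max + log 2) / log M)."""
    return (sep / 4) * max(0.0, 1 - (kl_max + np.log(2)) / log_M)

# Order-of-magnitude instance for k-sparse regression in dimension d with n samples
# (hypothetical numbers; gamma2 is the squared per-coordinate signal size of the packing).
d, k, n, sigma = 10_000, 20, 500, 1.0
log_M = k * np.log(d / k) / 2                # log-packing number of k-sparse supports
gamma2 = sigma**2 * np.log(d / k) / (4 * n)  # chosen so the KL radius stays below log_M / 2
sep = k * gamma2                             # squared-loss separation of the packing
kl_max = n * k * gamma2 / (2 * sigma**2)     # KL radius for a well-conditioned design
print(fano_lower_bound(sep, log_M, kl_max), sigma**2 * k * np.log(d / k) / n)
```

Up to constants, the printed bound matches the $\sigma^2 k \log(d/k)/n$ rate shown above.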

Distributed and Private Estimation: Under locally differentially private (DP) protocols or finite communication, minimax lower bounds can be expressed in terms of constrained Fisher information. For a mean estimation task,

$$R_{L_q}(\Theta) \gtrsim d\, \kappa(q) \max \left\{ \left( \frac{d}{n\, \min\{\varepsilon, \varepsilon^2\}}\right)^{q/2}, \left( \frac{1}{n\, \min\{e^\varepsilon, (e^\varepsilon-1)^2\}} \right)^{q/2} \right\},$$

with matching (up to constants) rates holding for blackboard, sequential, and even non-interactive protocols (Chen et al., 2024).
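Reading the displayed bound as a function of the privacy budget makes the two regimes visible (an $\varepsilon^2$ penalty for high privacy, an $\varepsilon$ penalty for moderate privacy). The snippet below simply evaluates the stated rate with the unspecified constant $\kappa(q)$ suppressed and assumed parameter values:

```python
import numpy as np

def ldp_mean_lower_bound(n, d, eps, q=2):
    """Order of the locally private mean-estimation lower bound displayed above
    (constants, including kappa(q), suppressed)."""
    term1 = (d / (n * min(eps, eps**2))) ** (q / 2)
    term2 = (1 / (n * min(np.exp(eps), (np.exp(eps) - 1) ** 2))) ** (q / 2)
    return d * max(term1, term2)

n, d = 100_000, 50
for eps in [0.1, 1.0, 5.0]:
    # For quadratic loss (q = 2) and small eps the bound scales like d^2 / (n * eps^2).
    print(eps, ldp_mean_lower_bound(n, d, eps))
```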

4. Minimax Lower Bounds in Specific Statistical Problems

Rigorous minimax lower bounds have been developed and established as tight in diverse problem classes:

  • Function Estimation on Graphs: For regression or classification on a graph $G_n$ with Laplacian $L$, if $f$ or $\rho$ is $\beta$-smooth in the Laplacian sense, the minimax rate is (Kirichenko et al., 2017)

$$n^{-2\beta/(2\beta + r)},$$

where $r$ is the “dimension” parameter from spectral geometry.

  • Reinforcement Learning: For episodic tabular MDPs with horizon $H$, $S$ states, and $A$ actions, sample-complexity and regret lower bounds of order

$$\Omega\left( \frac{H^3 S A}{\epsilon^2} \log(1/\delta) \right),\qquad \Omega\left( \sqrt{H^3 S A T} \right)$$

hold for best policy identification and cumulative regret over $T$ steps, respectively.

  • Testing and Independence: For high-dimensional independence testing, any procedure with nontrivial power requires (Ramdas et al., 2016)

$$n \gtrsim \frac{\sqrt{pq}}{\|\Sigma_{XY}\|_F^2},$$

where $\Sigma_{XY}$ is the cross-covariance matrix.

  • Matrix and Tensor Completion: For noisy matrix completion under sparse factor models, the per-element MSE is bounded below by (Sambasivan et al., 2015)

$$R^* \geq C\, \min\left\{ s\,A_{\max}^2,\ \sigma^2 \frac{mr + sn}{N} \right\},$$

where $m, n$ are the matrix dimensions, $r$ the rank, $s$ the sparsity, and $N$ the number of samples.

  • Density Functionals and $L_p$-norms: For estimation of $\|f\|_p$, the rates split according to whether $p$ is an integer (parametric thresholds) or not, with regimes determined by the Nikolskii smoothness; for non-integer $p$, an extra logarithmic penalty appears (Goldenshluger et al., 2020).

5. Minimax Quantiles and High-Probability Lower Bounds

Expectation-based minimax risk fails to capture tail risk or guarantee high-confidence performance. Recent developments introduce minimax quantiles,

$$M(\delta) = \inf_{\widehat\theta} \sup_{\theta \in \Theta} Q_{1-\delta}(\widehat\theta, \theta),$$

where $Q_{1-\delta}$ is the $(1-\delta)$-quantile of the loss (Ma et al., 2024; Bongole et al., 7 Oct 2025). High-probability versions of Le Cam's and Fano's lemmas relate testing complexity to quantile risk, establishing lower bounds for quantiles at all confidence levels. Quantile-to-expectation conversions guarantee

$$R^* \geq \delta\, M(\delta)$$

for any $\delta \in (0,1]$, so quantile lower bounds immediately imply expectation-level ones while refining the understanding of rare-event or worst-case performance.
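The conversion is Markov's inequality applied at the $(1-\delta)$-quantile: $\mathbb{E}[L] \ge Q_{1-\delta}(L)\cdot \mathbb{P}(L \ge Q_{1-\delta}(L)) \ge \delta\, Q_{1-\delta}(L)$. A quick simulation with an arbitrary heavy-tailed loss (purely illustrative) confirms the inequality and shows how much tail information the expectation alone discards:

```python
import numpy as np

rng = np.random.default_rng(0)
loss = rng.pareto(2.5, size=1_000_000)       # stand-in for the loss of some procedure

for delta in [0.5, 0.1, 0.01]:
    q = np.quantile(loss, 1 - delta)         # (1 - delta)-quantile of the loss
    # Markov at level q: E[L] >= q * P(L >= q) >= q * delta.
    print(delta, loss.mean(), delta * q, loss.mean() >= delta * q)
```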

6. Modern Developments: Irregular Models, Nonparametric Functionals, and Tightness

Recent work has unified the Fano, Assouad, Le Cam, and van Trees lower bounds through a mixture-based approach, which, by a careful selection of divergence (Hellinger, $\chi^2$) and perturbation families, yields explicit non-asymptotic or local-asymptotic constants valid for irregular or directionally differentiable functionals (Takatsu et al., 2024; Merhav, 2024). For estimation of the density at a point or non-smooth functionals (e.g., $\psi(\theta)=\max(0,\theta)$), these generalized mixture bounds yield rates and constants that match those achievable by adaptive estimators, even when classical regularity fails.

Notable advantages of modern minimax lower bound techniques include:

  • No requirement for differentiability of the functional or regularity of the underlying statistical experiment.
  • Uniform applicability to vector or scalar parameters, convex/symmetric loss, and arbitrary moment-type losses.
  • Recovery of all classical rate boundaries (parametric, nonparametric, semiparametric), as well as new logarithmic or other extra-factor corrections induced by problem structure.

7. Practical Implications and Outlook

Minimax lower bounds serve as essential benchmarks for algorithmic and statistical optimality in high-dimensional inference, modern nonparametric estimation, distributed and private learning, reinforcement learning, and structured prediction. Their general methodologies and variants—spanning information-theoretic, entropy, and divergence-based formulations—offer unified pathways both to fundamental impossibility results and to sharp guidance for the design of statistically efficient and robust procedures across a spectrum of classical and emerging problems.
