
Minimum Absolute Distance Estimator

Updated 7 January 2026
  • The minimum absolute distance estimator is a robust L¹-based method that estimates parameters by minimizing the absolute difference between empirical and model distributions.
  • It is applied in diverse contexts including parametric CDF models, mixture recovery, and multivariate robust location using numerical techniques such as LP/MILP and gradient-based optimizers.
  • Its statistical properties feature strong consistency, asymptotic normality, a high breakdown point up to 50%, and minimax rate-optimality in large-alphabet functional estimation problems.

The minimum absolute distance estimator (also known as minimum L¹-distance estimator or MAD) refers, in its most general form, to the estimation of unknown model parameters—or probability measures—by minimizing the L¹ absolute distance between empirical and model distributions or functionals. This criterion is widely used for robust parameter estimation in parametric models, for robust location in multivariate statistics, for mixing measure recovery in mixture models, and for functional estimation of distributional divergence. Despite its conceptual simplicity, the minimum absolute distance approach encompasses a diverse range of estimation methodologies, each with distinct theoretical properties and computational challenges.

1. Fundamental Definitions and General Framework

The minimum absolute distance estimator is defined by selecting the parameter value (or measure) that minimizes the L¹ (absolute) distance between the empirical distribution (or statistic) and the model-implied counterpart.

  • General parametric CDF model: Given data $X_1, \ldots, X_n$ and a parametric CDF $F(x;\theta)$,

$$\hat\theta_{\mathrm{MAD}} = \arg\min_{\theta \in \Theta} \int_{-\infty}^{\infty} |F_n(x) - F(x;\theta)| \, dx,$$

where $F_n$ is the empirical CDF (Nombebe et al., 2022).
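As a concrete illustration, the criterion above can be minimized numerically. The sketch below is not taken from any of the cited papers: it fits the rate of an exponential model by approximating the integral on a grid, and the model, sample size, grid, and optimizer bounds are all illustrative assumptions.

```python
# Illustrative sketch: MAD fit of an exponential CDF F(x; theta) = 1 - exp(-theta x).
# The grid, sample size, and optimizer bounds are assumptions, not from the paper.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=200)   # true rate theta = 1/scale = 0.5
xs = np.sort(x)
n = xs.size

def l1_cdf_distance(theta):
    """Grid approximation of  integral |F_n(x) - F(x; theta)| dx."""
    grid = np.linspace(0.0, xs.max() * 1.5, 2000)
    dx = grid[1] - grid[0]
    F_n = np.searchsorted(xs, grid, side="right") / n   # empirical CDF on the grid
    F_theta = 1.0 - np.exp(-theta * grid)               # model CDF
    return np.abs(F_n - F_theta).sum() * dx

res = minimize_scalar(l1_cdf_distance, bounds=(1e-3, 10.0), method="bounded")
print(res.x)   # should land near the true rate 0.5
```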

  • Non-normalized densities: For families $p_\theta(x)$ (possibly unnormalized), a weight function $w(t) > 0$, and a characterization function $\eta_n(t, \theta)$,

$$\widehat\theta_{n,1} = \arg\min_{\theta \in \Theta} \int_0^\infty \big| \eta_n(t, \theta) \big| \, w(t) \, dt,$$

where $\eta_n(t, \theta) = -\frac{1}{n} \sum_{j=1}^n \frac{p'_\theta(X_j)}{p_\theta(X_j)} \min\{X_j, t\} - \frac{1}{n} \sum_{j=1}^n \mathbf{1}\{X_j \leq t\}$ (Betsch et al., 2019).
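To make the characterization concrete, consider the exponential density $p_\theta(x) = \theta e^{-\theta x}$, for which $p'_\theta/p_\theta = -\theta$ and the statistic simplifies. The following sketch is an illustration rather than code from the paper; it evaluates $\eta_n$ and shows it is near zero at the true parameter.

```python
# Illustrative sketch: eta_n(t, theta) for the exponential density, where
# p'_theta / p_theta = -theta, so the statistic reduces to
#   eta_n(t, theta) = theta * mean(min(X_j, t)) - F_n(t).
import numpy as np

def eta_n(t, theta, x):
    return theta * np.mean(np.minimum(x, t)) - np.mean(x <= t)

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=5000)   # true theta = 1
# At the true parameter, eta_n should be close to zero for every t:
print([round(eta_n(t, 1.0, x), 3) for t in (0.5, 1.0, 2.0)])
```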

  • Mixture models and mixing measures: For a class of functions $\Phi$ (e.g., $\|\phi\|_\infty \leq 1$), the $\Phi$-distance between mixing measures $G, G'$ is

$$D_\Phi(G, G') = \sup_{\phi \in \Phi} \left| \int \phi \, dG - \int \phi \, dG' \right|.$$

The minimum absolute (total variation) distance estimator for $k$-component mixtures is

$$\hat G_n \in \arg\min_{G \in \mathcal{M}_k} \|P_n - P_G\|_{TV}$$

(Wei et al., 2023).
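A minimal brute-force sketch of the TV criterion for a two-component location mixture follows; equal weights and unit variances are simplifying assumptions, and the histogram discretization stands in for the exact TV distance.

```python
# Illustrative sketch: discretized TV-distance fit of a two-component Gaussian
# location mixture (equal weights, unit variances assumed; brute-force grid).
import itertools
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(-2, 1, 500), rng.normal(2, 1, 500)])

edges = np.linspace(-6, 6, 61)                 # TV approximated on histogram bins
p_hat, _ = np.histogram(x, bins=edges)
p_hat = p_hat / p_hat.sum()

def model_bin_probs(mu1, mu2):
    cdf = 0.5 * norm.cdf(edges, mu1, 1) + 0.5 * norm.cdf(edges, mu2, 1)
    return np.diff(cdf)

grid = np.linspace(-4, 4, 33)                  # candidate component locations
best = min(
    ((m1, m2) for m1, m2 in itertools.product(grid, grid) if m1 <= m2),
    key=lambda p: 0.5 * np.abs(p_hat - model_bin_probs(*p)).sum(),
)
print(best)   # should be near the true locations (-2, 2)
```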

  • Multivariate robust location: The minimum (trimmed) absolute distance estimator for location in $\mathbb{R}^p$ is

$$(\hat\theta, T^*) = \arg\min_{\theta \in \mathbb{R}^p,\, T \subseteq X_n,\, |T| = h} \sum_{x_i \in T} \|x_i - \theta\|_1$$

(Zioutas et al., 2015).
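The trimmed objective is combinatorial; a simple alternating heuristic (keep the $h$ best-fitting points, then recompute the coordinatewise median, which solves the inner L¹ problem exactly) gives a local search. This sketch illustrates the idea only and is not the exact MILP solution from the paper.

```python
# Heuristic sketch of trimmed L1 location: alternate trimming and medians.
# A plausible local-search illustration, not the paper's exact MILP method.
import numpy as np

def ltad_location(X, h, n_iter=50):
    theta = np.median(X, axis=0)            # start at the coordinatewise median
    for _ in range(n_iter):
        d = np.abs(X - theta).sum(axis=1)   # L1 residual of each point
        T = np.argsort(d)[:h]               # retain the h best-fitting points
        theta = np.median(X[T], axis=0)     # exact L1 location for fixed T
    return theta

rng = np.random.default_rng(3)
clean = rng.normal(0, 1, size=(80, 2))
outliers = rng.normal(10, 0.5, size=(20, 2))    # 20% contamination
X = np.vstack([clean, outliers])
print(ltad_location(X, h=80))   # stays near (0, 0) despite the outliers
```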

2. Methodological Variants and Computational Strategies

The core MAD criterion admits multiple concrete implementations, depending on statistical context and model structure.

  • Untrimmed case: For univariate location, the MAD solution is the sample median. In higher dimensions, the least trimmed absolute deviations (LTAD) approach generalizes the median by trimming the largest residuals and minimizing the L¹ loss over the retained points (Zioutas et al., 2015).
  • Parametric density models: Objective minimization often requires numerical integration over sample statistics or surrogate loss function evaluation, with optimization performed via L-BFGS-B, SLSQP, or BFGS in scientific computing environments (Nombebe et al., 2022, Betsch et al., 2019).
  • Mixture models: The TV distance minimization forms a nonconvex program often approximated by discretization, global optimization for small kk, or use of reproducing kernel Hilbert space surrogates (e.g., MMD distances) in the multivariate case (Wei et al., 2023).
  • LP/MILP formulations: Robust location estimation via LTAD is rewritten as a mixed integer linear program (MILP) with auxiliary variables, or as a linear program (LP) via relaxation and data centering techniques (Zioutas et al., 2015).
| Application | Objective Function | Numerical Approach |
|---|---|---|
| Parametric CDF model | $\int \lvert F_n(x) - F(x;\theta)\rvert \, dx$ | Grid or sample-based sum plus BFGS or a similar optimizer |
| Multivariate LTAD | $\sum_{i\in T} \lVert x_i - \theta \rVert_1$ | MILP, LP, subgradients |
| Mixture models | $\lVert P_n - P_G \rVert_{TV}$ | Grid search, global optimization, kernel methods |
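For the untrimmed case, the absolute values linearize in the standard way ($e_{ij} \ge \pm(x_{ij} - \theta_j)$), giving an ordinary LP. The sketch below illustrates only this standard linearization and is much simpler than the trimmed MILP/LP formulations of the paper.

```python
# Illustrative LP sketch of the untrimmed L1 location problem via the standard
# absolute-value linearization e_ij >= |x_ij - theta_j| (not the paper's MILP).
import numpy as np
from scipy.optimize import linprog

def l1_location_lp(X):
    n, p = X.shape
    c = np.concatenate([np.zeros(p), np.ones(n * p)])   # minimize sum of e_ij
    rows, rhs = [], []
    for i in range(n):
        for j in range(p):
            # x_ij - theta_j <= e_ij   and   theta_j - x_ij <= e_ij
            r1 = np.zeros(p + n * p); r1[j] = -1; r1[p + i * p + j] = -1
            rows.append(r1); rhs.append(-X[i, j])
            r2 = np.zeros(p + n * p); r2[j] = 1;  r2[p + i * p + j] = -1
            rows.append(r2); rhs.append(X[i, j])
    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(None, None)] * p + [(0, None)] * (n * p))
    return res.x[:p]

X = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
print(l1_location_lp(X))   # recovers the coordinatewise median (2, 3)
```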

3. Statistical Properties

The minimum absolute distance estimator possesses distinct statistical properties governed by both the L¹-norm structure and the trimming (if any).

  • Consistency: Under standard identifiability and regularity conditions, MAD estimators are strongly consistent.
  • Asymptotic Normality: Asymptotic normality holds for MAD in both simple and some robust/truncated settings, with covariance matrices expressed via influence functions or explicit integrals involving the model CDF gradient and sign functions (Nombebe et al., 2022, Zioutas et al., 2015).
  • Breakdown Robustness: The LTAD achieves finite-sample breakdown points up to $50\%$ for maximal trimming, preserving robustness properties akin to the univariate median (Zioutas et al., 2015).
  • Minimax Properties: When used to estimate functional divergences (e.g., the $L_1$ distance between discrete distributions), minimax rate-optimality holds up to logarithmic factors, with MAD-type estimators outperforming naïve plug-in approaches in the large-alphabet regime (Jiao et al., 2017).
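The breakdown contrast with L²-based estimation is easy to see numerically. In the small illustration below (the contamination level and values are arbitrary), the median, i.e., the univariate L¹ location estimate, ignores gross outliers while the mean does not.

```python
# Illustration of breakdown robustness: the median (univariate L1 location)
# resists gross contamination; the mean (L2 location) does not.
import numpy as np

rng = np.random.default_rng(4)
clean = rng.normal(0, 1, 100)
contaminated = np.concatenate([clean[:80], np.full(20, 1e6)])  # 20% outliers

print(np.median(contaminated))   # still bounded by the clean observations
print(np.mean(contaminated))     # dragged to roughly 2e5 by the outliers
```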

4. Applications and Empirical Performance

Minimum absolute distance estimators are widely applied:

  • Lomax (Pareto-II) distribution: For two-parameter modeling, MAD outperforms classical MLE and method-of-moments estimation in small samples and remains competitive in moderate samples, with the smallest mean squared error (MSE) observed for $n < 100$ (Nombebe et al., 2022).
  • Short- and long-memory time series: In ARFIMA models, bias-corrected minimum distance estimators (BCMDE) employing MAD-type criteria yield reduced bias and the smallest MSE, especially when the sample mean is nonconstant and for larger memory parameters (Lana et al., 2018).
  • Mixture estimation: In finite mixtures with compactly supported parameter spaces, minimum TV (L¹) estimators converge at optimal rates in the Wasserstein ($W_1$) distance, matching minimax lower bounds and generalizing method-of-moments and Kolmogorov–Smirnov-type approaches (Wei et al., 2023).
  • High-dimensional robust statistics: LP-based LTAD estimators offer computationally scalable, strongly robust location estimates with virtually zero bias, outperforming classical counterparts under contamination (Zioutas et al., 2015).
| Setting | Key Property | Empirical Outcome |
|---|---|---|
| Lomax model, $n < 100$ | MSE, low bias/variance | MAD outperforms MLE, MoM |
| ARFIMA time series | Bias under unknown/nontrivial mean | BCMDE variants reduce bias/MSE |
| Finite mixtures | Optimal $W_1$ rate, TV convergence | MAD-type attains uniform minimax rates |
| Multivariate robust location | Breakdown, computational efficiency | High breakdown, scalable |

5. Variants and Extensions

Several extensions and specialization axes exist:

  • Minimum $L^q$-distance estimators: For $q = 1$, the estimator is the minimum absolute distance (MAD) estimator, but for $q > 1$ or $q = \infty$, analogous estimators minimizing $L^q$ or supremum distances provide trade-offs between sensitivity and robustness; the $L^2$ version is closely related to Cramér–von Mises estimators (Betsch et al., 2019, Nombebe et al., 2022).
  • Mixture estimation via $\Phi$-distance: MAD arises naturally as the special case where $\Phi$ is the unit ball in $L^\infty$, resulting in total-variation loss for mixing measures; kernel-based generalizations (MMD, etc.) further broaden the method’s applicability, especially in high dimensions or non-Euclidean sample spaces (Wei et al., 2023).
  • Trimmed and untrimmed forms: The trimming parameter $h$ in LTAD interpolates between the full-data MAD (the median case) and highly robust estimators for contaminated data (Zioutas et al., 2015).
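The $L^q$ family above is easy to sketch in one function. The example below (illustrative model and tuning choices, exponential CDF as in Section 1) recovers the MAD, Cramér–von Mises-type, and Kolmogorov–Smirnov-type criteria as $q = 1, 2, \infty$.

```python
# Illustrative sketch of minimum L^q-distance fitting for an exponential model:
# q = 1 is the MAD criterion, q = 2 a Cramer-von-Mises-type criterion,
# q = inf a Kolmogorov-Smirnov-type criterion. All choices here are assumptions.
import numpy as np
from scipy.optimize import minimize_scalar

def lq_fit(x, q):
    xs = np.sort(x)
    grid = np.linspace(0.0, xs.max() * 1.5, 2000)
    dx = grid[1] - grid[0]
    F_n = np.searchsorted(xs, grid, side="right") / xs.size

    def dist(theta):
        diff = np.abs(F_n - (1.0 - np.exp(-theta * grid)))
        return diff.max() if np.isinf(q) else (diff ** q).sum() * dx

    return minimize_scalar(dist, bounds=(1e-3, 10.0), method="bounded").x

rng = np.random.default_rng(5)
x = rng.exponential(scale=1.0, size=500)      # true rate theta = 1
print([round(lq_fit(x, q), 2) for q in (1, 2, np.inf)])   # each near 1.0
```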

6. Limitations and Practical Considerations

Although minimum absolute distance estimators enjoy attractive robustness and minimax properties, several practical aspects warrant consideration.

  • Computation: For high-dimensional or large-sample settings, exact minimization may be computationally intensive; the use of subgradient methods, data centering, and surrogates (RKHS/MMD) is essential for tractability (Zioutas et al., 2015, Wei et al., 2023).
  • Bias-variance tradeoff: MAD estimators may display small negative bias in extremely small samples, but compensate by significantly lower variance relative to classical estimators, especially under model misspecification or contamination (Nombebe et al., 2022).
  • Choice of metric and loss: The $L^1$ criterion is less sensitive to outliers but may be less efficient than $L^2$ or KL divergences for well-specified, uncontaminated models in large samples. For optimal performance, hybrid approaches that adapt to sample size are recommended (Nombebe et al., 2022).
  • Model identifiability: Strong identifiability conditions are required for mixture models to guarantee minimax convergence; in settings with low separation or high component overlap, performance may degrade and require regularization or penalized criteria (Wei et al., 2023).

7. Connections to Classical Estimators and Recommendations

The minimum absolute distance estimator is a cornerstone of robust statistics and minimum distance estimation.

  • In the univariate setting, the MAD is the sample median, known for its 50% breakdown point and optimal robustness (Zioutas et al., 2015).
  • In the context of finite mixture estimation and distributional functional estimation, MAD methods achieve optimal nonparametric risk, especially outperforming MLEs when the sample size is comparable to the domain size (Jiao et al., 2017, Wei et al., 2023).
  • For practical implementation, MAD (or its close relatives such as minimum squared deviation estimators) are recommended for small to moderate sample sizes or for contaminated data, with bias-adjusted MLEs becoming preferred as sample size increases (Nombebe et al., 2022).

The minimum absolute distance estimator thus provides an essential methodological and theoretical link between robust classical statistics, modern empirical process theory, and computational statistics, with continuing importance in high-dimensional inference, mixture learning, and time series analysis.
