Minimum Absolute Distance Estimator
- The minimum absolute distance estimator is a robust L¹-based method that estimates parameters by minimizing the absolute difference between empirical and model distributions.
- It is applied in diverse contexts including parametric CDF models, mixture recovery, and multivariate robust location using numerical techniques such as LP/MILP and gradient-based optimizers.
- Its statistical properties feature strong consistency, asymptotic normality, a high breakdown point up to 50%, and minimax optimality in challenging estimation scenarios.
The minimum absolute distance estimator (also known as minimum L¹-distance estimator or MAD) refers, in its most general form, to the estimation of unknown model parameters—or probability measures—by minimizing the L¹ absolute distance between empirical and model distributions or functionals. This criterion is widely used for robust parameter estimation in parametric models, for robust location in multivariate statistics, for mixing measure recovery in mixture models, and for functional estimation of distributional divergence. Despite its conceptual simplicity, the minimum absolute distance approach encompasses a diverse range of estimation methodologies, each with distinct theoretical properties and computational challenges.
1. Fundamental Definitions and General Framework
The minimum absolute distance estimator is defined by selecting the parameter value (or measure) that minimizes the L¹ (absolute) distance between the empirical distribution (or statistic) and the model-implied counterpart.
- General parametric CDF model: Given data $X_1, \dots, X_n$ and a parametric CDF $F(\cdot; \theta)$, $\theta \in \Theta$,

$$\hat{\theta}_n = \arg\min_{\theta \in \Theta} \int_{-\infty}^{\infty} \left| F_n(x) - F(x; \theta) \right| \, dx,$$

where $F_n$ is the empirical CDF (Nombebe et al., 2022).
- Non-normalized densities: For families $\{p(\cdot; \vartheta)\}$ (possibly unnormalized) and a characterization function $d_\vartheta$ that vanishes identically if and only if the data-generating distribution belongs to the family at $\vartheta$,

$$\hat{\vartheta}_n = \arg\min_{\vartheta} \left\| \hat{d}_{n,\vartheta} \right\|_{L^1},$$

where $\hat{d}_{n,\vartheta}$ is the empirical counterpart of $d_\vartheta$ computed from the sample (Betsch et al., 2019).
- Mixture models and mixing measures: For a class of functions $\mathcal{F}$ (e.g., the unit ball of $L^\infty$), the $\mathcal{F}$-distance between mixing measures $G$ and $G'$ is

$$d_{\mathcal{F}}(G, G') = \sup_{f \in \mathcal{F}} \left| \int f \, dp_G - \int f \, dp_{G'} \right|,$$

where $p_G$ denotes the mixture density induced by $G$. The minimum absolute (total variation) distance estimator for $k$-component mixtures is

$$\hat{G}_n = \arg\min_{G \in \mathcal{G}_k} \sup_{f \in \mathcal{F}} \left| \int f \, dp_G - \frac{1}{n} \sum_{i=1}^n f(X_i) \right|.$$
- Multivariate robust location: With L¹ residuals $r_i(\mu) = \lVert x_i - \mu \rVert_1$ and ordered residuals $r_{(1)}(\mu) \le \dots \le r_{(n)}(\mu)$, the minimum (trimmed) absolute distance estimator for location in $\mathbb{R}^p$ is

$$\hat{\mu}_n = \arg\min_{\mu \in \mathbb{R}^p} \sum_{i=1}^{h} r_{(i)}(\mu),$$

where $h \le n$ is the number of retained observations ($h = n$ gives the untrimmed case).
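As a concrete illustration of the parametric-CDF form above, the following sketch fits an exponential rate parameter by minimizing a grid-based approximation of the L¹ distance between the empirical and model CDFs. The model choice, grid construction, and function name are illustrative assumptions, not the exact procedure of any cited paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def mad_fit_exponential(x):
    """Minimum L1-distance fit of an Exp(lam) rate parameter (illustrative)."""
    x = np.sort(x)
    n = len(x)
    # Evaluation grid spanning the sample; the empirical CDF is a step
    # function we can evaluate cheaply with searchsorted.
    grid = np.linspace(0.0, x[-1] * 1.5, 2000)
    dg = grid[1] - grid[0]
    F_n = np.searchsorted(x, grid, side="right") / n

    def objective(lam):
        # Riemann-sum approximation of the L1 distance between the
        # empirical CDF and the Exp(lam) CDF.
        F_model = 1.0 - np.exp(-lam * grid)
        return np.sum(np.abs(F_n - F_model)) * dg

    res = minimize_scalar(objective, bounds=(1e-6, 100.0), method="bounded")
    return res.x

rng = np.random.default_rng(0)
sample = rng.exponential(scale=0.5, size=500)  # true rate lambda = 2
lam_hat = mad_fit_exponential(sample)
```

The grid sum here is the "sample-based sum" surrogate discussed in Section 2; any bounded scalar optimizer could replace `minimize_scalar`.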
2. Methodological Variants and Computational Strategies
The core MAD criterion admits multiple concrete implementations, depending on statistical context and model structure.
- Untrimmed case: For univariate location, the solution is the sample median. In higher dimensions, the LTAD approach (least trimmed absolute deviations) generalizes the median by trimming the largest residuals and minimizing the L¹-loss over the retained observations (Zioutas et al., 2015).
- Parametric density models: Objective minimization often requires numerical integration over sample statistics or surrogate loss function evaluation, with optimization performed via L-BFGS-B, SLSQP, or BFGS in scientific computing environments (Nombebe et al., 2022, Betsch et al., 2019).
- Mixture models: The TV distance minimization forms a nonconvex program, often approximated by discretization of the mixing-measure space, by global optimization when the number of components $k$ is small, or by reproducing kernel Hilbert space surrogates (e.g., MMD distances) in the multivariate case (Wei et al., 2023).
- LP/MILP formulations: Robust location estimation via LTAD is rewritten as a mixed integer linear program (MILP) with auxiliary variables, or as a linear program (LP) via relaxation and data centering techniques (Zioutas et al., 2015).
| Application | Objective Function | Numerical Approach |
|---|---|---|
| Parametric CDF model | $\int \lvert F_n(x) - F(x;\theta) \rvert \, dx$ | Grid or sample-based sum + BFGS or similar optimizer |
| Multivariate LTAD | Trimmed sum of L¹ residuals | MILP, LP, subgradients |
| Mixture models | TV distance to the empirical measure | Grid search, global optimization, kernel methods |
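The LTAD row above can be illustrated without solving the full MILP: a simple alternating heuristic keeps the $h$ observations with the smallest L¹ residuals and re-centers on their coordinate-wise median (the exact L¹ minimizer for a fixed subset). This is a concentration-style sketch under assumed defaults, not the LP/MILP formulation of the cited work.

```python
import numpy as np

def ltad_location(X, h, n_iter=50, seed=0):
    """Heuristic LTAD location estimate in R^p (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    mu = X[rng.integers(len(X))].astype(float)  # random starting centre
    for _ in range(n_iter):
        r = np.abs(X - mu).sum(axis=1)          # L1 residuals
        keep = np.argsort(r)[:h]                # h smallest residuals
        # Coordinate-wise median minimises the L1 loss on the kept subset.
        new_mu = np.median(X[keep], axis=0)
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    return mu

rng = np.random.default_rng(1)
clean = rng.normal(0.0, 1.0, size=(90, 2))
outliers = rng.normal(10.0, 0.5, size=(10, 2))
X = np.vstack([clean, outliers])
mu_hat = ltad_location(X, h=75)  # trim 25 of 100 points
```

With 10% gross contamination, the trimmed estimate stays near the clean center while the sample mean is pulled toward the outliers.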
3. Statistical Properties
The minimum absolute distance estimator possesses distinct statistical properties governed by both the L¹-norm structure and the trimming (if any).
- Consistency: Under standard identifiability and regularity conditions, MAD estimators are consistent:
- For continuous parametric families, strong consistency holds under compactness and continuity of the loss function in the parameter (Betsch et al., 2019, Nombebe et al., 2022).
- For mixture models, minimax rates in Wasserstein metrics are derived, matching lower bounds for both uniform and local settings (Wei et al., 2023).
- Asymptotic Normality: Asymptotic normality holds for MAD in both simple and some robust/truncated settings, with covariance matrices expressed via influence functions or explicit integrals involving the model CDF gradient and sign functions (Nombebe et al., 2022, Zioutas et al., 2015).
- Breakdown Robustness: The LTAD achieves finite-sample breakdown points up to 50% for maximal trimming, preserving robustness properties akin to the univariate median (Zioutas et al., 2015).
- Minimax Properties: When used to estimate functional divergences (e.g., the L¹-distance between discrete distributions), minimax rate-optimality holds up to logarithmic factors, with MAD-type estimators achieving superior performance relative to naïve plug-in approaches in the large-alphabet regime (Jiao et al., 2017).
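The breakdown behavior is easy to verify in the univariate case, where the untrimmed MAD solution is the sample median: heavy contamination that destroys the mean leaves the median near the clean center. A small sanity check:

```python
import numpy as np

rng = np.random.default_rng(42)
clean = rng.normal(0.0, 1.0, size=60)
# Contaminate 40% of the sample with gross outliers -- still below
# the 50% breakdown point of the median.
contaminated = np.concatenate([clean, np.full(40, 1000.0)])

mean_est = contaminated.mean()        # dragged toward the outliers
median_est = np.median(contaminated)  # stays near the clean centre
```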
4. Applications and Empirical Performance
Minimum absolute distance estimators are widely applied:
- Lomax (Pareto-II) distribution: For two-parameter modeling, MAD outperforms classical MLE and method-of-moments estimation in small samples and remains competitive in moderate samples, with the smallest mean squared error (MSE) observed in the small-sample regime (Nombebe et al., 2022).
- Short- and long-memory time series: In ARFIMA models, bias-corrected minimum distance estimators (BCMDE) employing MAD-type criteria yield reduced bias and the smallest MSE, especially when the process mean is unknown or nonconstant and for larger memory parameters (Lana et al., 2018).
- Mixture estimation: In finite mixtures with compactly supported parameter spaces, minimum TV (L¹) estimators converge at optimal rates in the first-order Wasserstein ($W_1$) distance, matching minimax lower bounds and generalizing method-of-moments and Kolmogorov–Smirnov-type approaches (Wei et al., 2023).
- High-dimensional robust statistics: LP-based LTAD estimators offer computationally scalable, strongly robust location estimates with virtually zero bias, outperforming classical counterparts under contamination (Zioutas et al., 2015).
| Setting | Key Property | Empirical Outcome |
|---|---|---|
| Lomax model, small samples | MSE, low bias/variance | MAD outperforms MLE, MoM |
| ARFIMA time series | Bias under unknown/nontrivial mean | BCMDE variants reduce bias/MSE |
| Finite mixtures | Optimal $W_1$-rate, TV convergence | MAD-type uniform minimax |
| Multivariate robust location | Breakdown, computational efficiency | High breakdown, scalable |
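For the mixture row above, a toy version of the grid-search strategy can be written down directly: fit the two component means of an equal-weight Gaussian location mixture by minimizing the Kolmogorov (sup-CDF) distance to the empirical CDF, used here as a computable stand-in for the TV/L¹ criterion. The model family, grid, and function names are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def fit_two_component_means(sample, grid):
    """Grid search for the means of an equal-weight N(m1,1)/N(m2,1)
    mixture, minimising the sup-distance between the mixture CDF and
    the empirical CDF (illustrative surrogate for the TV criterion)."""
    x = np.sort(sample)
    n = len(x)
    ecdf = np.arange(1, n + 1) / n
    best, best_d = None, np.inf
    for m1 in grid:
        for m2 in grid:
            if m2 < m1:          # enforce m1 <= m2 to avoid duplicates
                continue
            F = 0.5 * norm.cdf(x, m1) + 0.5 * norm.cdf(x, m2)
            d = np.max(np.abs(F - ecdf))
            if d < best_d:
                best, best_d = (m1, m2), d
    return best

rng = np.random.default_rng(3)
sample = np.concatenate([rng.normal(-2, 1, 250), rng.normal(2, 1, 250)])
grid = np.linspace(-4, 4, 17)   # candidate means, step 0.5
m1, m2 = fit_two_component_means(sample, grid)
```

The double loop is the nonconvexity made explicit: each candidate mixing measure is scored against the empirical CDF, which is why discretization or global optimization is needed beyond toy cases.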
5. Variants and Extensions
Several extensions and specialization axes exist:
- Minimum $L^p$-distance estimators: For $p = 1$, the estimator is the minimum absolute distance (MAD); for $p = 2$ or $p = \infty$, analogous estimators minimizing squared or supremal distances provide trade-offs between sensitivity and robustness, and the $p = 2$ version is closely related to Cramér–von Mises estimators (Betsch et al., 2019, Nombebe et al., 2022).
- Mixture estimation via $\mathcal{F}$-distance: MAD arises naturally as the special case where $\mathcal{F}$ is the unit ball in $L^\infty$, resulting in total-variation loss for mixing measures; kernel-based generalizations (MMD, etc.) further broaden the method's applicability, especially in high dimensions or non-Euclidean sample spaces (Wei et al., 2023).
- Trimmed and nontrimmed forms: The trimming parameter $h$ in LTAD allows interpolation between the full-data MAD (the median case) and highly robust estimators for contaminated data (Zioutas et al., 2015).
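The $p = 1$ and $p = 2$ criteria differ only in the norm applied to the same CDF discrepancy, which the following sketch makes concrete for a Gaussian location model. The grid approximation and function names are assumptions for illustration; the $p = 2$ objective corresponds to the Cramér–von Mises flavour noted above.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def cdf_distance_objective(sample, p):
    """Return theta -> Lp distance between the empirical CDF and the
    N(theta, 1) CDF, approximated on a fixed grid (illustrative)."""
    x = np.sort(sample)
    n = len(x)
    grid = np.linspace(x[0] - 1.0, x[-1] + 1.0, 1000)
    dg = grid[1] - grid[0]
    F_n = np.searchsorted(x, grid, side="right") / n

    def objective(theta):
        diff = np.abs(F_n - norm.cdf(grid, loc=theta))
        # Riemann-sum approximation of the Lp norm of the discrepancy.
        return (np.sum(diff ** p) * dg) ** (1.0 / p)

    return objective

rng = np.random.default_rng(7)
sample = rng.normal(1.0, 1.0, 400)  # true location = 1
theta1 = minimize_scalar(cdf_distance_objective(sample, 1),
                         bounds=(-5, 5), method="bounded").x
theta2 = minimize_scalar(cdf_distance_objective(sample, 2),
                         bounds=(-5, 5), method="bounded").x
```

On clean data the two estimates nearly coincide; the trade-offs between them appear under contamination, where the $p = 1$ objective downweights large CDF discrepancies relative to $p = 2$.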
6. Limitations and Practical Considerations
Although minimum absolute distance estimators enjoy attractive robustness and minimax properties, several practical aspects warrant consideration.
- Computation: For high-dimensional or large-sample settings, exact minimization may be computationally intensive; the use of subgradient methods, data centering, and surrogates (RKHS/MMD) is essential for tractability (Zioutas et al., 2015, Wei et al., 2023).
- Bias-variance tradeoff: MAD estimators may display small negative bias in extremely small samples, but compensate by significantly lower variance relative to classical estimators, especially under model misspecification or contamination (Nombebe et al., 2022).
- Choice of metric and loss: The L¹ criterion is less sensitive to outliers but may be less efficient than L²- or KL-divergence-based criteria for well-specified, uncontaminated models in large samples. For optimal performance, hybrid approaches that adapt the criterion to the sample size are recommended (Nombebe et al., 2022).
- Model identifiability: Strong identifiability conditions are required for mixture models to guarantee minimax convergence; in settings with low separation or high component overlap, performance may degrade and require regularization or penalized criteria (Wei et al., 2023).
7. Connections to Classical Estimators and Recommendations
The minimum absolute distance estimator is a cornerstone of robust statistics and minimum distance estimation.
- In the univariate setting, the MAD is the sample median, known for its 50% breakdown point and optimal robustness (Zioutas et al., 2015).
- In the context of finite mixture estimation and distributional functional estimation, MAD methods achieve optimal nonparametric risk, especially outperforming MLEs when the sample size is comparable to domain size (Jiao et al., 2017, Wei et al., 2023).
- For practical implementation, MAD (or its close relatives such as minimum squared deviation estimators) are recommended for small to moderate sample sizes or for contaminated data, with bias-adjusted MLEs becoming preferred as sample size increases (Nombebe et al., 2022).
A plausible implication is that the minimum absolute distance estimator provides an essential methodological and theoretical link between robust classical statistics, modern empirical process theory, and computational statistics, with continuing importance in high-dimensional inference, mixture learning, and time series analysis.