Minimum Absolute Distance Estimator
- The minimum absolute distance estimator is a robust L¹-based method that estimates parameters by minimizing the absolute difference between empirical and model distributions.
- It is applied in diverse contexts including parametric CDF models, mixture recovery, and multivariate robust location using numerical techniques such as LP/MILP and gradient-based optimizers.
- Its statistical properties feature strong consistency, asymptotic normality, a high breakdown point up to 50%, and minimax optimality in challenging estimation scenarios.
The minimum absolute distance estimator (also known as minimum L¹-distance estimator or MAD) refers, in its most general form, to the estimation of unknown model parameters—or probability measures—by minimizing the L¹ absolute distance between empirical and model distributions or functionals. This criterion is widely used for robust parameter estimation in parametric models, for robust location in multivariate statistics, for mixing measure recovery in mixture models, and for functional estimation of distributional divergence. Despite its conceptual simplicity, the minimum absolute distance approach encompasses a diverse range of estimation methodologies, each with distinct theoretical properties and computational challenges.
1. Fundamental Definitions and General Framework
The minimum absolute distance estimator is defined by selecting the parameter value (or measure) that minimizes the L¹ (absolute) distance between the empirical distribution (or statistic) and the model-implied counterpart.
- General parametric CDF model: Given data $X_1, \dots, X_n$ and a parametric CDF $F(\cdot; \theta)$, $\theta \in \Theta$,

$$\hat{\theta}_n = \arg\min_{\theta \in \Theta} \int_{-\infty}^{\infty} \left| F_n(x) - F(x; \theta) \right| \, dx,$$

where $F_n$ is the empirical CDF (Nombebe et al., 2022).
- Non-normalized densities: For families $\{p(\cdot; \vartheta)\}$ (possibly unnormalized) and a characterization function $d_\vartheta$ that vanishes identically if and only if the data-generating distribution belongs to the family at $\vartheta$,

$$\hat{\vartheta}_n = \arg\min_{\vartheta} \left\| \hat{d}_{n,\vartheta} \right\|_{L^1},$$

where $\hat{d}_{n,\vartheta}$ is the empirical counterpart of $d_\vartheta$ computed from the sample (Betsch et al., 2019).
- Mixture models and mixing measures: For a class of functions $\mathcal{F}$ (e.g., the unit ball of $L^\infty$), the $\mathcal{F}$-distance between mixing measures $G$ and $G'$ is

$$d_{\mathcal{F}}(G, G') = \sup_{f \in \mathcal{F}} \left| \int f \, dp_G - \int f \, dp_{G'} \right|,$$

where $p_G$ denotes the mixture density induced by $G$. The minimum absolute (total variation) distance estimator for $k$-component mixtures is

$$\hat{G}_n = \arg\min_{G \in \mathcal{G}_k} \sup_{f \in \mathcal{F}} \left| \int f \, dp_G - \frac{1}{n} \sum_{i=1}^n f(X_i) \right|.$$
- Multivariate robust location: With L¹ residuals $r_i(\mu) = \lVert x_i - \mu \rVert_1$ and ordered residuals $r_{(1)}(\mu) \le \dots \le r_{(n)}(\mu)$, the minimum (trimmed) absolute distance estimator for location in $\mathbb{R}^p$ is

$$\hat{\mu}_n = \arg\min_{\mu \in \mathbb{R}^p} \sum_{i=1}^{h} r_{(i)}(\mu),$$

where $h \le n$ is the number of retained observations ($h = n$ gives the untrimmed case).
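As a concrete illustration of the parametric-CDF form above, the following sketch fits an exponential rate parameter by minimizing a grid-based approximation of the L¹ distance between the empirical and model CDFs. The model choice, grid construction, and function name are illustrative assumptions, not the exact procedure of any cited paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def mad_fit_exponential(x):
    """Minimum L1-distance fit of an Exp(lam) rate parameter (illustrative)."""
    x = np.sort(x)
    n = len(x)
    # Evaluation grid spanning the sample; the empirical CDF is a step
    # function we can evaluate cheaply with searchsorted.
    grid = np.linspace(0.0, x[-1] * 1.5, 2000)
    dg = grid[1] - grid[0]
    F_n = np.searchsorted(x, grid, side="right") / n

    def objective(lam):
        # Riemann-sum approximation of the L1 distance between the
        # empirical CDF and the Exp(lam) CDF.
        F_model = 1.0 - np.exp(-lam * grid)
        return np.sum(np.abs(F_n - F_model)) * dg

    res = minimize_scalar(objective, bounds=(1e-6, 100.0), method="bounded")
    return res.x

rng = np.random.default_rng(0)
sample = rng.exponential(scale=0.5, size=500)  # true rate lambda = 2
lam_hat = mad_fit_exponential(sample)
```

The grid sum here is the "sample-based sum" surrogate discussed in Section 2; any bounded scalar optimizer could replace `minimize_scalar`.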
2. Methodological Variants and Computational Strategies
The core MAD criterion admits multiple concrete implementations, depending on statistical context and model structure.
- Untrimmed case: For univariate location, the solution is the sample median. In higher dimensions, the LTAD approach (least trimmed absolute deviations) generalizes the median by trimming the largest residuals and minimizing the L¹-loss over the retained observations (Zioutas et al., 2015).
- Parametric density models: Objective minimization often requires numerical integration over sample statistics or surrogate loss function evaluation, with optimization performed via L-BFGS-B, SLSQP, or BFGS in scientific computing environments (Nombebe et al., 2022, Betsch et al., 2019).
- Mixture models: The TV distance minimization forms a nonconvex program, often approximated by discretization of the mixing-measure space, by global optimization when the number of components $k$ is small, or by reproducing kernel Hilbert space surrogates (e.g., MMD distances) in the multivariate case (Wei et al., 2023).
- LP/MILP formulations: Robust location estimation via LTAD is rewritten as a mixed integer linear program (MILP) with auxiliary variables, or as a linear program (LP) via relaxation and data centering techniques (Zioutas et al., 2015).
| Application | Objective Function | Numerical Approach |
|---|---|---|
| Parametric CDF model | $\int \lvert F_n(x) - F(x;\theta) \rvert \, dx$ | Grid or sample-based sum + BFGS or similar optimizer |
| Multivariate LTAD | Trimmed sum of L¹ residuals | MILP, LP, subgradients |
| Mixture models | TV distance to the empirical measure | Grid search, global optimization, kernel methods |
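The LTAD row above can be illustrated without solving the full MILP: a simple alternating heuristic keeps the $h$ observations with the smallest L¹ residuals and re-centers on their coordinate-wise median (the exact L¹ minimizer for a fixed subset). This is a concentration-style sketch under assumed defaults, not the LP/MILP formulation of the cited work.

```python
import numpy as np

def ltad_location(X, h, n_iter=50, seed=0):
    """Heuristic LTAD location estimate in R^p (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    mu = X[rng.integers(len(X))].astype(float)  # random starting centre
    for _ in range(n_iter):
        r = np.abs(X - mu).sum(axis=1)          # L1 residuals
        keep = np.argsort(r)[:h]                # h smallest residuals
        # Coordinate-wise median minimises the L1 loss on the kept subset.
        new_mu = np.median(X[keep], axis=0)
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    return mu

rng = np.random.default_rng(1)
clean = rng.normal(0.0, 1.0, size=(90, 2))
outliers = rng.normal(10.0, 0.5, size=(10, 2))
X = np.vstack([clean, outliers])
mu_hat = ltad_location(X, h=75)  # trim 25 of 100 points
```

With 10% gross contamination, the trimmed estimate stays near the clean center while the sample mean is pulled toward the outliers.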
3. Statistical Properties
The minimum absolute distance estimator possesses distinct statistical properties governed by both the L¹-norm structure and the trimming (if any).
- Consistency: Under standard identifiability and regularity conditions, MAD estimators are consistent:
- For continuous parametric families, strong consistency holds under compactness and continuity of the loss function in the parameter (Betsch et al., 2019, Nombebe et al., 2022).
- For mixture models, minimax rates in Wasserstein metrics are derived, matching lower bounds for both uniform and local settings (Wei et al., 2023).
- Asymptotic Normality: Asymptotic normality holds for MAD in both simple and some robust/truncated settings, with covariance matrices expressed via influence functions or explicit integrals involving the model CDF gradient and sign functions (Nombebe et al., 2022, Zioutas et al., 2015).
- Breakdown Robustness: The LTAD achieves finite-sample breakdown points up to 50% for maximal trimming, preserving robustness properties akin to the univariate median (Zioutas et al., 2015).
- Minimax Properties: When used to estimate functional divergences (e.g., the L¹-distance between discrete distributions), minimax rate-optimality holds up to logarithmic factors, with MAD-type estimators achieving superior performance relative to naïve plug-in approaches in the large-alphabet regime (Jiao et al., 2017).
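The breakdown behavior is easy to verify in the univariate case, where the untrimmed MAD solution is the sample median: heavy contamination that destroys the mean leaves the median near the clean center. A small sanity check:

```python
import numpy as np

rng = np.random.default_rng(42)
clean = rng.normal(0.0, 1.0, size=60)
# Contaminate 40% of the sample with gross outliers -- still below
# the 50% breakdown point of the median.
contaminated = np.concatenate([clean, np.full(40, 1000.0)])

mean_est = contaminated.mean()        # dragged toward the outliers
median_est = np.median(contaminated)  # stays near the clean centre
```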
4. Applications and Empirical Performance
Minimum absolute distance estimators are widely applied:
- Lomax (Pareto-II) distribution: For two-parameter modeling, MAD outperforms classical MLE and method-of-moments estimation in small samples and remains competitive in moderate samples, with the smallest mean squared error (MSE) observed in the small-sample regime (Nombebe et al., 2022).
- Short- and long-memory time series: In ARFIMA models, bias-corrected minimum distance estimators (BCMDE) employing MAD-type criteria yield reduced bias and the smallest MSE, especially when the process mean is unknown or nonconstant and for larger memory parameters (Lana et al., 2018).
- Mixture estimation: In finite mixtures with compactly supported parameter spaces, minimum TV (L¹) estimators converge at optimal rates in the first-order Wasserstein ($W_1$) distance, matching minimax lower bounds and generalizing method-of-moments and Kolmogorov–Smirnov-type approaches (Wei et al., 2023).
- High-dimensional robust statistics: LP-based LTAD estimators offer computationally scalable, strongly robust location estimates with virtually zero bias, outperforming classical counterparts under contamination (Zioutas et al., 2015).
| Setting | Key Property | Empirical Outcome |
|---|---|---|
| Lomax model, small samples | MSE, low bias/variance | MAD outperforms MLE, MoM |
| ARFIMA time series | Bias under unknown/nontrivial mean | BCMDE variants reduce bias/MSE |
| Finite mixtures | Optimal $W_1$-rate, TV convergence | MAD-type uniform minimax |
| Multivariate robust location | Breakdown, computational efficiency | High breakdown, scalable |
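For the mixture row above, a toy version of the grid-search strategy can be written down directly: fit the two component means of an equal-weight Gaussian location mixture by minimizing the Kolmogorov (sup-CDF) distance to the empirical CDF, used here as a computable stand-in for the TV/L¹ criterion. The model family, grid, and function names are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def fit_two_component_means(sample, grid):
    """Grid search for the means of an equal-weight N(m1,1)/N(m2,1)
    mixture, minimising the sup-distance between the mixture CDF and
    the empirical CDF (illustrative surrogate for the TV criterion)."""
    x = np.sort(sample)
    n = len(x)
    ecdf = np.arange(1, n + 1) / n
    best, best_d = None, np.inf
    for m1 in grid:
        for m2 in grid:
            if m2 < m1:          # enforce m1 <= m2 to avoid duplicates
                continue
            F = 0.5 * norm.cdf(x, m1) + 0.5 * norm.cdf(x, m2)
            d = np.max(np.abs(F - ecdf))
            if d < best_d:
                best, best_d = (m1, m2), d
    return best

rng = np.random.default_rng(3)
sample = np.concatenate([rng.normal(-2, 1, 250), rng.normal(2, 1, 250)])
grid = np.linspace(-4, 4, 17)   # candidate means, step 0.5
m1, m2 = fit_two_component_means(sample, grid)
```

The double loop is the nonconvexity made explicit: each candidate mixing measure is scored against the empirical CDF, which is why discretization or global optimization is needed beyond toy cases.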
5. Variants and Extensions
Several extensions and specialization axes exist:
- Minimum $L^p$-distance estimators: For $p = 1$, the estimator is the minimum absolute distance (MAD); for $p = 2$ or $p = \infty$, analogous estimators minimizing squared or supremal distances provide trade-offs between sensitivity and robustness, and the $p = 2$ version is closely related to Cramér–von Mises estimators (Betsch et al., 2019, Nombebe et al., 2022).
- Mixture estimation via $\mathcal{F}$-distance: MAD arises naturally as the special case where $\mathcal{F}$ is the unit ball in $L^\infty$, resulting in total-variation loss for mixing measures; kernel-based generalizations (MMD, etc.) further broaden the method's applicability, especially in high dimensions or non-Euclidean sample spaces (Wei et al., 2023).
- Trimmed and nontrimmed forms: The trimming parameter $h$ in LTAD allows interpolation between the full-data MAD (the median case) and highly robust estimators for contaminated data (Zioutas et al., 2015).
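The $p = 1$ and $p = 2$ criteria differ only in the norm applied to the same CDF discrepancy, which the following sketch makes concrete for a Gaussian location model. The grid approximation and function names are assumptions for illustration; the $p = 2$ objective corresponds to the Cramér–von Mises flavour noted above.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def cdf_distance_objective(sample, p):
    """Return theta -> Lp distance between the empirical CDF and the
    N(theta, 1) CDF, approximated on a fixed grid (illustrative)."""
    x = np.sort(sample)
    n = len(x)
    grid = np.linspace(x[0] - 1.0, x[-1] + 1.0, 1000)
    dg = grid[1] - grid[0]
    F_n = np.searchsorted(x, grid, side="right") / n

    def objective(theta):
        diff = np.abs(F_n - norm.cdf(grid, loc=theta))
        # Riemann-sum approximation of the Lp norm of the discrepancy.
        return (np.sum(diff ** p) * dg) ** (1.0 / p)

    return objective

rng = np.random.default_rng(7)
sample = rng.normal(1.0, 1.0, 400)  # true location = 1
theta1 = minimize_scalar(cdf_distance_objective(sample, 1),
                         bounds=(-5, 5), method="bounded").x
theta2 = minimize_scalar(cdf_distance_objective(sample, 2),
                         bounds=(-5, 5), method="bounded").x
```

On clean data the two estimates nearly coincide; the trade-offs between them appear under contamination, where the $p = 1$ objective downweights large CDF discrepancies relative to $p = 2$.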
6. Limitations and Practical Considerations
Although minimum absolute distance estimators enjoy attractive robustness and minimax properties, several practical aspects warrant consideration.
- Computation: For high-dimensional or large-sample settings, exact minimization may be computationally intensive; the use of subgradient methods, data centering, and surrogates (RKHS/MMD) is essential for tractability (Zioutas et al., 2015, Wei et al., 2023).
- Bias-variance tradeoff: MAD estimators may display small negative bias in extremely small samples, but compensate by significantly lower variance relative to classical estimators, especially under model misspecification or contamination (Nombebe et al., 2022).
- Choice of metric and loss: The L¹ criterion is less sensitive to outliers but may be less efficient than L²- or KL-divergence-based criteria for well-specified, uncontaminated models in large samples. For optimal performance, hybrid approaches that adapt the criterion to the sample size are recommended (Nombebe et al., 2022).
- Model identifiability: Strong identifiability conditions are required for mixture models to guarantee minimax convergence; in settings with low separation or high component overlap, performance may degrade and require regularization or penalized criteria (Wei et al., 2023).
7. Connections to Classical Estimators and Recommendations
The minimum absolute distance estimator is a cornerstone of robust statistics and minimum distance estimation.
- In the univariate setting, the MAD is the sample median, known for its 50% breakdown point and optimal robustness (Zioutas et al., 2015).
- In the context of finite mixture estimation and distributional functional estimation, MAD methods achieve optimal nonparametric risk, especially outperforming MLEs when the sample size is comparable to domain size (Jiao et al., 2017, Wei et al., 2023).
- For practical implementation, MAD (or its close relatives such as minimum squared deviation estimators) are recommended for small to moderate sample sizes or for contaminated data, with bias-adjusted MLEs becoming preferred as sample size increases (Nombebe et al., 2022).
A plausible implication is that the minimum absolute distance estimator provides an essential methodological and theoretical link between robust classical statistics, modern empirical process theory, and computational statistics, with continuing importance in high-dimensional inference, mixture learning, and time series analysis.