Normalized L1 Distance: Scale-Invariant Metric
- Normalized L1 distance is a scale-invariant measure defined as the expected absolute difference normalized by the sum of absolute first moments, ensuring values between 0 and 1.
- It provides closed-form expressions for standard distributions and connects to established indices, such as the Gini index and 1-Wasserstein metric.
- Under specific independence and nonnegativity conditions, it satisfies key metric properties, thereby robustly quantifying statistical discrepancies in diverse applications.
The normalized $L^1$-distance, denoted $d_N(X, Y)$, is a probabilistic metric between real-valued integrable random variables $X$ and $Y$, widely studied for its applications in theoretical and applied fields such as economics and physics. This distance is defined as the expected absolute difference between $X$ and $Y$, normalized by the sum of their absolute first moments. Structured to always lie between 0 and 1, it refines the traditional $L^1$-distance by providing a scale-invariant measure, particularly significant when comparing distributions of differing magnitudes. The normalized $L^1$-distance encapsulates and unifies several well-established concepts, including the Gini index and the Łukaszyk–Karmowski metric, and emerges as a special instance within the framework of 1-Wasserstein optimal transport (Rolle, 2021).
1. Formal Definition and Properties
Let $(\Omega, \mathcal{F}, P)$ be a probability space with $X, Y \in L^1(\Omega, \mathcal{F}, P)$, i.e., both are integrable real-valued random variables. The (compound) $L^1$-distance is
$$d(X, Y) = \mathbb{E}|X - Y|.$$
The normalized $L^1$-distance, defined when $\mathbb{E}|X| + \mathbb{E}|Y| > 0$, is
$$d_N(X, Y) = \frac{\mathbb{E}|X - Y|}{\mathbb{E}|X| + \mathbb{E}|Y|},$$
and $d_N(X, Y) := 0$ when both expectations vanish. This yields $0 \le d_N(X, Y) \le 1$ for all such $X, Y$.
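As a quick illustration, the definition can be estimated from paired samples; the following Monte Carlo sketch (the function name `d_N` and the sampling setup are illustrative, not from the source) makes the $[0, 1]$ range concrete:

```python
import random

def d_N(xs, ys):
    """Estimate the normalized L1 distance from paired samples:
    E|X - Y| / (E|X| + E|Y|), with the convention d_N = 0 when
    both absolute first moments vanish."""
    n = len(xs)
    num = sum(abs(x - y) for x, y in zip(xs, ys)) / n
    den = sum(abs(x) for x in xs) / n + sum(abs(y) for y in ys) / n
    return 0.0 if den == 0 else num / den

random.seed(0)
xs = [random.gauss(0, 1) for _ in range(100_000)]
ys = [random.gauss(0, 1) for _ in range(100_000)]
# True: since |x - y| <= |x| + |y| per sample, the estimate lies in [0, 1]
print(0.0 <= d_N(xs, ys) <= 1.0)
```

The estimator inherits scale invariance directly: rescaling both samples by a common positive factor leaves the ratio unchanged.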
Analyzing $d_N$ through the axioms of metric spaces:
- Non-negativity: $d_N(X, Y) \ge 0$.
- Symmetry: $d_N(X, Y) = d_N(Y, X)$.
- Reflexivity: $d_N(X, X) = 0$.
- Identity of indiscernibles: $d_N(X, Y) = 0$ if and only if $X = Y$ almost surely, considering the standard identification of random variables up to almost sure equality.
In general, $d_N$ does not satisfy the triangle inequality. However, under the condition that $X$, $Y$, $Z$ are mutually independent, integrable, and nonnegative (with at most one of them concentrated at zero), Rolle proves that $d_N$ satisfies the triangle inequality
$$d_N(X, Z) \le d_N(X, Y) + d_N(Y, Z).$$
This is achieved via a specific algebraic inequality involving the individual $L^1$-distances and first moments, leveraging what is termed a "Canberra inequality,"
$$\frac{|x - z|}{|x| + |z|} \le \frac{|x - y|}{|x| + |y|} + \frac{|y - z|}{|y| + |z|},$$
valid for all real $x, y, z$ (Rolle, 2021).
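The scalar inequality underpinning the proof can be probed numerically; a brute-force sketch over a small grid of reals (the helper name `canberra` is illustrative, with the usual $0/0 := 0$ convention):

```python
from itertools import product

def canberra(x, y):
    """One-dimensional Canberra distance |x - y| / (|x| + |y|), with 0/0 := 0."""
    den = abs(x) + abs(y)
    return 0.0 if den == 0 else abs(x - y) / den

# Brute-force check of the triangle inequality on a grid of reals,
# with a tiny tolerance for floating-point rounding.
grid = [i / 2 for i in range(-8, 9)]
ok = all(
    canberra(x, z) <= canberra(x, y) + canberra(y, z) + 1e-12
    for x, y, z in product(grid, repeat=3)
)
print(ok)  # True on this grid
```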
2. Closed-form Expressions for Standard Distributions
Explicit evaluation of $d_N$ is important in statistics and applied modeling. In the case of two independent Gaussians $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$, the difference satisfies $X - Y \sim N(\mu, \sigma^2)$ with $\mu = \mu_1 - \mu_2$ and $\sigma^2 = \sigma_1^2 + \sigma_2^2$, and the expected absolute difference reads
$$\mathbb{E}|X - Y| = \mu\bigl(2\Phi(\mu/\sigma) - 1\bigr) + 2\sigma\,\varphi(\mu/\sigma),$$
where $\Phi$ and $\varphi$ denote the cdf and pdf of the standard normal, respectively. The one-marginal expectation is
$$\mathbb{E}|X| = \mu_1\bigl(2\Phi(\mu_1/\sigma_1) - 1\bigr) + 2\sigma_1\,\varphi(\mu_1/\sigma_1);$$
$d_N(X, Y)$ is then computed by substituting these closed forms.
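The Gaussian closed forms are easy to code with only the standard library (`math.erf` gives $\Phi$); the following sketch uses illustrative function names:

```python
import math

def phi(t):
    """Standard normal pdf."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def Phi(t):
    """Standard normal cdf, via the error function."""
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))

def mean_abs_normal(mu, sigma):
    """E|Z| for Z ~ N(mu, sigma^2) (folded-normal mean)."""
    if sigma == 0:
        return abs(mu)
    return mu * (2 * Phi(mu / sigma) - 1) + 2 * sigma * phi(mu / sigma)

def d_N_gaussian(mu1, s1, mu2, s2):
    """Closed-form normalized L1 distance for independent Gaussians."""
    num = mean_abs_normal(mu1 - mu2, math.sqrt(s1 ** 2 + s2 ** 2))
    den = mean_abs_normal(mu1, s1) + mean_abs_normal(mu2, s2)
    return num / den

# Two independent standard normals: the ratio is 1/sqrt(2)
print(round(d_N_gaussian(0.0, 1.0, 0.0, 1.0), 4))  # 0.7071
```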
For independent uniform variables $X \sim U[a, b]$, $Y \sim U[c, d]$, the mean absolute difference is determined through an explicit double integration,
$$\mathbb{E}|X - Y| = \frac{1}{(b - a)(d - c)} \int_a^b \int_c^d |x - y| \, dy \, dx,$$
with polynomials in the endpoints providing concrete values in the cases of interval separation, inclusion, or general overlap. For pure separation ($b \le c$), $\mathbb{E}|X - Y| = m_Y - m_X$, where $m_X = (a + b)/2$ and $m_Y = (c + d)/2$ are the midpoints of the respective intervals. Table summaries of case enumeration and formulas are presented in (Rolle, 2021).
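For the pure-separation case, the midpoint formula can be cross-checked by simulation (a sketch; the helper name and sample size are illustrative):

```python
import random

def mean_abs_diff_uniform_mc(a, b, c, d, n=200_000, seed=1):
    """Monte Carlo estimate of E|X - Y| for independent X ~ U[a,b], Y ~ U[c,d]."""
    rng = random.Random(seed)
    return sum(abs(rng.uniform(a, b) - rng.uniform(c, d)) for _ in range(n)) / n

# Pure separation (b <= c): E|X - Y| equals the distance between midpoints.
a, b, c, d = 0.0, 1.0, 2.0, 4.0
exact = (c + d) / 2 - (a + b) / 2  # m_Y - m_X = 2.5
est = mean_abs_diff_uniform_mc(a, b, c, d)
print(abs(est - exact) < 0.01)  # True: estimate agrees within MC error
```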
3. Domains of Application and Illustrative Behavior
Normalized $L^1$-distance is prevalent in fields where scale invariance and robust discrepancy measures are essential. In economics, it appears as the Gini index (see §4). In physics, especially error analysis, $\mathbb{E}|X - Y|$ is known as the Łukaszyk–Karmowski metric.
Figures in (Rolle, 2021) exemplify behavior in the bivariate normal setup: as the correlation $\rho$ approaches 1, the joint distribution concentrates on the diagonal and $d_N \to 0$ (total dependence implies null normalized distance). For uniform distributions, $d_N$ interpolates from 0 (total overlap) to 1 (one variable identically zero and the other nondegenerate), with critical dependence on support overlap.
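The $\rho \to 1$ behavior can be reproduced by direct simulation of a standard bivariate normal pair (a sketch; the estimator name and its defaults are illustrative):

```python
import random

def d_N_bivariate_normal_mc(rho, n=100_000, seed=2):
    """Monte Carlo d_N for a standard bivariate normal pair with correlation rho,
    built from two independent standard normals z1, z2."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = z1
        y = rho * z1 + (1 - rho * rho) ** 0.5 * z2  # marginal is again N(0, 1)
        num += abs(x - y)
        den += abs(x) + abs(y)
    return num / den

# d_N shrinks toward 0 as rho -> 1 (mass concentrates on the diagonal)
for rho in (0.0, 0.9, 0.99):
    print(rho, round(d_N_bivariate_normal_mc(rho), 3))
```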
4. Connections to Classical Indices and Distances
The normalized $L^1$-distance not only unifies disparate applications but also recovers several established quantities:
- Gini index: For a nonnegative distribution $X$ with $\mathbb{E}[X] > 0$, the Gini mean difference is $\Delta = \mathbb{E}|X_1 - X_2|$, where $X_1, X_2$ are independent copies of $X$. The Gini index is its normalized analogue:
$$G(X) = \frac{\mathbb{E}|X_1 - X_2|}{2\,\mathbb{E}[X]} = d_N(X_1, X_2).$$
Thus, the Gini index is $d_N$ viewed as the "autodistance" of a distribution, evaluated between two independent copies of $X$.
- Łukaszyk–Karmowski metric: $d(X, Y) = \mathbb{E}|X - Y|$, introduced in physics for uncertainty quantification, possesses reflexivity ($d(X, X) = \mathbb{E}|X - X| = 0$ when a variable is compared with itself rather than with an independent copy), contrary to early misconceptions.
- Optimal transport (1-Wasserstein): If $\mu, \nu$ are probability laws on $\mathbb{R}$, the Monge–Kantorovich problem with cost $c(x, y) = |x - y|$ leads to the 1-Wasserstein distance
$$W_1(\mu, \nu) = \int_{\mathbb{R}} |F_\mu(x) - F_\nu(x)| \, dx,$$
where $F_\mu, F_\nu$ are the cdfs of $\mu, \nu$. For independent $X \sim \mu$ and $Y \sim \nu$, $\mathbb{E}|X - Y|$ is the cost under the trivial product coupling, so $W_1(\mu, \nu) \le \mathbb{E}|X - Y|$.
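Both connections above can be checked on small empirical samples; in the sketch below (function names illustrative), `gini_index` realizes the "autodistance" formula, and the sorted-matching computation of $W_1$ is compared against the product-coupling cost:

```python
def gini_index(xs):
    """Gini index of a nonnegative sample: mean |X1 - X2| over all
    ordered pairs, divided by twice the sample mean -- the d_N
    "autodistance" of the empirical distribution."""
    n = len(xs)
    mean = sum(xs) / n
    mad = sum(abs(x1 - x2) for x1 in xs for x2 in xs) / (n * n)
    return mad / (2 * mean)

def w1_empirical(xs, ys):
    """1-Wasserstein distance between equal-size empirical laws with
    cost |x - y|: in one dimension the optimal coupling matches
    sorted samples."""
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)

def product_coupling_cost(xs, ys):
    """E|X - Y| under the independent (product) coupling of the two
    empirical laws: the average over all ordered pairs."""
    return sum(abs(x - y) for x in xs for y in ys) / (len(xs) * len(ys))

print(gini_index([1.0, 1.0, 1.0, 1.0]))  # 0.0: perfect equality
xs, ys = [0.0, 1.0, 2.0], [0.5, 1.5, 2.5]
print(w1_empirical(xs, ys))              # 0.5: monotone matching is optimal
print(product_coupling_cost(xs, ys) >= w1_empirical(xs, ys))  # True
```

The last comparison illustrates that the product coupling is merely one feasible transport plan, so its cost dominates $W_1$.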
5. Mathematical and Probabilistic Structure
The normalized $L^1$-distance defines a semimetric on the space of integrable random variables, becoming a full metric when restricted to independent, nonnegative variables, as established through the generalized triangle inequality. The proof involves verifying a nontrivial algebraic condition, ultimately relying on the validity of the Canberra inequality
$$\frac{|x - z|}{|x| + |z|} \le \frac{|x - y|}{|x| + |y|} + \frac{|y - z|}{|y| + |z|}$$
for all real $x, y, z$. This semimetric structure allows for flexible deployment across disparate random variable pairs and distributions, provided integrability conditions are met.
6. Illustrative Regimes and Range
$d_N$ assumes values in $[0, 1]$, with limiting cases as follows:
- $d_N(X, Y) = 0$: holds if $X = Y$ almost surely or, for instance, in the degenerate case where both random variables vanish.
- $d_N(X, Y) \to 0$: as the joint law of $(X, Y)$ concentrates on the diagonal (e.g., perfect dependence, high correlation).
- $d_N(X, Y) = 1$: occurs when one variable is almost surely zero while the other is integrable and nondegenerate (Rolle, 2021).
This range captures scenarios of perfect equality, maximal disparity, and interpolation governed by the probabilistic and algebraic relations between the random variables’ distributions.
Normalized $L^1$-distance thus provides a robust, interpretable, and mathematically grounded similarity measure unifying concepts from diverse fields, with rigorous theoretical guarantees and tractable formulae in common applied cases (Rolle, 2021).