
Normalized L1 Distance: Scale-Invariant Metric

Updated 18 November 2025
  • Normalized L1 distance is a scale-invariant measure defined as the expected absolute difference normalized by the sum of absolute first moments, ensuring values between 0 and 1.
  • It provides closed-form expressions for standard distributions and connects to established indices, such as the Gini index and 1-Wasserstein metric.
  • Under specific independence and nonnegativity conditions, it satisfies key metric properties, thereby robustly quantifying statistical discrepancies in diverse applications.

The normalized $L_1$-distance, denoted $D_{\rm norm}(X, Y)$, is a probabilistic metric between real-valued integrable random variables $X$ and $Y$, widely studied for its applications in theoretical and applied fields such as economics and physics. It is defined as the expected absolute difference between $X$ and $Y$, normalized by the sum of their absolute first moments. Constructed to always lie between 0 and 1, it refines the traditional $L_1$-distance by providing a scale-invariant measure, which is particularly valuable when comparing distributions of differing magnitudes. The normalized $L_1$-distance encapsulates and unifies several well-established concepts, including the Gini index and the Łukaszyk–Karmowski metric, and emerges as a special instance within the framework of 1-Wasserstein optimal transport (Rolle, 2021).

1. Formal Definition and Properties

Let $(\Omega, \mathcal A, P)$ be a probability space with $X, Y \in \mathcal L_1(\Omega)$, i.e., both are integrable real-valued random variables. The (compound) $L_1$-distance is

$$D(X, Y) = E[|X - Y|] = \int_\Omega |X(\omega) - Y(\omega)|\, dP(\omega).$$

The normalized $L_1$-distance, defined when $E|X| + E|Y| > 0$, is

$$D_{\rm norm}(X, Y) = \frac{E[|X - Y|]}{E[|X|] + E[|Y|]},$$

with the convention $D_{\rm norm}(X, Y) = 0$ when both expectations vanish. This yields $0 \leq D_{\rm norm}(X, Y) \leq 1$ for all such $X, Y$.
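As an illustration (not from the paper), the definition translates directly into an empirical estimator from paired samples; the helper name `d_norm` is ours:

```python
import numpy as np

def d_norm(x, y):
    """Empirical normalized L1 distance from paired samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    denom = np.abs(x).mean() + np.abs(y).mean()
    if denom == 0.0:
        return 0.0                      # convention when both first moments vanish
    return np.abs(x - y).mean() / denom

rng = np.random.default_rng(0)
x = rng.normal(1.0, 1.0, 100_000)
y = rng.normal(3.0, 2.0, 100_000)
est = d_norm(x, y)                      # always lands in [0, 1]
```

Note that `d_norm(x, x)` is exactly 0 and `d_norm(x, 0)` is exactly 1, matching the limiting regimes discussed below.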

Analyzing $D_{\rm norm}$ through the axioms of metric spaces:

  • Non-negativity: $D_{\rm norm}(X, Y) \geq 0$.
  • Symmetry: $D_{\rm norm}(X, Y) = D_{\rm norm}(Y, X)$.
  • Reflexivity: $D_{\rm norm}(X, X) = 0$.
  • Identity of indiscernibles: $D_{\rm norm}(X, Y) = 0$ if and only if $X = Y$ almost surely, under the standard identification of random variables up to almost sure equality.

In general, $D_{\rm norm}$ does not satisfy the triangle inequality. However, under the condition that $X$, $Y$, $Z$ are mutually independent, integrable, and nonnegative (with at most one of them concentrated at zero), Rolle proves that $D_{\rm norm}$ satisfies the triangle inequality

$$D_{\rm norm}(X, Z) \leq D_{\rm norm}(X, Y) + D_{\rm norm}(Y, Z).$$

This is achieved via a specific algebraic inequality involving the individual $L_1$-distances and first moments, leveraging what is termed a "Canberra inequality" valid for all real $x, y, z$ (Rolle, 2021).

2. Closed-form Expressions for Standard Distributions

Explicit evaluation of $D_{\rm norm}$ is important in statistics and applied modeling. In the case of two independent Gaussians

$$X \sim N(\mu_1, \sigma_1^2), \quad Y \sim N(\mu_2, \sigma_2^2),$$

the expected absolute difference reads

$$E|X - Y| = |\mu_1 - \mu_2| \left[ 2\Phi\!\left(\frac{|\mu_1 - \mu_2|}{\sqrt{\sigma_1^2 + \sigma_2^2}}\right) - 1 \right] + 2\sqrt{\sigma_1^2 + \sigma_2^2}\; \phi\!\left(\frac{|\mu_1 - \mu_2|}{\sqrt{\sigma_1^2 + \sigma_2^2}}\right),$$

where $\Phi$ and $\phi$ denote the cdf and pdf of the standard normal, respectively. The one-marginal expectation is

$$E|X| = |\mu| \left[2\Phi\!\left(\frac{|\mu|}{\sigma}\right) - 1 \right] + 2\sigma\, \phi\!\left(\frac{|\mu|}{\sigma}\right).$$

$D_{\rm norm}(X, Y)$ is then computed by substituting these closed forms.
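These closed forms are straightforward to evaluate; the sketch below (ours, standard library only) computes $E|X|$ as the folded-normal mean and uses the fact that for independent Gaussians $X - Y \sim N(\mu_1 - \mu_2, \sigma_1^2 + \sigma_2^2)$, so $E|X - Y|$ is again a folded-normal mean:

```python
import math

def abs_moment_normal(mu, sigma):
    """E|X| for X ~ N(mu, sigma^2): the folded-normal mean."""
    z = abs(mu) / sigma
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal cdf at z
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal pdf at z
    return abs(mu) * (2.0 * Phi - 1.0) + 2.0 * sigma * phi

def d_norm_gaussians(mu1, s1, mu2, s2):
    """D_norm for independent X ~ N(mu1, s1^2), Y ~ N(mu2, s2^2)."""
    # X - Y ~ N(mu1 - mu2, s1^2 + s2^2), so the numerator reuses the same formula.
    num = abs_moment_normal(mu1 - mu2, math.sqrt(s1**2 + s2**2))
    return num / (abs_moment_normal(mu1, s1) + abs_moment_normal(mu2, s2))
```

For two independent standard normals this yields $D_{\rm norm} = 1/\sqrt{2}$, since $E|X - Y| = 2/\sqrt{\pi}$ and $E|X| = \sqrt{2/\pi}$.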

For independent uniform variables $X \sim U([a, b])$, $Y \sim U([c, d])$, the mean absolute difference is determined through an explicit double integration:

$$E|X - Y| = \frac{1}{(b - a)(d - c)} \int_a^b \int_c^d |x - y|\, dy\, dx,$$

with polynomials in the endpoints providing concrete values in the cases of interval separation, inclusion, or general overlap. For pure separation ($b < c$), $E|X - Y| = |m_X - m_Y|$, where $m_X$ and $m_Y$ are the midpoints of the respective intervals. Table summaries of the case enumeration and formulas are presented in (Rolle, 2021).
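A quick Monte Carlo check (ours, with illustrative endpoint values) of the separated-interval case, where the mean absolute difference reduces to the gap between midpoints:

```python
import numpy as np

# Separated intervals (b < c): the paper gives E|X - Y| = |m_X - m_Y|.
rng = np.random.default_rng(1)
a, b, c, d = 0.0, 1.0, 2.0, 4.0
x = rng.uniform(a, b, 200_000)
y = rng.uniform(c, d, 200_000)

emp_mean_abs_diff = np.abs(x - y).mean()
midpoint_gap = abs((a + b) / 2 - (c + d) / 2)   # |0.5 - 3.0| = 2.5
emp_d_norm = emp_mean_abs_diff / (np.abs(x).mean() + np.abs(y).mean())
```

Here the exact normalized distance is $2.5 / (0.5 + 3.0) = 5/7$, which the empirical value approaches.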

3. Domains of Application and Illustrative Behavior

The normalized $L_1$-distance is prevalent in fields where scale invariance and robust discrepancy measures are essential. In economics, it appears as the Gini index (see §4). In physics, especially in error analysis, $D(X, Y)$ is known as the Łukaszyk–Karmowski metric.

Figures in (Rolle, 2021) exemplify behavior in the bivariate normal setup: as the correlation $\rho$ approaches 1, the joint distribution concentrates on the diagonal, and $D_{\rm norm}(X, Y) \to 0$ (total dependence implies null normalized distance). For uniform distributions, $D_{\rm norm}$ interpolates from 0 (total overlap) to 1 (one variable identically zero and the other nondegenerate), with critical dependence on support overlap.

4. Connections to Classical Indices and Distances

The normalized $L_1$-distance not only unifies disparate applications but also recovers several established quantities:

  • Gini index: For a distribution $\mu$, the Gini mean difference is $\mathrm{GMD}(\mu) = E_{X, Y \sim \mu}|X - Y|$ for independent $X, Y \sim \mu$. The Gini index is its normalized analogue:

$$G(\mu) = \frac{\mathrm{GMD}(\mu)}{2E(X)} = D_{\rm norm}(X, Y) \quad \text{for } X, Y \text{ i.i.d.\ } \mu.$$

Thus, the Gini index is $D_{\rm norm}$ viewed as the "autodistance" of a distribution.

  • Łukaszyk–Karmowski metric: $D(X, Y) = E|X - Y|$, introduced in physics for uncertainty quantification; contrary to early misconceptions, it possesses reflexivity ($D(X, X) = 0$ for the same random variable).
  • Optimal transport (1-Wasserstein): If $\mu, \nu$ are probability laws, the Monge–Kantorovich problem with cost $|x - y|$ leads to the 1-Wasserstein distance

$$W_1(\mu, \nu) = \inf_{\pi \in \Pi(\mu, \nu)} E_\pi[|X - Y|] = \int_0^1 |F^{-1}(t) - G^{-1}(t)|\, dt,$$

where $F, G$ are the cdfs of $\mu, \nu$. For independent $X \sim \mu$, $Y \sim \nu$, $D(X, Y)$ is the transport cost under the trivial product coupling.
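Both connections can be sketched numerically. The snippet below (an illustration of ours, assuming NumPy and SciPy are available) computes the Gini index as a normalized pairwise mean difference, and contrasts the optimal $W_1$ cost, which SciPy evaluates via the one-dimensional quantile formula, with the product-coupling cost $E|X - Y|$ estimated from independently drawn samples:

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Gini index as D_norm of a distribution against an independent copy.
def gini(sample):
    v = np.asarray(sample, dtype=float)
    diffs = np.abs(v[:, None] - v[None, :])   # all pairwise |v_i - v_j|
    return diffs.mean() / (2.0 * v.mean())    # GMD / (2 E[X])

# W_1 (optimal coupling) vs. the trivial product-coupling cost E|X - Y|.
rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, 50_000)              # samples from mu
y = rng.normal(1.0, 2.0, 50_000)              # samples from nu, drawn independently
w1 = wasserstein_distance(x, y)               # quantile-formula cost
product_cost = np.abs(x - y).mean()           # independent-coupling cost, >= w1
```

As expected, the optimal coupling is never more expensive than the product coupling, so $W_1(\mu, \nu) \leq D(X, Y)$ holds up to Monte Carlo noise.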

5. Mathematical and Probabilistic Structure

The normalized $L_1$-distance defines a semimetric on the space of integrable random variables, becoming a full metric when restricted to independent nonnegative variables, as established through the generalized triangle inequality. The proof involves verifying a nontrivial algebraic condition, ultimately relying on the validity of the "Canberra inequality" for all real $x, y, z$:

$$|y - z|\,|x| - |x - z|\,|y| + |x - y|\,|z| \geq 0.$$

This semimetric structure allows flexible deployment across disparate pairs of random variables and distributions, provided integrability conditions are met.
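The Canberra inequality is easy to spot-check numerically; this sketch (ours) evaluates the left-hand side on random real triples:

```python
import numpy as np

# Spot-check |y - z||x| - |x - z||y| + |x - y||z| >= 0 on random triples.
rng = np.random.default_rng(3)
x, y, z = rng.uniform(-10, 10, size=(3, 100_000))
lhs = np.abs(y - z) * np.abs(x) - np.abs(x - z) * np.abs(y) + np.abs(x - y) * np.abs(z)
min_val = lhs.min()   # stays nonnegative up to floating-point rounding
```

Notably, the left-hand side vanishes identically on certain sign configurations (e.g. $z < 0 < x < y$), so equality is attained on a large set.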

6. Illustrative Regimes and Range

$D_{\rm norm}(X, Y)$ assumes values in $[0, 1]$, with limiting cases as follows:

  • $D_{\rm norm}(X, Y) = 0$: holds if $X = Y$ almost surely or, for instance, in the degenerate case where both random variables vanish.
  • $D_{\rm norm}(X, Y) \to 0$: as the joint law of $(X, Y)$ concentrates on the diagonal (e.g., perfect dependence, high correlation).
  • $D_{\rm norm}(X, Y) = 1$: occurs when one variable is almost surely zero while the other is integrable and nondegenerate (Rolle, 2021).

This range captures scenarios of perfect equality, maximal disparity, and interpolation governed by the probabilistic and algebraic relations between the random variables’ distributions.
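A small simulation (ours) of the $\rho \to 1$ regime for standard bivariate normals, assuming the usual mixing construction for generating a pair with a given correlation:

```python
import numpy as np

rng = np.random.default_rng(4)
z1 = rng.normal(size=200_000)
z2 = rng.normal(size=200_000)

def d_norm_at_rho(rho):
    x = z1
    y = rho * z1 + np.sqrt(1 - rho**2) * z2   # both N(0, 1), corr(x, y) = rho
    return np.abs(x - y).mean() / (np.abs(x).mean() + np.abs(y).mean())

# D_norm = sqrt((1 - rho) / 2) in this setup, shrinking to 0 as rho -> 1.
vals = [d_norm_at_rho(r) for r in (0.0, 0.9, 0.999)]
```

At $\rho = 0$ this recovers the independent-standard-normal value $1/\sqrt{2}$, and the distance decays toward 0 as the mass concentrates on the diagonal.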


The normalized $L_1$-distance thus provides a robust, interpretable, and mathematically grounded similarity measure unifying concepts from diverse fields, with rigorous theoretical guarantees and tractable formulae in common applied cases (Rolle, 2021).
