Quantum Wasserstein Distance in Generative Models
- Quantum Wasserstein Distance is a metric that generalizes classical optimal transport to measure dissimilarities between quantum probability distributions using density matrices.
- It is widely applied in quantum generative models, such as QuDDPM, to robustly train models by minimizing transport cost between noisy and denoised ensembles.
- Practical algorithms leveraging linear programming and sliced approximations address computational challenges in high dimensions and improve quantum many-body system evaluations.
Quantum Wasserstein Distance is a statistical metric adapted to measure the distance between probability distributions over quantum state spaces. It generalizes the classical Wasserstein distance—central to optimal transport theory—to quantum probability distributions and has become integral to the development and evaluation of quantum generative models, particularly quantum denoising diffusion probabilistic models (QuDDPM) (Zhang et al., 2023).
1. Mathematical Formulation of Quantum Wasserstein Distance
In quantum generative modeling, quantum states are represented by density matrices on a Hilbert space of dimension $d$. The quantum Wasserstein-$p$ distance (with $p = 2$ in most of the applications below) provides a notion of the "cost to transport" between two ensembles of such states.
Unlike the classical setting, where probability distributions are scalar functions, quantum states are matrices with constraints: Hermitian, positive semidefinite, and trace-one. To apply optimal transport, distance matrices (cost matrices) are constructed, with entries reflecting physical distances or dissimilarities between quantum states. Two widely utilized forms include:
- For pure states: $C_{ij} = 1 - \lvert\langle\psi_i\vert\phi_j\rangle\rvert^2$, the pairwise infidelity.
- For mixed states: $C_{ij} = 1 - G(\rho_i, \sigma_j)$, where $G(\rho, \sigma) = \mathrm{Tr}(\rho\sigma) + \sqrt{(1 - \mathrm{Tr}\rho^2)(1 - \mathrm{Tr}\sigma^2)}$ is the superfidelity (Kwun et al., 2024).
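These cost constructions can be sketched directly in NumPy; the function names and array layouts below are illustrative choices, not taken from the cited works:

```python
import numpy as np

def pure_state_cost(psis, phis):
    """Infidelity cost matrix C_ij = 1 - |<psi_i|phi_j>|^2.
    psis: (N, d) array of normalized state vectors; phis: (M, d)."""
    fidelities = np.abs(np.conj(psis) @ phis.T) ** 2
    return 1.0 - fidelities

def superfidelity(rho, sigma):
    """G(rho, sigma) = Tr(rho sigma) + sqrt((1 - Tr rho^2)(1 - Tr sigma^2))."""
    overlap = np.trace(rho @ sigma).real
    pa = max(1.0 - np.trace(rho @ rho).real, 0.0)  # clamp small negatives
    pb = max(1.0 - np.trace(sigma @ sigma).real, 0.0)
    return overlap + np.sqrt(pa * pb)

def mixed_state_cost(rhos, sigmas):
    """Cost matrix C_ij = 1 - G(rho_i, sigma_j) built from superfidelity."""
    return np.array([[1.0 - superfidelity(r, s) for s in sigmas] for r in rhos])
```

With either construction, identical states receive cost 0 and orthogonal pure states receive cost 1, so the cost matrix behaves as a bounded dissimilarity.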
Given quantum ensembles $\{\psi_i\}_{i=1}^{N}$ and $\{\phi_j\}_{j=1}^{M}$, one solves the linear program:

$$W = \min_{\pi} \sum_{i,j} \pi_{ij} C_{ij} \quad \text{subject to} \quad \pi_{ij} \ge 0, \qquad \sum_j \pi_{ij} = \frac{1}{N}, \qquad \sum_i \pi_{ij} = \frac{1}{M},$$

where $\pi$ is the transport matrix and the marginal constraints impose uniform probability weights over the samples.
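This linear program can be solved with an off-the-shelf LP solver; a minimal sketch using `scipy.optimize.linprog` with the uniform marginals above (function name is illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein_lp(C):
    """Solve min_pi sum_ij pi_ij C_ij over transport matrices pi >= 0
    with uniform marginals 1/N (rows) and 1/M (columns)."""
    N, M = C.shape
    # Row-sum constraints: sum_j pi_ij = 1/N  (pi flattened row-major)
    A_rows = np.kron(np.eye(N), np.ones((1, M)))
    # Column-sum constraints: sum_i pi_ij = 1/M
    A_cols = np.kron(np.ones((1, N)), np.eye(M))
    A_eq = np.vstack([A_rows, A_cols])
    b_eq = np.concatenate([np.full(N, 1.0 / N), np.full(M, 1.0 / M)])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.fun
```

For a cost matrix with zeros on the diagonal, the optimal coupling simply matches each sample to its counterpart and the distance is zero, as expected.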
2. Role in Quantum Diffusion Models
The quantum Wasserstein distance is explicitly used as a loss functional in multiple quantum generative frameworks for matching the forward-diffused (noisy) ensemble to the learned (denoised) ensemble (Zhang et al., 2023, Zhu et al., 2024, Kwun et al., 2024):
- In QuDDPM, training objectives may minimize the 2-Wasserstein distance between empirical ensembles at each diffusion step. This enables robust learning even where fidelity-based kernel methods such as MMD (Maximum Mean Discrepancy) become ill-posed or degenerate—e.g., in quantum data with nontrivial topologies (such as states distributed on a Bloch sphere ring) (Zhang et al., 2023).
- Structure-preserving diffusion models (SPDM) further employ Wasserstein distances (including sliced and maximum sliced variants) to quantify the generative quality of quantum mixed-state ensembles and the entanglement structure of their output (Zhu et al., 2024).
These schemes require efficient algorithms for evaluating pairwise quantum cost matrices and for solving the optimal assignment problem in high dimensions.
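One practical simplification for the assignment step: when both ensembles contain the same number of samples with uniform weights, the linear program always admits a permutation-matrix optimum (a consequence of the Birkhoff–von Neumann theorem), so the transport problem reduces to a minimum-cost assignment solvable in $O(N^3)$. A sketch:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def wasserstein_assignment(C):
    """Equal-size, uniform-weight optimal transport reduces to
    minimum-cost assignment over the (N, N) cost matrix C."""
    rows, cols = linear_sum_assignment(C)
    return C[rows, cols].mean()  # average cost under the optimal permutation
```

This gives the same value as the full linear program in the equal-size uniform case, at a fraction of the cost.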
3. Properties and Computational Aspects
Quantum Wasserstein distances retain key properties from the classical case:
- Metric properties: nonnegativity, identity of indiscernibles, and (when suitably defined) the triangle inequality.
- Connection with physical observables: For certain choices of the cost function, the Wasserstein distance reflects operationally meaningful measures, e.g., differences in magnetization or entanglement negativity distributions in many-body quantum states (Zhang et al., 2023, Zhu et al., 2024).
Sample complexity for estimating the Wasserstein distance scales polynomially in the number of qubits $n$ when using empirical ensembles, provided each ensemble is of polynomial size (Zhang et al., 2023). In mixed-state models, structure-preserving architectures ensure all sampled states satisfy quantum constraints (Hermiticity, positivity, trace-one), enabling robust cost matrix construction without post hoc correction (Zhu et al., 2024, Kwun et al., 2024).
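The constraints that structure-preserving architectures guarantee by construction can also be verified explicitly; a minimal sketch (the tolerance and helper name are illustrative):

```python
import numpy as np

def is_density_matrix(rho, tol=1e-9):
    """Check the three density-matrix constraints:
    Hermiticity, unit trace, and positive semidefiniteness."""
    hermitian = np.allclose(rho, rho.conj().T, atol=tol)
    unit_trace = abs(np.trace(rho).real - 1.0) < tol
    # eigvalsh on the Hermitian part; smallest eigenvalue must be >= -tol
    psd = np.linalg.eigvalsh((rho + rho.conj().T) / 2).min() > -tol
    return hermitian and unit_trace and psd
```

Without a structure-preserving architecture, such a check (and a projection step for violators) would be needed before building cost matrices.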
Efficient quantum Wasserstein computation has been demonstrated via linear programming (for discrete samples) and is tractable when the ensemble size remains modest, since the linear program grows with the number of sample pairs. For larger systems, sliced variants or kernel approximations reduce complexity (Zhu et al., 2024). Quantum optimal transport algorithms for physical hardware remain a topic of ongoing research.
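Sliced approximations replace the full transport problem with averages of cheap one-dimensional problems, where the 1D Wasserstein-1 distance is just a sorted-sample comparison. The projection scheme below (random Hermitian observables) is an illustrative choice for equal-size ensembles, not the specific construction of Zhu et al. (2024):

```python
import numpy as np

def sliced_wasserstein(rhos_a, rhos_b, n_slices=64, seed=0):
    """Illustrative sliced approximation: project each density matrix
    onto random Hermitian observables, then average the 1D W1 distances
    between sorted expectation values. Assumes equal ensemble sizes."""
    rng = np.random.default_rng(seed)
    d = rhos_a[0].shape[0]
    total = 0.0
    for _ in range(n_slices):
        X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
        H = (X + X.conj().T) / 2  # random Hermitian "direction"
        a = np.sort([np.trace(r @ H).real for r in rhos_a])
        b = np.sort([np.trace(r @ H).real for r in rhos_b])
        total += np.mean(np.abs(a - b))  # 1D W1 between sorted samples
    return total / n_slices
```

Each slice costs only a sort and a few matrix traces, avoiding the linear program entirely.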
4. Applications in Quantum Generative Modeling
Quantum Wasserstein distance has been essential in the empirical and theoretical validation of quantum generative models:
- Topological quantum data: Wasserstein metrics can distinguish nonlinear structures, such as Bloch-circle ensembles, and avoid degeneracy inherent in MMD kernels on highly symmetric spaces (Zhang et al., 2023).
- Mixed-state ensemble synthesis: SPDM uses the Wasserstein distance as its primary generative metric, analyzing recovery of known quantum negativity and eigenvalue distributions in 4-qubit mixed states (Zhu et al., 2024).
- Many-body phase reconstruction: Wasserstein distance between true and generated magnetization distributions quantifies physical faithfulness of the generated quantum many-body ground states (Zhang et al., 2023, Kwun et al., 2024).
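For the magnetization comparison, the distributions are one-dimensional, so the classical Wasserstein-1 distance applies directly; a minimal sketch with hypothetical samples:

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Hypothetical magnetization samples from true vs. generated ground states
true_m = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
gen_m = np.array([-0.5, 0.0, 0.0, 0.5, 1.0])

# 1D Wasserstein-1 distance between the two empirical distributions
d = wasserstein_distance(true_m, gen_m)
```

A small value indicates that the generated ensemble reproduces the physical magnetization statistics of the target states.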
Empirical values in benchmark studies are typically small for high-fidelity models and increase markedly for random state references or under poorly trained architectures (Zhu et al., 2024).
5. Limitations, Open Questions, and Future Directions
Quantum Wasserstein distances, while versatile, have several limitations arising from the high-dimensional structure and peculiarities of quantum data:
- Scalability: Exact computation is restricted to small ensembles because cost matrices grow quadratically in the number of samples; approximations or sliced methods are necessary for larger quantum systems (Zhu et al., 2024).
- Degeneracy: Certain data topologies or noise schedules may cause Wasserstein metrics to become ill-defined or degenerate; selection of appropriate cost functions remains an open challenge (Zhang et al., 2023).
- Hardware implementation: Current approaches rely mostly on classical postprocessing of quantum-generated samples. Quantum-native computation of the Wasserstein distance on hardware is not yet practical at scale.
Open directions include the extension of Wasserstein-like metrics to continuous-variable quantum systems, the development of hardware-tailored sliced Wasserstein estimation, and the analytic characterization of how quantum Wasserstein distance reflects underlying physical properties (e.g., entanglement spectrum, phase transitions) in generative quantum models (Zhu et al., 2024, Zhang et al., 2023).
6. Comparative Context Within Quantum Generative Metrics
In quantum generative model training and evaluation, Wasserstein distance is frequently compared with other metrics:
| Metric | Fundamental Object | Typical Use Case / Context |
|---|---|---|
| Quantum Wasserstein | Cost matrix over state pairs | Topological quantum data, many-body ensemble generation |
| Quantum MMD (fidelity kernel) | Pairwise overlaps $\lvert\langle\psi_i\vert\phi_j\rangle\rvert^2$ | Rapid estimation for pure states, but susceptible to degeneracy |
| Superfidelity | $G(\rho, \sigma)$, an upper bound on Uhlmann fidelity | Mixed-state metric, efficient computation (Kwun et al., 2024) |
Wasserstein distance is preferred where the geometry of quantum data is nontrivial, or where kernel-based metrics fail to capture essential generative quality (Zhang et al., 2023, Zhu et al., 2024, Kwun et al., 2024). This suggests that the quantum Wasserstein distance will remain a central tool in the evaluation, optimization, and analysis of quantum generative learning models.
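The superfidelity bound in the table can be checked numerically; the sketch below restates superfidelity for self-containment and uses illustrative qubit states:

```python
import numpy as np
from scipy.linalg import sqrtm

def uhlmann_fidelity(rho, sigma):
    """Uhlmann fidelity F = (Tr sqrt(sqrt(rho) sigma sqrt(rho)))^2."""
    s = sqrtm(rho)
    return np.trace(sqrtm(s @ sigma @ s)).real ** 2

def superfidelity(rho, sigma):
    """G = Tr(rho sigma) + sqrt((1 - Tr rho^2)(1 - Tr sigma^2)); G >= F."""
    overlap = np.trace(rho @ sigma).real
    pa = max(1.0 - np.trace(rho @ rho).real, 0.0)
    pb = max(1.0 - np.trace(sigma @ sigma).real, 0.0)
    return overlap + np.sqrt(pa * pb)

# Illustrative commuting mixed qubit states
rho = np.diag([0.7, 0.3])
sigma = np.diag([0.4, 0.6])
```

Superfidelity avoids the matrix square roots of Uhlmann fidelity, which is why it is attractive as a cheap mixed-state cost.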