Recovering Counterfactual Distributions via Wasserstein GANs

Published 24 Jan 2026 in econ.EM | (2601.17296v1)

Abstract: Standard Distributional Synthetic Controls (DSC) estimate counterfactual distributions by minimizing the Euclidean $L_2$ distance between quantile functions. We demonstrate that this geometric reliance renders estimators fragile: they lack informative gradients under support mismatch and produce structural artifacts when outcomes are multimodal. This paper proposes a robust estimator grounded in Optimal Transport (OT). We construct the synthetic control by minimizing the Wasserstein-1 distance between probability measures, implemented via a Wasserstein Generative Adversarial Network (WGAN). We establish the formal point identification of synthetic weights under an affine independence condition on the donor pool. Monte Carlo simulations confirm that while standard estimators exhibit catastrophic variance explosions under heavy-tailed contamination and support mismatch, our WGAN-based approach remains consistent and stable. Furthermore, we show that our measure-based method correctly recovers complex bimodal mixtures where traditional quantile averaging fails structurally.


Authors (1)

Xinran Liu

Summary

The paper proposes a WGAN estimator that minimizes the Wasserstein-1 distance to robustly recover counterfactual distributions.
It demonstrates how the new method addresses heavy-tailed contamination, support mismatch, and mixture misspecification compared to traditional L2-based techniques.
Empirical results, including a Kansas tax reform application, confirm that the approach accurately recovers complex, multimodal outcomes in high-dimensional settings.

Robust Counterfactual Distribution Recovery with Wasserstein GANs

Problem Setting and Motivation

Distributional Synthetic Control (DSC) aims to estimate counterfactual outcome distributions for treated units by constructing convex combinations of untreated donor distributions. Existing DSC approaches predominantly minimize the Euclidean ( $L_2$ ) error between quantile functions or CDFs. This approach assumes substantial geometric overlap between the supports of the donor and target distributions, and it becomes fragile when outcomes are heavy-tailed, contaminated, or multimodal.

The paper "Recovering Counterfactual Distributions via Wasserstein GANs" (2601.17296) addresses these deficiencies by proposing an estimator that minimizes the Wasserstein-1 distance (Earth Mover’s Distance) through its dual representation, implemented with a Wasserstein GAN (WGAN) architecture. The estimator is robust to support mismatch and heavy-tailed outlier contamination, and it recovers complex mixture structures that quantile-averaging techniques cannot represent.

Methodological Contributions

Limitations of Traditional $L_2$ -based Methods

The paper formalizes the computational and statistical instability of $L_2$ -based DSC estimators under two principal geometric violations:

Support mismatch: When the target and donor distributions are disjoint, the $L_2$ objective becomes locally flat, producing vanishing gradients. This results in optimizer oscillation and variance explosions in synthetic weights.
Mixture misspecification: Quantile-averaging methods are incompatible with recovering multimodal or mixture distributions from unimodal donors due to the nonlinearity of the quantile operator with respect to mixtures, forcing systematic bias toward unimodal artifacts.
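
The mixture misspecification point can be illustrated with a small simulation. The sketch below is not taken from the paper: the two Gaussian donors, the equal weights, and the helper `mass_near_zero` are illustrative assumptions. It compares a weighted average of donor quantile functions with a weighted mixture of the same donors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Two unimodal donors with well-separated locations (illustrative choice).
donor_a = rng.normal(-3.0, 1.0, n)
donor_b = rng.normal(3.0, 1.0, n)
weights = np.array([0.5, 0.5])

# Quantile averaging: weighted average of the donors' empirical quantile functions.
u = (np.arange(n) + 0.5) / n
quantile_avg = (weights[0] * np.quantile(donor_a, u)
                + weights[1] * np.quantile(donor_b, u))

# Mixture: each observation comes from donor j with probability weights[j].
pick_b = rng.random(n) < weights[1]
mixture = np.where(pick_b, donor_b, donor_a)

def mass_near_zero(x, radius=1.0):
    """Share of observations with |x| < radius (hypothetical diagnostic)."""
    return float(np.mean(np.abs(x) < radius))

print("quantile average:", mass_near_zero(quantile_avg))
print("mixture:         ", mass_near_zero(mixture))
```

In this setup the quantile average is approximately a single Gaussian centred at zero, so most of its mass sits where the mixture has almost none. The mixture keeps its two modes near -3 and 3.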

Wasserstein GAN-based Estimator

The proposed method reframes synthetic weight selection as the minimization of the Wasserstein-1 distance between the treated and synthetic (donor-mixture) distributions. Using the Kantorovich-Rubinstein dual form, the objective becomes a differentiable minimax problem:

$\min_{\lambda \in \Delta^J} \max_{\|f\|_{Lip} \leq 1} \mathbb{E}_{x\sim P_{treated}}[f(x)] - \sum_{j=1}^J \lambda_j \mathbb{E}_{x\sim P_j}[f(x)] - \eta H(\lambda)$

where the entropy regularization $H(\lambda)$ enforces unique, stable solutions. The inner maximization is performed by a 1-Lipschitz "critic" neural network, with Lipschitzness enforced via a gradient penalty. The weight updates are performed using entropic mirror descent, accommodating the simplex constraint efficiently.

This dual formulation ensures non-vanishing, informative gradients even for disjoint supports, and the mixture (rather than quantile average) construction enables the recovery of multimodal counterfactuals.
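
The weight update can be sketched in isolation. The code below is a minimal illustration under stated assumptions, not the paper's implementation: it assumes $H(\lambda) = -\sum_j \lambda_j \log \lambda_j$ (Shannon entropy), holds the critic fixed, and summarizes it by hypothetical per-donor critic means `critic_means[j]` standing in for $\mathbb{E}_{x\sim P_j}[f(x)]$. Under those assumptions the gradient of the objective with respect to $\lambda_j$ is $-\mathbb{E}_{x\sim P_j}[f(x)] + \eta(\log \lambda_j + 1)$, and entropic mirror descent becomes a multiplicative update followed by renormalization.

```python
import numpy as np

def entropic_mirror_descent(critic_means, eta=0.1, step=0.5, n_iter=500):
    """Minimise -sum_j lam_j * c_j + eta * sum_j lam_j * log(lam_j) over the simplex.

    critic_means (c_j) are hypothetical per-donor averages of a FIXED critic;
    in a full WGAN loop the critic would be re-trained between weight updates.
    """
    c = np.asarray(critic_means, dtype=float)
    lam = np.full(c.shape, 1.0 / c.size)  # start from uniform weights
    for _ in range(n_iter):
        grad = -c + eta * (np.log(lam) + 1.0)   # gradient w.r.t. lam
        log_lam = np.log(lam) - step * grad     # multiplicative (mirror) step
        log_lam -= log_lam.max()                # stabilise the exponentiation
        lam = np.exp(log_lam)
        lam /= lam.sum()                        # renormalise onto the simplex
    return lam

critic_means = np.array([0.2, 0.5, 0.1])        # illustrative values only
lam = entropic_mirror_descent(critic_means, eta=0.1)

# For a fixed critic the regularised problem has the closed form lam_j ∝ exp(c_j / eta).
closed_form = np.exp(critic_means / 0.1)
closed_form /= closed_form.sum()
print(lam, closed_form)
```

When all critic means are equal, so the critic cannot distinguish the donors, the update leaves the weights uniform. This matches the role of the entropy term as a pull toward the uniform allocation, under the sign convention assumed here.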

Theoretical Guarantees

The estimator’s theoretical properties are established under a finite-mixture latent factor model and a scaled isometry assumption for the distributional evolution (a generalization of parallel trends to distributions). Specifically:

Identification: Under affine independence of donor distributions and structural stability, the optimal synthetic weights uniquely recover the counterfactual distribution.
Gradient Robustness: In contrast to $L_2$ , Wasserstein gradients with respect to donor weights are always nonzero unless donors are equidistant in the transport metric, enabling persistent optimization signal even under support violation.
Statistical Consistency and Asymptotic Normality: Given sufficient micro-sample size per aggregate unit, the estimator is $\sqrt{N}$ -consistent and asymptotically normal, provided the critic network can approximate 1-Lipschitz functions rapidly enough (compositional smoothness assumption).
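
The identification claim can be checked numerically in a toy one-dimensional case. The sketch below is an illustration rather than the paper's procedure: it replaces the WGAN critic with the exact 1-D Wasserstein-1 distance (the integral of the absolute CDF gap) and replaces mirror descent with a grid search over a single weight. The donors, the true weight of 0.3, and the helper `w1_weighted` are assumptions made for the example.

```python
import numpy as np

def w1_weighted(x, wx, y, wy):
    """Exact 1-D Wasserstein-1 distance between two weighted samples,
    computed as the integral of |F_x - F_y| over the pooled support."""
    values = np.concatenate([x, y])
    order = np.argsort(values)
    values = values[order]
    mass_x = np.concatenate([wx, np.zeros(len(y))])[order]
    mass_y = np.concatenate([np.zeros(len(x)), wy])[order]
    cdf_gap = np.abs(np.cumsum(mass_x) - np.cumsum(mass_y))[:-1]
    return float(np.sum(cdf_gap * np.diff(values)))

rng = np.random.default_rng(1)
n = 4000
donors = [rng.normal(-2.0, 1.0, n), rng.normal(2.0, 1.0, n)]
true_lambda = 0.3  # weight on the first donor (illustrative)

# Target sample: fresh draws from the true mixture of the two donor laws.
from_first = rng.random(n) < true_lambda
target = np.where(from_first,
                  rng.normal(-2.0, 1.0, n),
                  rng.normal(2.0, 1.0, n))
target_w = np.full(n, 1.0 / n)

pooled = np.concatenate(donors)
grid = np.linspace(0.0, 1.0, 101)
distances = []
for lam in grid:
    # Synthetic control = donor samples reweighted to mixture weights (lam, 1 - lam).
    w = np.concatenate([np.full(n, lam / n), np.full(n, (1.0 - lam) / n)])
    distances.append(w1_weighted(pooled, w, target, target_w))

best_lambda = float(grid[int(np.argmin(distances))])
print("estimated weight on donor 1:", best_lambda)
```

With two distinct donors the distance has a single clear minimum in the weight, so the grid search lands close to the true value up to sampling noise. If the two donors had the same distribution, every weight would fit equally well, which is the kind of degeneracy the affine independence condition rules out.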

Empirical and Simulation Results

Heavy-tailed Contamination

In noise-contaminated DGPs, the WGAN estimator’s weight estimates remain stable and low-bias at contamination levels up to 4%, whereas the $L_2$ -based estimator shows immediate variance explosions and high RMSE.

Figure 1: Target distributions under increasing heavy-tailed contamination highlight the divergence between ground truth and observed; Wasserstein-1 distance rises predictably with contamination, preserving identifiability of bulk structure.

Figure 2: Average RMSE of synthetic weights; the WGAN-based method is unaffected by contamination, in stark contrast to the immediate breakdown of the $L_2$ estimator.

Support Mismatch (Lack of Overlap)

Variance of $L_2$ synthetic weights diverges as the donor/target support gap increases, while the WGAN-based estimator remains stable, converging smoothly to a uniform prior when the data is uninformative.

Figure 3: Variance of estimated weights as a function of support mismatch parameter $\gamma$ ; only the WGAN-based estimator remains well-behaved even under near-total mismatch.

Figure 4: Mean donor weights under support mismatch; the WGAN solution reallocates uniformly when geometric information is absent, preventing degenerate inferences.

Structural Misspecification (Mixture Recovery)

A central result is the ability to correctly synthesize bimodal or mixture targets from unimodal donors. Quantile-based estimators "average out" the modes, generating a spurious unimodal distribution, whereas the WGAN mixture preserves the true mixture mass locations.

Figure 5: The WGAN mixture estimator reconstructs both modes of a bimodal target, as opposed to the failure of the quantile-average synthesizer.

Multivariate Distributions

Thanks to the metric-based (not sorting-based) construction, the WGAN-DSC generalizes seamlessly to high-dimensional or vector-valued outcomes, which are theoretically inaccessible to quantile-based frameworks.

Figure 6: The WGAN approach recovers multimodal structure in $\mathbb{R}^2$ with accurate location and amplitude, enabling distributional counterfactuals for vector outcomes.

Empirical Application: Kansas Tax Reform

The methodology is applied to the 2012 Kansas tax cut to recover the counterfactual (no-reform) household income distribution. In the pre-intervention period, the synthetic control matches the treated distribution closely. After treatment, the observed and counterfactual distributions diverge in shape, a change that only distributional (not mean-based) analysis can detect.

Figure 7: Pre-treatment support of treated (Kansas) is fully spanned by donors, validating geometric identification.

Figure 8: WGAN-based synthetic control tracks the treated distribution closely in pre-intervention (placebo) years, capturing full distributional features.

Figure 9: In 2013, the post-event outcome diverges substantially from the synthetic counterfactual ( $W_1 = 0.0877$ ), reflecting morphological change beyond mean shift.

Implications and Directions for Future Work

The Wasserstein GAN synthetic control estimator robustifies counterfactual distribution recovery against contamination, support mismatch, and model misspecification. This expands the applicability of DSC to settings with complex heterogeneity (e.g., polarization, risk tail analysis, and asset return distributions), high-dimensional outcomes, and substantial regime shifts.

On the theoretical side, the main challenge is extending the weak requirements on the latent generative process to cover arbitrary nonlinear interventions. Computational scalability to ultra-high-dimensional micro-data also motivates ongoing research on neural approximation rates, regularization, and adversarial training stability.

Methodologically, further integration with permutation inference, robustification to complex panel structures, and generalization to dynamic treatments and networked data are promising avenues.

Conclusion

The paper establishes a principled, operational route for robust synthetic control of counterfactual distributions via optimal transport and adversarial optimization. The Wasserstein-1 distance, implemented through a rigorously regularized WGAN, provides decisive improvements in estimator stability, geometric generality, and expressive power compared to traditional $L_2$ methods, with empirical validation spanning contamination resilience, multimodal recovery, and real-world policy intervention analysis (2601.17296).