Nonparametric Identification and Inference for Counterfactual Distributions with Confounding

Published 17 Feb 2026 in stat.ME and stat.ML | (2602.15916v1)

Abstract: We propose nonparametric identification and semiparametric estimation of joint potential outcome distributions in the presence of confounding. First, in settings with observed confounding, we derive tighter, covariate-informed bounds on the joint distribution by leveraging conditional copulas. To overcome the non-differentiability of bounding min/max operators, we establish the asymptotic properties for both a direct estimator with polynomial margin condition and a smooth approximation with log-sum-exp operator, facilitating valid inference for individual-level effects under the canonical rank-preserving assumption. Second, we tackle the challenge of unmeasured confounding by introducing a causal representation learning framework. By utilizing instrumental variables, we prove the nonparametric identifiability of the latent confounding subspace under injectivity and completeness conditions. We develop a "triple machine learning" estimator that employs a cross-fitting scheme to sequentially handle the learned representation, nuisance parameters, and target functional. We characterize the asymptotic distribution with variance inflation induced by representation learning error, and provide conditions for semiparametric efficiency. We also propose a practical VAE-based algorithm for confounding representation learning. Simulations and real-world analysis validate the effectiveness of the proposed methods. By bridging classical semiparametric theory with modern representation learning, this work provides a robust statistical foundation for distributional and counterfactual inference in complex causal systems.


Summary

The paper introduces a novel framework for identifying joint potential outcome distributions by using conditional copulas and causal representation learning to address both observed and unobserved confounding.
The methodology features a 'triple machine learning' estimator and a VAE-based algorithm that leverages the Hilbert-Schmidt Independence Criterion to ensure effective recovery of latent exogenous variables.
Simulation studies and real-world empirical results validate the approach, demonstrating robust estimation and improved causal inference in complex observational settings.


Introduction

This paper introduces a framework for nonparametric identification and semiparametric estimation of joint potential outcome distributions in the presence of confounding. Confounders, whether observed or unmeasured, can obscure causal relationships and complicate standard causal inference methods. The authors make two main contributions: (1) with observed confounders, they use conditional copulas to bound the joint distribution; (2) with unmeasured confounding, they use a causal representation learning approach that leverages instrumental variables (IVs).

Conditional Copulas for Observed Confounders

When all confounders are observed, the marginal distribution of each potential outcome is identified, but the joint distribution of the potential outcomes is not point-identified without further assumptions, because the two outcomes are never observed for the same unit. The authors employ conditional copulas to derive tighter bounds on the joint potential outcome distribution, applying the Fréchet-Hoeffding bounds conditionally on the observed covariates.
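As a worked illustration of the covariate-conditional bounds, the display below states the standard Fréchet-Hoeffding inequalities applied within each covariate stratum and then averaged. The notation ($F_0$, $F_1$, $F$, $X$) is assumed here for illustration and may differ from the paper's.

```latex
% Conditional Frechet-Hoeffding bounds for F(y_0, y_1 | x) = P(Y(0) <= y_0, Y(1) <= y_1 | X = x),
% with F_d(y | x) = P(Y(d) <= y | X = x) for d = 0, 1 (notation assumed for illustration).
\[
\max\{F_0(y_0 \mid x) + F_1(y_1 \mid x) - 1,\; 0\}
\;\le\; F(y_0, y_1 \mid x) \;\le\;
\min\{F_0(y_0 \mid x),\; F_1(y_1 \mid x)\}.
\]
% Averaging over X bounds the unconditional joint CDF F(y_0, y_1):
\[
\mathbb{E}_X\!\left[\max\{F_0(y_0 \mid X) + F_1(y_1 \mid X) - 1,\; 0\}\right]
\;\le\; F(y_0, y_1) \;\le\;
\mathbb{E}_X\!\left[\min\{F_0(y_0 \mid X),\; F_1(y_1 \mid X)\}\right].
\]
% Jensen's inequality (max is convex, min is concave) shows these are never wider
% than the bounds built from the marginal CDFs F_d(y) = E_X[F_d(y | X)]:
\[
\mathbb{E}_X[\max\{\cdot,\, 0\}] \;\ge\; \max\{F_0(y_0) + F_1(y_1) - 1,\; 0\},
\qquad
\mathbb{E}_X[\min\{\cdot,\, \cdot\}] \;\le\; \min\{F_0(y_0),\; F_1(y_1)\}.
\]
```

The gain is strict whenever the covariates shift the conditional distributions enough that the max or min switches branches across strata.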

Figure 1: Bound width gained via marginal copulas and conditional copulas.

Conditioning on covariates lets the bounds exploit information in the covariate distribution, so the resulting intervals for the joint distribution are tighter than the classical bounds based on the marginal distributions alone.
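The tightening can be checked numerically. The sketch below is not the authors' estimator: it assumes a hypothetical toy model (X ~ N(0,1), Y(0)|X ~ N(X,1), Y(1)|X ~ N(X+1,1)) with known conditional CDFs, and compares marginal Fréchet-Hoeffding bounds with covariate-averaged ones. It also includes a log-sum-exp surrogate for the max/min operators of the kind the abstract mentions; the temperature `tau` and the function names are illustrative assumptions.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal, used for X and for the conditional noise


def fh_bounds(f0: float, f1: float) -> tuple[float, float]:
    """Frechet-Hoeffding bounds on a joint CDF value given two marginal CDF values."""
    return max(f0 + f1 - 1.0, 0.0), min(f0, f1)


def smooth_max(a: float, b: float, tau: float = 0.01) -> float:
    """Log-sum-exp surrogate for max(a, b); exceeds it by at most tau * log(2)."""
    m = max(a, b)  # subtract the max before exponentiating for numerical stability
    return m + tau * math.log(math.exp((a - m) / tau) + math.exp((b - m) / tau))


def compare_bounds(y0: float, y1: float, n_grid: int = 2000) -> dict[str, tuple[float, float]]:
    """Bounds on P(Y(0) <= y0, Y(1) <= y1) in the hypothetical toy model."""
    # Midpoint quantile grid: averaging over xs approximates the expectation over X.
    xs = [STD.inv_cdf((i + 0.5) / n_grid) for i in range(n_grid)]
    f0 = [STD.cdf(y0 - x) for x in xs]          # F_0(y0 | x)
    f1 = [STD.cdf(y1 - (x + 1.0)) for x in xs]  # F_1(y1 | x)
    pairs = list(zip(f0, f1))
    lo_c = sum(fh_bounds(a, b)[0] for a, b in pairs) / n_grid
    hi_c = sum(fh_bounds(a, b)[1] for a, b in pairs) / n_grid
    # Smoothed versions: differentiable, but biased inward by at most tau * log(2).
    lo_s = sum(smooth_max(a + b - 1.0, 0.0) for a, b in pairs) / n_grid
    hi_s = sum(-smooth_max(-a, -b) for a, b in pairs) / n_grid
    marginal = fh_bounds(sum(f0) / n_grid, sum(f1) / n_grid)
    return {"marginal": marginal, "conditional": (lo_c, hi_c), "smoothed": (lo_s, hi_s)}


result = compare_bounds(y0=0.0, y1=1.0)
for name, (lo, hi) in result.items():
    print(f"{name:12s} lower={lo:.4f} upper={hi:.4f} width={hi - lo:.4f}")
```

In this toy model the conditional interval is about half as wide as the marginal one at the chosen point. The smoothed bounds sit slightly inside the conditional ones, which is the price of differentiability and shrinks as `tau` goes to zero.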

Representation Learning with Instrumental Variables

For unmeasured confounding, the authors propose a causal representation learning framework that uses instrumental variables to recover the latent confounding structure. They establish nonparametric identifiability of the latent confounding subspace under injectivity and completeness conditions. Once this subspace is identified, the marginal potential outcome distributions are identified as well, which moves the analysis from the local effects typical of IV methods to effects defined for the whole population.

The framework is implemented through a "triple machine learning" estimator, which extends double machine learning with an additional cross-fitting stage: the learned representation, the nuisance parameters, and the target functional are handled sequentially. Simulations show accurate recovery of causal parameters under unmeasured confounding.
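One way to picture the three-stage cross-fitting is the rotation sketched below. This is an illustrative assumption about how such a scheme could be organized, not the authors' algorithm: the representation step merely centers a covariate, the nuisance step fits per-arm linear regressions, the target is an average treatment effect with a known propensity of 0.5, and all function names are hypothetical.

```python
import random

Row = tuple[float, int, float]  # (covariate x, binary treatment t, outcome y)


def three_way_folds(n: int, seed: int = 0) -> list[list[int]]:
    """Shuffle indices 0..n-1 and split them into three disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[k::3] for k in range(3)]


def ols(z: list[float], y: list[float]) -> tuple[float, float]:
    """Simple least squares of y on z; returns (intercept, slope)."""
    zbar, ybar = sum(z) / len(z), sum(y) / len(y)
    slope = sum((a - zbar) * (b - ybar) for a, b in zip(z, y)) / sum((a - zbar) ** 2 for a in z)
    return ybar - slope * zbar, slope


def learn_repr(rows: list[Row]):
    """Stage 1 placeholder: a 'representation' that only centers x."""
    mean_x = sum(r[0] for r in rows) / len(rows)
    return lambda x: x - mean_x


def fit_nuisance(rows: list[Row], phi) -> dict[int, tuple[float, float]]:
    """Stage 2 placeholder: per-arm linear outcome regressions on phi(x)."""
    fits = {}
    for arm in (0, 1):
        sub = [r for r in rows if r[1] == arm]
        fits[arm] = ols([phi(r[0]) for r in sub], [r[2] for r in sub])
    return fits


def score(row: Row, phi, fits, propensity: float = 0.5) -> float:
    """Stage 3: AIPW score for the average effect (known propensity is a toy assumption)."""
    x, t, y = row
    z = phi(x)
    mu0 = fits[0][0] + fits[0][1] * z
    mu1 = fits[1][0] + fits[1][1] * z
    return mu1 - mu0 + t * (y - mu1) / propensity - (1 - t) * (y - mu0) / (1 - propensity)


def triple_cross_fit(rows: list[Row], seed: int = 0) -> float:
    """Rotate the three folds through the representation, nuisance, and evaluation roles."""
    folds = three_way_folds(len(rows), seed)
    estimates = []
    for r in range(3):
        rep, nuis, ev = (folds[(r + k) % 3] for k in range(3))
        phi = learn_repr([rows[i] for i in rep])
        fits = fit_nuisance([rows[i] for i in nuis], phi)
        estimates.append(sum(score(rows[i], phi, fits) for i in ev) / len(ev))
    return sum(estimates) / 3


# Toy data with a true average effect of 2 and a randomized treatment.
rng = random.Random(1)
rows: list[Row] = []
for _ in range(3000):
    x = rng.gauss(0, 1)
    t = int(rng.random() < 0.5)
    rows.append((x, t, 2.0 * t + x + rng.gauss(0, 1)))

ate_hat = triple_cross_fit(rows)
print(f"cross-fitted estimate: {ate_hat:.3f}")
```

The point of the rotation is that each stage is evaluated on data that the preceding stages did not see, so errors from representation learning and nuisance fitting are not correlated with the observations used in the final average.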

Practical Implications and Algorithmic Innovations

The authors introduce a Variational Autoencoder (VAE)-based algorithm for learning confounding representations. The algorithm relies on the Hilbert-Schmidt Independence Criterion (HSIC), a kernel measure of statistical dependence, to encourage the recovered latent variables to be exogenous to the instruments. This allows estimation in the presence of latent confounders and broadens the range of settings in which the methods apply. The authors illustrate the approach with an empirical analysis of cigarette demand (Figure 2).
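To make the role of HSIC concrete, the sketch below computes the standard biased empirical HSIC with Gaussian kernels for scalar samples. It is not the paper's implementation: the fixed bandwidth, the 1/n² normalization (one common convention), and the toy variables `u_clean` and `u_leaky` are assumptions for illustration. In a VAE objective, a quantity like this could plausibly enter as a penalty on dependence between the learned latent and the instrument.

```python
import math
import random


def gaussian_gram(v: list[float], sigma: float = 1.0) -> list[list[float]]:
    """Gram matrix of the Gaussian kernel exp(-(a - b)^2 / (2 sigma^2))."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in v] for a in v]


def center(gram: list[list[float]]) -> list[list[float]]:
    """Double-center a symmetric Gram matrix, i.e. compute H K H with H = I - 11'/n."""
    n = len(gram)
    row_mean = [sum(row) / n for row in gram]
    grand = sum(row_mean) / n
    return [[gram[i][j] - row_mean[i] - row_mean[j] + grand for j in range(n)] for i in range(n)]


def hsic(u: list[float], z: list[float], sigma: float = 1.0) -> float:
    """Biased empirical HSIC: trace(K H L H) / n^2, computed entrywise."""
    n = len(u)
    kc = center(gaussian_gram(u, sigma))
    lc = center(gaussian_gram(z, sigma))
    return sum(kc[i][j] * lc[i][j] for i in range(n) for j in range(n)) / n ** 2


rng = random.Random(0)
n = 200
z = [rng.gauss(0, 1) for _ in range(n)]             # instrument
u_clean = [rng.gauss(0, 1) for _ in range(n)]       # latent independent of z
u_leaky = [zi + 0.5 * rng.gauss(0, 1) for zi in z]  # latent that depends on z

hsic_clean = hsic(u_clean, z)
hsic_leaky = hsic(u_leaky, z)
print(f"HSIC(independent latent, z) = {hsic_clean:.4f}")
print(f"HSIC(dependent latent, z)   = {hsic_leaky:.4f}")
```

The statistic is close to zero when the latent is independent of the instrument and clearly larger when the latent leaks instrument information, which is the behavior a dependence penalty needs.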

Figure 2: Causal analysis of cigarette demand. (a) Estimated Average Dose-Response Function with 95% pointwise confidence intervals.

Simulation and Empirical Results

The paper reports simulations and an empirical study that illustrate the robustness and flexibility of the proposed methods. The simulations assess both the accuracy of confounding representation learning and the performance of the proposed estimators across diverse scenarios. Together, these results support the practical value of combining modern representation learning with classical causal inference frameworks.

Conclusion

This study bridges classical semiparametric theory with modern representation learning, providing a statistical framework for counterfactual inference in complex causal systems. Its main advances lie in handling unmeasured confounding and in capturing the full distribution of potential outcomes. The proposed methods are relevant to both theoretical development and applied causal analysis, and they give researchers working with observational data a toolkit for addressing confounding.