MNAR Covariate Shifts in Missing Data

Updated 1 February 2026

MNAR covariate shifts are defined by missingness mechanisms that depend on unobserved values or outcomes, which invalidates standard MAR assumptions.
They call for MNAR-aware imputation, robust estimation, and domain adaptation techniques, because conventional weighting and learning methods can fail.
Empirical studies in healthcare, biology, and sensor networks show that MNAR can substantially affect predictive accuracy and causal inference.

Missing-Not-At-Random (MNAR) covariate shifts describe settings where the probability that a covariate is missing is itself informative and, crucially, changes between source and target domains or regimes. Under MNAR, the missingness probability depends on unobserved (missing) values or on the outcome variable, so standard assumptions such as Missing at Random (MAR) do not hold. The result is a shift not only in the distributions of observed variables but also in the mechanisms that generate missingness, which can invalidate classic imputation, learning, and domain adaptation strategies. MNAR covariate shifts have emerged as a central challenge in robust prediction, matrix completion, causal inference, and domain adaptation, with particular relevance in healthcare, biological sciences, and sensor networks.

1. Formal Characterization of MNAR Covariate Shifts

MNAR missingness mechanisms are those in which the missingness indicator $M$ depends on unobserved variables or outcomes, even after conditioning on observed covariates. Formally, for covariates $X\in\mathbb R^d$, missing-value indicators $M\in\{0,1\}^d$, and outcome $Y$, with $X$ split into observed and missing parts $X_\text{obs}$ and $X_\text{mis}$, missingness is MNAR if:

$p(M \mid X_\text{obs}, X_\text{mis}, Y) \neq p(M \mid X_\text{obs})$

A missingness shift, in turn, is the situation where the conditional probability of missingness differs between the source ($p$) and target ($q$) domains:

$\exists m: p(M=m \mid X) \neq q(M=m \mid X)$

The shift is classified as non-ignorable when either $p$ or $q$ is MNAR (i.e., depends on $X_\text{mis}$ or $Y$), meaning that the Bayes predictor trained in the source is, in general, suboptimal in the target (Rockenschaub et al., 2024).

MNAR covariate shifts can arise from blockwise missingness, for example in matrix completion where entire rows or columns are missing in biological or sensor data (Jalan et al., 28 Feb 2025), or from mechanisms that depend on unobserved latent variables, as in time series with state-dependent blackouts (Sunesh et al., 4 Jan 2026).
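The distinction between ignorable and non-ignorable mechanisms can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not taken from the cited works: two correlated Gaussian covariates, a logistic missingness model with an arbitrary slope of 1.5, and hypothetical helper names (`simulate`, `regression_adjusted_mean`). It masks $X_2$ under MCAR, MAR (driven by the always-observed $X_1$), and MNAR (driven by $X_2$ itself), then compares the complete-case mean of $X_2$ with a mean obtained by regressing $X_2$ on $X_1$ among observed rows.

```python
import numpy as np

def simulate(mechanism, n=200_000, seed=0):
    """Draw (x1, x2) and a mask m for x2 (True = missing) under a toy mechanism."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)                         # always observed
    x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)   # sometimes missing
    if mechanism == "MCAR":
        logit = np.zeros(n)        # missingness ignores the data
    elif mechanism == "MAR":
        logit = 1.5 * x1           # depends only on the observed covariate
    elif mechanism == "MNAR":
        logit = 1.5 * x2           # depends on the value that goes missing
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    p_missing = 1.0 / (1.0 + np.exp(-logit))
    m = rng.random(n) < p_missing
    return x1, x2, m

def regression_adjusted_mean(x1, x2, m):
    """Fit x2 ~ x1 on observed rows, then average predictions over all rows."""
    obs = ~m
    slope, intercept = np.polyfit(x1[obs], x2[obs], deg=1)
    return float(np.mean(intercept + slope * x1))

results = {}
for mech in ("MCAR", "MAR", "MNAR"):
    x1, x2, m = simulate(mech)
    results[mech] = {
        "true": float(x2.mean()),
        "complete_case": float(x2[~m].mean()),
        "adjusted": regression_adjusted_mean(x1, x2, m),
    }

for mech, r in results.items():
    print(mech, {k: round(v, 3) for k, v in r.items()})
```

In this toy setting, conditioning on $X_1$ repairs the MAR case because the regression of $X_2$ on $X_1$ is the same among observed and missing rows; under MNAR the observed rows are selected on $X_2$ itself, so the same adjustment remains biased.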

2. Statistical and Algorithmic Consequences

The non-ignorable nature of MNAR leads to a fundamental change in identifiability, risk minimization, and estimator consistency:

Bayes Predictor Variability: For ignorable shifts (MAR), the pattern-specific Bayes predictor $E[Y\mid X_\text{obs}, M]$ remains unchanged under the shift as long as the law of $(X,Y)$ is invariant. Under MNAR, by contrast, the conditional law of $X_\text{mis}$ given $(X_\text{obs}, M)$ changes with the missingness mechanism, so the Bayes predictor itself shifts and cannot generally be recovered using only the source missingness model (Rockenschaub et al., 2024).
Domain Adaptation Breakdown: Standard covariate-shift corrections (e.g., importance weighting $w(x)=p^\text{target}(x)/p^\text{source}(x)$) fail under MNAR, because the observed-data conditional laws $p^\text{source}(Y \mid X_\text{obs}, M)$ and $p^\text{target}(Y \mid X_\text{obs}, M)$ are not related by any reweighting of the observed covariates. Instead, domain adaptation reduces to a two-step problem: (i) consistent MNAR imputation in each domain; (ii) standard importance weighting on the completed data (Stokes et al., 1 Apr 2025).
Imputation and Estimation: Under mild continuity and graphical identifiability conditions (e.g., MNAR mDAGs, criss-cross structures), one can use flexible parametric models, joint Bayesian frameworks, or identifiable variational autoencoders (e.g., GINA), but imputation error dominates overall excess risk (Stokes et al., 1 Apr 2025).
Matrix Completion: In blockwise MNAR (entire rows/columns missing), active querying (G-optimal design) allows minimax-optimal recovery under linear latent-space feature shift, while passive sampling requires incoherence (Jalan et al., 28 Feb 2025).
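The contrast described under Bayes Predictor Variability can be made explicit with a short derivation. As a simplifying assumption that is not stated in the cited works, suppose missingness carries no information about the outcome once the full covariate vector is known, i.e. $M \perp Y \mid X$. The pattern-specific predictor then averages the complete-data regression over the conditional law of the missing part:

$E[Y \mid X_\text{obs}, M=m] = \int E[Y \mid X_\text{obs}, x_\text{mis}] \, p(x_\text{mis} \mid X_\text{obs}, M=m) \, dx_\text{mis}$

By Bayes' rule, that conditional law satisfies

$p(x_\text{mis} \mid X_\text{obs}, M=m) \propto p(M=m \mid X_\text{obs}, x_\text{mis}) \, p(x_\text{mis} \mid X_\text{obs})$

Under MAR, the factor $p(M=m \mid X_\text{obs}, x_\text{mis})$ does not vary with $x_\text{mis}$ and cancels on normalization, so the predictor depends only on the law of $(X,Y)$. Under MNAR the factor remains, so replacing the source mechanism $p(M \mid X)$ with the target mechanism $q(M \mid X)$ reweights $x_\text{mis}$ and changes the predictor.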

3. Identification Theory and Sensitivity Analysis

Identification in MNAR regimes is delicate:

Partial and Parametric Identification: Nonparametrically, the joint law is not identified under general MNAR; only certain conditional distributions, such as $p(X|Y)$ in criss-cross models, are nonparametrically identified. Parametric identifiability is possible using exponential family structures and contrast equations, provided sufficient support and rank conditions hold (Guo et al., 2023).
Odds-Ratio Parameterization: The degree of MNAR is encoded via odds-ratio models. Semiparametric estimators (order-statistic pseudo-likelihood, generalized estimating equations) deliver $\sqrt{N}$-consistent, asymptotically normal estimation of MNAR parameters; some are doubly robust, in the sense that they remain valid when one of the working models is misspecified (Guo et al., 2023).
Assumption-Free Bounding: In causal inference (outcome-independent MNAR), tight bounds on treatment effects are available via direct probabilistic decomposition. Sensitivity analysis with analyst-specified constraints enables refined bounds for causal contrasts, trading off interval width against robustness (Peña, 2024).
Robust Minimax Quantile Analysis: In mean estimation, robust quantile decompositions yield sharp minimax quantiles: the risk splits into an MCAR term and an MNAR robust term proportional to the contamination $\epsilon$ . The realisable contamination class enables consistency even at high MNAR rates, leveraging the structural restriction that MNAR contamination preserves the base distribution (Ma et al., 2024).
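To make the idea of assumption-free bounding and analyst-specified constraints concrete, here is a minimal sketch for the simplest case: the mean of a bounded outcome with missing values. It is a generic worst-case (Manski-style) bound, not the treatment-effect bounds of Peña (2024); the function name `worst_case_mean_bounds` and the sensitivity parameter `max_gap` (an assumed cap on how far the mean of the missing outcomes may lie from the observed mean) are hypothetical.

```python
import numpy as np

def worst_case_mean_bounds(y, observed, y_min=0.0, y_max=1.0, max_gap=None):
    """Bounds on E[Y] when Y lies in [y_min, y_max] and some values are missing.

    Without max_gap, missing outcomes may sit anywhere in [y_min, y_max].
    With max_gap, their mean is assumed to lie within max_gap of the observed mean.
    """
    y = np.asarray(y, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    if not observed.any():
        return float(y_min), float(y_max)
    p_obs = observed.mean()
    mean_obs = y[observed].mean()
    if max_gap is None:
        lo_mis, hi_mis = y_min, y_max
    else:
        lo_mis = max(y_min, mean_obs - max_gap)
        hi_mis = min(y_max, mean_obs + max_gap)
    # E[Y] = P(obs) * E[Y | obs] + P(mis) * E[Y | mis]; only the last term is unknown.
    lower = p_obs * mean_obs + (1.0 - p_obs) * lo_mis
    upper = p_obs * mean_obs + (1.0 - p_obs) * hi_mis
    return float(lower), float(upper)

y = np.array([1, 0, 1, 1, np.nan, np.nan])
observed = ~np.isnan(y)
wide = worst_case_mean_bounds(y, observed)                 # no assumptions
narrow = worst_case_mean_bounds(y, observed, max_gap=0.1)  # analyst constraint
print(wide, narrow)
```

Tightening `max_gap` narrows the interval but makes the conclusion depend on an untestable assumption, which is the width-versus-robustness trade-off noted above.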

4. Methods for Robust Prediction and Inference

A range of methods has been developed for MNAR covariate shifts:

Method	MNAR Mechanism	Statistical Guarantees / Reported Performance
Bayesian Joint	Arbitrary, linear, or blocked MNAR	Consistent within parametric family (Stokes et al., 1 Apr 2025)
GINA VAE	High-dimensional, MNAR d-graphs	Identifiable, flexible, empirically optimal (Stokes et al., 1 Apr 2025)
MVPC (PC-based Causal Discovery)	Graphical MNAR structure	Asymptotically correct under m-graphs (Tu et al., 2018)
Robust Descent (mean estimation)	Mixture MCAR/MNAR contamination	Minimax optimal quantiles, adapts to $\epsilon$ (Ma et al., 2024)
Active Sampling (Matrix Completion)	Blockwise MNAR, latent-shift	Minimax optimal, avoids incoherence (Jalan et al., 28 Feb 2025)
Latent State-Space (Time Series)	Dropout driven by unobserved state	Modest but consistent RMSE reduction over MAR (Sunesh et al., 4 Jan 2026)

Algorithm selection depends on model structure, dimensionality, and the degree of missingness informativeness.
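Once one of these methods has produced completed covariates, the second stage of the two-step recipe described in Section 2 is ordinary importance weighting. The sketch below is one common way to estimate such weights, assuming the data are already completed: a logistic domain classifier fitted by Newton's method, whose odds give the density ratio. The helper names (`fit_logistic`, `importance_weights`) and the one-dimensional Gaussian toy data are illustrative assumptions, not part of the cited methods.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, ridge=1e-8):
    """Newton-Raphson logistic regression; returns [intercept, coefficients...]."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        grad = Xb.T @ (y - p)
        hess = (Xb * (p * (1.0 - p))[:, None]).T @ Xb + ridge * np.eye(Xb.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def importance_weights(X_source, X_target):
    """Estimate w(x) = p_target(x) / p_source(x) on the source rows."""
    X = np.vstack([X_source, X_target])
    domain = np.concatenate([np.zeros(len(X_source)), np.ones(len(X_target))])
    beta = fit_logistic(X, domain)
    logit_src = np.column_stack([np.ones(len(X_source)), X_source]) @ beta
    # Classifier odds P(target | x) / P(source | x), rescaled by the sample sizes.
    w = np.exp(logit_src) * len(X_source) / len(X_target)
    return w / w.mean()  # self-normalized weights

# Toy completed data (assumption): source ~ N(0, 1), target ~ N(1, 1).
rng = np.random.default_rng(0)
X_source = rng.normal(0.0, 1.0, size=(20_000, 1))
X_target = rng.normal(1.0, 1.0, size=(20_000, 1))

w = importance_weights(X_source, X_target)
weighted_source_mean = float(np.average(X_source[:, 0], weights=w))
print(round(float(X_source.mean()), 3), round(weighted_source_mean, 3),
      round(float(X_target.mean()), 3))
```

The weights are only as good as the completed data they are computed on, which is consistent with the observation in Section 2 that imputation error dominates the overall excess risk.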

5. Empirical Performance and Case Studies

Recent empirical evaluations have elucidated the practical implications of MNAR covariate shifts:

Healthcare/EHR Adaptation: In domain adaptation for ICU mortality prediction, GINA VAE achieves better Brier and AUROC scores (Brier $0.035$, AUROC $0.87$) than mean imputation and traditional MICE approaches, an advantage attributed to its MNAR identifiability properties. MNAR-capable imputers coupled with flexible weighting deliver consistent target-domain predictions (Stokes et al., 1 Apr 2025).
Matrix Completion in Biology: Active sampling that leverages spectral features of the source matrix $P$ enables near-optimal completion of the target matrix $Q$ under blockwise MNAR patterns. On gene-expression and metabolic datasets, active methods outperform passive and prior baselines even when the matrices are highly coherent (Jalan et al., 28 Feb 2025).
Causal Discovery: MVPC recovers correct partial ancestral graphs in synthetic and real medical applications with MNAR, outperforming test-wise deletion or classic PC algorithms. Corrections via permutation or density-ratio weighting ensure valid conditional independences (Tu et al., 2018).
Time Series Blackouts: Modeling dropout as MNAR via a latent state-space Bernoulli channel yields a consistent (albeit modest) RMSE reduction. The benefit increases when missingness is highly dependent on latent traffic state (Sunesh et al., 4 Jan 2026).

6. Theoretical Limitations and Practical Guidance

In MNAR settings, the Bayes predictor for the target domain cannot be recovered unless the new missingness law is estimable; estimator selection must therefore balance exploiting the informativeness of missingness against the risk of overfitting to obsolete masking mechanisms.
MNAR-robust estimators (e.g., NeuMISE, robust mean-based procedures, MVPC corrections) offer principled trade-offs (Rockenschaub et al., 2024).
Including the outcome variable in imputation may reduce bias in the source domain, but it creates a deployment mismatch because the outcome is unavailable at prediction time, effectively introducing an MNAR shift.
Analytical and simulation evidence recommends favoring methods that de-emphasize missingness signals when missingness informativeness may dissolve between domains (Rockenschaub et al., 2024).
For high-dimensional linear or nonlinear MNAR graphs, flexible MNAR-recognizing autoencoders or Bayesian joint models are advisable; off-the-shelf approaches (MICE, mean imputation) can suffice when MNAR is weak or dimensionality is low (Stokes et al., 1 Apr 2025).

7. Extensions and Future Directions

Expansion of identification theory for MNAR in high-dimensional, nonlinear models remains an open frontier.
Active experimental design for MNAR matrix completion (e.g., G-optimal row/column selection) suggests further development of adaptive querying methodologies under latent-shift MNAR (Jalan et al., 28 Feb 2025).
Generalization of latent dropout modeling to nonlinear and graph-structured state-space regimes would extend MNAR handling to a broader range of sequential measurement domains (Sunesh et al., 4 Jan 2026).
Integration of robust estimation theory (minimax quantile decompositions, contamination models) with MNAR-motivated sensitivity analysis can guide practitioners in domains where missingness itself is a signal for domain adaptation.

In summary, MNAR covariate shifts are a formally defined class of distributional shifts in which both the missingness mechanism and the covariate law may change, often in ways that MAR- or MCAR-based inference and adaptation cannot recover. Current research combines MNAR identifiability results, robust estimation, and imputation integrated with flexible domain adaptation machinery. These techniques help practitioners maintain model performance and generalizability where missingness is non-ignorable and subject to shift (Rockenschaub et al., 2024; Stokes et al., 1 Apr 2025; Jalan et al., 28 Feb 2025; Guo et al., 2023; Peña, 2024; Ma et al., 2024; Sunesh et al., 4 Jan 2026).