Formal characterization of error propagation from dependency estimation to TV bound
Characterize how estimation errors in the learned dependency matrix $\hat{\mathbf{D}}$, relative to the true pairwise-dependency matrix $\mathbf{D}$ defined by expected total-variation influences, affect the total variation distance between the model's joint conditional distribution $P_\theta(Y_S \mid X, Y_U)$ and the factorized product distribution $Q_\theta(Y_S \mid X, Y_U)$ under the DEMASK greedy subset-selection algorithm. Specifically, derive bounds that translate prediction error in $\hat{\mathbf{D}}$ into degradation of the guarantee on $\mathrm{TV}\bigl(P_\theta(Y_S \mid X, Y_U),\, Q_\theta(Y_S \mid X, Y_U)\bigr)$ that holds when $\mathbf{D}$ is known.
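One candidate starting point is a direct perturbation argument in the entrywise estimation error. The sketch below assumes, purely for illustration, that the known-$\mathbf{D}$ guarantee takes an additive pairwise form and that the greedy algorithm admits a set whose estimated pairwise dependencies sum to at most a threshold $\tau$; the additive form, the threshold $\tau$, and the resulting constant are assumptions for this sketch, not the statement of Theorem~\ref{thm:correctness}.

% Hypothetical perturbation sketch. The additive pairwise bound and the
% selection threshold \tau are illustrative assumptions, not results from
% Theorem~\ref{thm:correctness}.
\begin{align*}
\text{assumed known-}\mathbf{D}\text{ bound:}\quad
  & \mathrm{TV}\bigl(P_\theta(Y_S \mid X, Y_U),\, Q_\theta(Y_S \mid X, Y_U)\bigr)
    \;\le\; \sum_{\substack{i,j \in S\\ i<j}} \mathbf{D}_{ij},\\
\text{entrywise estimation error:}\quad
  & \delta \;=\; \max_{i,j}\,\bigl|\hat{\mathbf{D}}_{ij} - \mathbf{D}_{ij}\bigr|,\\
\text{greedy selection under } \hat{\mathbf{D}}\text{:}\quad
  & \hat{S} \text{ chosen so that } \sum_{\substack{i,j \in \hat{S}\\ i<j}} \hat{\mathbf{D}}_{ij} \;\le\; \tau,\\
\text{implied degradation:}\quad
  & \mathrm{TV}\bigl(P_\theta(Y_{\hat{S}} \mid X, Y_U),\, Q_\theta(Y_{\hat{S}} \mid X, Y_U)\bigr)
    \;\le\; \tau \;+\; \binom{|\hat{S}|}{2}\,\delta.
\end{align*}

Under these assumptions the argument is elementary: each of the $\binom{|\hat{S}|}{2}$ pairwise terms used by the selection rule is off by at most $\delta$, so replacing $\hat{\mathbf{D}}$ with $\mathbf{D}$ inflates the selected set's bound by at most $\binom{|\hat{S}|}{2}\,\delta$. The actual form of the guarantee in Theorem~\ref{thm:correctness} may differ, in which case the perturbation term would need to be re-derived accordingly.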
References
Theorem~\ref{thm:correctness} assumes access to the true dependency matrix $\mathbf{D}$, whereas our implementation uses a learned approximation $\hat{\mathbf{D}}$. Prediction errors in $\hat{\mathbf{D}}$ propagate to the TV bound, though we have not formally characterized this relationship.