
On the Identifiability of Causal Abstractions

Published 13 Mar 2025 in stat.ML and cs.LG | (2503.10834v1)

Abstract: Causal representation learning (CRL) enhances machine learning models' robustness and generalizability by learning structural causal models associated with data-generating processes. We focus on a family of CRL methods that uses contrastive data pairs in the observable space, generated before and after a random, unknown intervention, to identify the latent causal model. (Brehmer et al., 2022) showed that this is indeed possible, given that all latent variables can be intervened on individually. However, this is a highly restrictive assumption in many systems. In this work, we instead assume interventions on arbitrary subsets of latent variables, which is more realistic. We introduce a theoretical framework that calculates the degree to which we can identify a causal model, given a set of possible interventions, up to an abstraction that describes the system at a higher level of granularity.

Summary

  • The paper introduces a framework to identify latent causal models up to abstraction using interventions on arbitrary latent subsets.
  • It relaxes conventional assumptions by leveraging invariance of non-descendant variables and quotient graph structures to achieve model identification.
  • The results offer practical insights for improving causal effect estimation, contrastive learning, and model interpretability in real-world scenarios.

This paper, "On the Identifiability of Causal Abstractions" (2503.10834), addresses the problem of Causal Representation Learning (CRL), specifically focusing on identifying latent causal models when interventions are performed on arbitrary subsets of latent variables, rather than requiring individual interventions on all variables. The core contribution is a theoretical framework that determines the degree to which a causal model can be identified up to an abstraction, given a set of possible interventions. This work relaxes restrictive assumptions of prior research, making the setting more applicable to real-world scenarios where individual interventions are often infeasible.

The authors consider a setting where they have access to contrastive data pairs $(\mathbf{x}, \tilde{\mathbf{x}})$ from an observable space, representing the system before and after a random, unknown intervention on a latent Structural Causal Model (SCM).

Problem Formulation

1. Data Generating Process:

The process involves:

  • A pre-intervention SCM: Latent variables $\mathbf{z}_i$ are generated by functions $f_i$ of their parents $\mathrm{Pa}_{\mathcal{G}}(i)$ and exogenous noise $\bm{\varepsilon}_i$, i.e., $\mathbf{z}_i = f_i(\mathbf{z}_{\mathrm{Pa}_{\mathcal{G}}(i)}, \bm{\varepsilon}_i)$. The overall pre-intervention latent is $\mathbf{z} = \mathbf{f}(\bm{\varepsilon})$.
  • Interventions: A random variable $\bm{\iota}$, taking values in the power set of $\mathcal{G}$'s vertices, determines which subset of latent variables $S \subseteq V(\mathcal{G})$ is intervened upon. Interventions are "perfect": the causal mechanisms of nodes in $S$ are severed and replaced by new mechanisms $\tilde{f}^{(S)}_i(\tilde{\bm{\varepsilon}}_i)$, while nodes outside $S$ keep their original mechanisms but may see changed parent values.
  • Mixing function: An invertible mixing function $g$ maps latents to observables: $\mathbf{x} = g(\mathbf{z})$ and $\tilde{\mathbf{x}} = g(\tilde{\mathbf{z}})$. The goal is to identify the parameters $\theta = (\theta_{\text{SCM}}, \theta_{\text{intv}}, g)$ from the observed data distribution $p_\theta(\mathbf{x}, \tilde{\mathbf{x}})$.
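
As a concrete, hypothetical instance of this data-generating process, the sketch below samples contrastive latent pairs from a made-up three-node linear-Gaussian chain and applies a fixed invertible linear mixing. The mechanisms, coefficients, and the function name `sample_latents` are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latents(S):
    """Sample a pre-/post-intervention latent pair (z, z~) for target set S.

    Hypothetical chain SCM z0 -> z1 -> z2 with linear Gaussian mechanisms.
    """
    eps = rng.normal(size=3)
    z = np.empty(3)
    z[0] = eps[0]
    z[1] = 0.8 * z[0] + eps[1]
    z[2] = 0.5 * z[1] + eps[2]

    # Perfect intervention: mechanisms of nodes in S are severed and replaced
    # by fresh noise; nodes outside S keep their mechanisms but propagate the
    # updated parent values.
    zt = np.empty(3)
    zt[0] = rng.normal() if 0 in S else eps[0]
    zt[1] = rng.normal() if 1 in S else 0.8 * zt[0] + eps[1]
    zt[2] = rng.normal() if 2 in S else 0.5 * zt[1] + eps[2]
    return z, zt

# Invertible mixing g (an arbitrary fixed linear map here): the learner only
# ever observes (x, x~), never the latents themselves.
A = np.array([[1.0, 0.2, 0.0], [0.0, 1.0, 0.3], [0.1, 0.0, 1.0]])
z, zt = sample_latents({2})
x, xt = A @ z, A @ zt
```

Note that intervening on node 2 leaves its non-descendants (nodes 0 and 1) exactly invariant across the pair, which is the structural fact the identifiability arguments later exploit.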

2. Identifiability up to Equivalence:

The paper defines identifiability such that if two parameter sets $\theta$ and $\theta^\star$ produce the same observable distribution, then $\theta \sim \theta^\star$ for some equivalence relation. Key equivalence relations include:

  • Latent Disentanglement ($\sim_L$): For a specific latent subspace $\mathcal{W}^\star \subset \mathcal{Z}^\star$, the corresponding learned latent $\mathbf{w}$ is distributionally equivalent to a transformation of $\mathbf{w}^\star$, i.e., $\mathbf{w} \overset{d}{=} h(\mathbf{w}^\star)$.
  • Full Latent Disentanglement ($\sim_{FL}$): Every latent component is distributionally equivalent to a transformation of a permuted ground-truth component, i.e., $\mathbf{z}_{\phi(i)} \overset{d}{=} h_i(\mathbf{z}^\star_i)$ for a permutation $\phi$ and measurable maps $h_i$.
  • SCM Isomorphism ($\sim_{SCM}$): In addition to full latent disentanglement, the causal graphs $\mathcal{G}^\star$ and $\mathcal{G}$ are isomorphic, with the permutation $\phi$ being a graph isomorphism.

3. Identifiability up to Abstraction:

This concept introduces a partial order $\preceq$ on causal models, comparing their granularity. $\theta^\star$ is identifiable up to abstraction $\theta$ if any $\theta'$ producing the same observable distribution as $\theta^\star$ is "more complex" than or equivalent to $\theta$ (i.e., $\theta \preceq \theta'$).

  • Latent Abstraction ($\preceq_L$): A latent component $\mathbf{w}$ in the abstracted model is distributionally equivalent to a sum of transformations of several latent components $(\mathbf{z}'_1, \ldots, \mathbf{z}'_k)$ of the more complex model: $\mathbf{w} \overset{d}{=} \sum_i h_i(\mathbf{z}'_i)$.
  • Full Latent Abstraction ($\preceq_{FL}$): Each latent component $\mathbf{z}_j$ of the abstracted model is a sum of transformations of a group of latent components $\{\mathbf{z}'_i\}_{i \in \phi^{-1}(j)}$ of the more complex model, where $\phi$ is a surjection.
  • SCM Homomorphism/Abstraction ($\preceq_{SCM}$): There exist a graph epimorphism (surjective homomorphism) $\phi: \mathcal{G}' \to \mathcal{G}$ and measurable functions $h_i$ such that full latent abstraction holds compatibly with the graph mapping; $\theta_{\text{SCM}}$ is then an abstraction of $\theta'_{\text{SCM}}$.

Identifiability Results

The main results rely on three assumptions:

  1. Faithfulness of the causal graph: The graph $\mathcal{G}$ perfectly represents all conditional independencies in $p(\mathbf{z})$.
  2. Absolute continuity of latent distributions: Exogenous variables and latent variables are in isomorphic vector spaces, causal mechanisms are continuously differentiable, and noise distributions are absolutely continuous.
  3. Smoothness of mixing function: $g$ is a diffeomorphism.

Theorem 1: Identifiable Model Abstraction

Any latent causal model with true parameters $\theta^\star$ is identifiable up to an SCM abstraction $\theta$ whose causal graph $\mathcal{G}$ is the quotient graph of the true graph $\mathcal{G}^\star$ with respect to the partition generated by the $\sigma$-algebra of the non-descendant sets of the intervention targets. Specifically, $\mathcal{G} = \mathcal{G}^\star / \mathcal{P}(\sigma(\mathbf{nd}(\mathcal{I}^\star)))$, where $\mathcal{I}^\star$ is the set of true intervention targets, $\mathbf{nd}(\mathcal{I}^\star)$ is the family of non-descendant sets of those targets, $\sigma(\cdot)$ denotes the generated $\sigma$-algebra, and $\mathcal{P}(\cdot)$ the partition it induces. Importantly, this quotient graph $\mathcal{G}$ is guaranteed to be acyclic.

  • Example: If $\mathcal{G}^\star$ has nodes $\{1, 2, 3, 4, 5\}$ and interventions $\mathcal{I}^\star = \{\{3\}, \{3,4\}, \{4,5\}\}$, then $\mathbf{nd}(\mathcal{I}^\star) = \{\mathrm{nd}(\{3\}), \mathrm{nd}(\{3,4\}), \mathrm{nd}(\{4,5\})\} = \{\{1,2,5\}, \{1,2\}, \{1,2\}\}$ (assuming the parent relationships implied by the paper's example figure). The unique non-descendant sets are $\{\{1,2,5\}, \{1,2\}\}$. The partition $\mathcal{P}(\sigma(\mathbf{nd}(\mathcal{I}^\star)))$ groups nodes with the same membership pattern across these sets: nodes 1 and 2 lie in both, node 5 only in the first, and nodes 3 and 4 in neither, giving $\{\{1,2\}, \{3,4\}, \{5\}\}$. The identifiable abstracted graph has these blocks as nodes.
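
The membership-pattern construction in this example takes only a few lines to sketch; the function name `partition_from_sets` is illustrative, not from the paper:

```python
def partition_from_sets(nodes, generating_sets):
    """Partition generated by the sigma-algebra of the given sets:
    nodes are grouped by their membership pattern across the sets."""
    signature = lambda v: tuple(v in S for S in generating_sets)
    blocks = {}
    for v in nodes:
        blocks.setdefault(signature(v), set()).add(v)
    return sorted(map(sorted, blocks.values()))

nodes = {1, 2, 3, 4, 5}
nd_sets = [{1, 2, 5}, {1, 2}]  # unique non-descendant sets from the example
print(partition_from_sets(nodes, nd_sets))
# → [[1, 2], [3, 4], [5]]
```

The abstracted graph of Theorem 1 then has these three blocks as its nodes.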

Theorem 2: Additional Identifiable Latents

For any non-descendant set $N \in \mathbf{nd}(\mathcal{I}^\star)$, let $\pi(N)$ be the intersection of all intervention targets $S \in \mathcal{I}^\star$ for which $\mathrm{nd}(S) = N$. If $\pi(N)$ is a singleton $\{i\}$ (i.e., node $i$ is the unique common element of all intervention targets that leave $N$ as non-descendants) and the latent space $\mathcal{Z}^\star_i$ is isomorphic to $\mathbb{R}$, then the latent variable $\mathbf{z}^\star_i$ can be identified up to disentanglement ($\sim_L$). This means a specific latent variable can be recovered even when its exact position in the abstracted causal graph is not resolved by Theorem 1.

  • Example: With $\mathcal{I}^\star = \{\{3\}, \{3,4\}, \{4,5\}\}$ and $\mathbf{nd}(\mathcal{I}^\star) = \{\{1,2,5\}, \{1,2\}\}$:
    • For $N = \{1,2,5\}$, the only intervention target $S$ with $\mathrm{nd}(S) = N$ is $\{3\}$, so $\pi(\{1,2,5\}) = \{3\}$. If $\mathcal{Z}^\star_3 \cong \mathbb{R}$, then $\mathbf{z}^\star_3$ is identifiable up to disentanglement.
    • For $N = \{1,2\}$, the intervention targets $S$ with $\mathrm{nd}(S) = N$ are $\{3,4\}$ and $\{4,5\}$, so $\pi(\{1,2\}) = \{3,4\} \cap \{4,5\} = \{4\}$. If $\mathcal{Z}^\star_4 \cong \mathbb{R}$, then $\mathbf{z}^\star_4$ is identifiable up to disentanglement. (The paper's example states $\pi(\{1,2\}) = \{4,5\}$, which appears inconsistent with the definition; by the definition it should be the intersection, $\{4\}$.)
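
The computation of $\pi(N)$ in this example is a plain set intersection; the sketch below (with the illustrative names `pi` and `nd_map`) reproduces both cases:

```python
def pi(N, targets, nd_map):
    """Intersection of all intervention targets S whose non-descendant set is N."""
    matching = [set(S) for S in targets if nd_map[S] == N]
    return set.intersection(*matching)

# Targets and their non-descendant sets, taken from the running example.
targets = [frozenset({3}), frozenset({3, 4}), frozenset({4, 5})]
nd_map = {frozenset({3}): {1, 2, 5},
          frozenset({3, 4}): {1, 2},
          frozenset({4, 5}): {1, 2}}

print(pi({1, 2, 5}, targets, nd_map))  # {3}
print(pi({1, 2}, targets, nd_map))     # {4}
```

Both results are singletons, so (granted real-valued latent spaces) both $\mathbf{z}^\star_3$ and $\mathbf{z}^\star_4$ are identifiable up to disentanglement.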

Proof Outlines

The proofs rely on:

  1. Finite mixtures: The joint latent distribution $p(\mathbf{z}, \tilde{\mathbf{z}})$ is a finite mixture whose components correspond to intervention targets $S \in \mathcal{I}$. These components can be separated according to equivalence classes defined by their non-descendant sets $\mathrm{nd}(S)$.
  2. Invariance of non-descendant variables: For an intervention on $S$, the latent variables in $\mathrm{nd}(S)$ are invariant ($\mathbf{z}_{\mathrm{nd}(S)} = \tilde{\mathbf{z}}_{\mathrm{nd}(S)}$). This allows distinguishing blocks of variables; the paper extends this by showing that every block in $\mathcal{P}(\sigma(\mathbf{nd}(\mathcal{I}^\star)))$ can be disentangled.
  3. Independence of intervention targets: Post-intervention latents $\tilde{\mathbf{z}}_S$ are statistically independent of the pre-intervention latents $\mathbf{z}$, given $\bm{\iota} = S$. This is used to disentangle $\mathbf{z}^\star_{\pi(N)}$ by showing that $\tilde{\mathbf{z}}_{\pi(N)}$ is independent of $\mathbf{z}$ conditioned on $\mathrm{nd}(\bm{\iota}) = N$.
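
Steps 1 and 2 can be illustrated empirically: sampling contrastive pairs from a toy chain SCM (made up here, not the paper's experiment) and recording which coordinates stay invariant recovers one invariance pattern per equivalence class of non-descendant sets:

```python
import numpy as np

rng = np.random.default_rng(1)

def contrastive_latents():
    """One contrastive pair from a hypothetical chain SCM z0 -> z1 -> z2,
    with a random perfect intervention on either {z1} or {z2}."""
    S = [{1}, {2}][rng.integers(2)]
    eps = rng.normal(size=3)
    z = np.array([eps[0], 0.0, 0.0])
    z[1] = 0.8 * z[0] + eps[1]
    z[2] = 0.5 * z[1] + eps[2]
    zt = z.copy()
    for i in sorted(S):
        zt[i] = rng.normal()              # severed mechanism, fresh noise
        if i == 1:                        # downstream node propagates change
            zt[2] = 0.5 * zt[1] + eps[2]
    return z, zt

# Separate the mixture by the pattern of invariant coordinates; each pattern
# corresponds to nd(S) for one class of intervention targets.
patterns = set()
for _ in range(200):
    z, zt = contrastive_latents()
    patterns.add(tuple(bool(b) for b in np.isclose(z, zt)))
print(sorted(patterns))
```

With targets $\{z_1\}$ and $\{z_2\}$, the patterns `(True, False, False)` and `(True, True, False)` emerge, matching the non-descendant sets $\{z_0\}$ and $\{z_0, z_1\}$.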

Discussion and Implications

  • A key insight is that identifying latent variables corresponding to a subset of nodes $S \subset V(\mathcal{G}^\star)$ up to abstraction does not require a direct intervention on $S$; it suffices that $S \in \sigma(\mathbf{nd}(\mathcal{I}^\star))$.
  • This allows learning causal structure up to the abstraction $\mathcal{G}$ even when the full graph $\mathcal{G}^\star$ is unrecoverable due to a limited set of interventions. Causal mechanisms between the aggregated blocks of variables in $\mathcal{G}$ can be recovered.
  • Theorem 2 allows disentangling even more latents than implied by the SCM abstraction of Theorem 1, albeit without fully clarifying their graphical relationships.

Downstream Applications and Limitations

Applications:

  • Causal effect estimation with respect to high-level (abstracted) variables.
  • Contrastive Learning: Data augmentations can be seen as interventions.
  • Temporal Data: Sequential observations can be viewed as counterfactual data from interventions (e.g., system dynamics).
  • Causal Interpretability: E.g., interchange intervention training for neural networks, where synthetic counterfactuals are generated.
  • Improved generalization, interpretability, and fairness in ML models.

Limitations:

  • The work provides theoretical identifiability results, demonstrating what can be learned if an estimator successfully maximizes the likelihood of the observed distribution.
  • It does not propose a scalable learning algorithm for achieving this identification in practice. The paper includes a small toy experiment with linear Gaussian models as a proof of concept.

Conclusion

The paper introduces a framework for understanding the identifiability of latent causal models up to a specific level of abstraction when interventions are not necessarily atomic or exhaustive. It shows that even with relaxed assumptions on interventions, a quotient graph structure and additional latent blocks can be identified, determined by the family of intervention targets. This provides a more realistic theoretical foundation for CRL in many practical settings.
