
On the Identifiability of Causal Abstractions

Published 13 Mar 2025 in stat.ML and cs.LG | (2503.10834v1)

Abstract: Causal representation learning (CRL) enhances machine learning models' robustness and generalizability by learning structural causal models associated with data-generating processes. We focus on a family of CRL methods that uses contrastive data pairs in the observable space, generated before and after a random, unknown intervention, to identify the latent causal model. (Brehmer et al., 2022) showed that this is indeed possible, given that all latent variables can be intervened on individually. However, this is a highly restrictive assumption in many systems. In this work, we instead assume interventions on arbitrary subsets of latent variables, which is more realistic. We introduce a theoretical framework that calculates the degree to which we can identify a causal model, given a set of possible interventions, up to an abstraction that describes the system at a higher level of granularity.

Summary

  • The paper introduces a framework to identify latent causal models up to abstraction using interventions on arbitrary latent subsets.
  • It relaxes conventional assumptions by leveraging invariance of non-descendant variables and quotient graph structures to achieve model identification.
  • The results offer practical insights for improving causal effect estimation, contrastive learning, and model interpretability in real-world scenarios.

This paper, "On the Identifiability of Causal Abstractions" (2503.10834), addresses the problem of Causal Representation Learning (CRL), specifically focusing on identifying latent causal models when interventions are performed on arbitrary subsets of latent variables, rather than requiring individual interventions on all variables. The core contribution is a theoretical framework that determines the degree to which a causal model can be identified up to an abstraction, given a set of possible interventions. This work relaxes restrictive assumptions of prior research, making the setting more applicable to real-world scenarios where individual interventions are often infeasible.

The authors consider a setting where they have access to contrastive data pairs $(\mathbf{x}, \tilde{\mathbf{x}})$ from an observable space, representing the system before and after a random, unknown intervention on a latent Structural Causal Model (SCM).

Problem Formulation

1. Data Generating Process:

The process involves:

  • A pre-intervention SCM: Latent variables $\mathbf{z}_i$ are generated by functions $f_i$ of their parents $\mathrm{Pa}_{\mathcal{G}}(i)$ and exogenous noise $\bm{\varepsilon}_i$, i.e., $\mathbf{z}_i = f_i(\mathbf{z}_{\mathrm{Pa}_{\mathcal{G}}(i)}, \bm{\varepsilon}_i)$. The overall pre-intervention latent is $\mathbf{z} = \mathbf{f}(\bm{\varepsilon})$.
  • Interventions: A random variable $\bm{\iota}$, taking values in the power set of $\mathcal{G}$'s vertices, determines which subset of latent variables $S \subseteq V(\mathcal{G})$ is intervened upon. Interventions are "perfect": the causal mechanisms of nodes in $S$ are severed and replaced by new mechanisms $\tilde{f}^{(S)}_i(\tilde{\bm{\varepsilon}}_i)$, while nodes outside $S$ keep their original mechanisms but may see changed parent values.
  • Mixing function: An invertible mixing function $g$ maps latents to observables: $\mathbf{x} = g(\mathbf{z})$ and $\tilde{\mathbf{x}} = g(\tilde{\mathbf{z}})$. The goal is to identify the parameters $\theta = (\theta_{\text{SCM}}, \theta_{\text{intv}}, g)$ from the observed data distribution $p_\theta(\mathbf{x}, \tilde{\mathbf{x}})$.
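
As a concrete, hypothetical instance of this data-generating process, the sketch below samples contrastive latent pairs from a made-up three-node linear-Gaussian chain and applies a fixed invertible linear mixing. The mechanisms, coefficients, and the function name `sample_latents` are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latents(S):
    """Sample a pre-/post-intervention latent pair (z, z~) for target set S.

    Hypothetical chain SCM z0 -> z1 -> z2 with linear Gaussian mechanisms.
    """
    eps = rng.normal(size=3)
    z = np.empty(3)
    z[0] = eps[0]
    z[1] = 0.8 * z[0] + eps[1]
    z[2] = 0.5 * z[1] + eps[2]

    # Perfect intervention: mechanisms of nodes in S are severed and replaced
    # by fresh noise; nodes outside S keep their mechanisms but propagate the
    # updated parent values.
    zt = np.empty(3)
    zt[0] = rng.normal() if 0 in S else eps[0]
    zt[1] = rng.normal() if 1 in S else 0.8 * zt[0] + eps[1]
    zt[2] = rng.normal() if 2 in S else 0.5 * zt[1] + eps[2]
    return z, zt

# Invertible mixing g (an arbitrary fixed linear map here): the learner only
# ever observes (x, x~), never the latents themselves.
A = np.array([[1.0, 0.2, 0.0], [0.0, 1.0, 0.3], [0.1, 0.0, 1.0]])
z, zt = sample_latents({2})
x, xt = A @ z, A @ zt
```

Note that intervening on node 2 leaves its non-descendants (nodes 0 and 1) exactly invariant across the pair, which is the structural fact the identifiability arguments later exploit.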

2. Identifiability up to Equivalence:

The paper defines identifiability such that if two parameter sets $\theta$ and $\theta^\star$ produce the same observable distribution, then $\theta \sim \theta^\star$ for some equivalence relation. Key equivalence relations include:

  • Latent Disentanglement ($\sim_L$): For a specific latent subspace $\mathcal{W}^\star \subset \mathcal{Z}^\star$, the corresponding learned latent $\mathbf{w}$ is distributionally equivalent to a transformation of $\mathbf{w}^\star$, i.e., $\mathbf{w} \overset{d}{=} h(\mathbf{w}^\star)$.
  • Full Latent Disentanglement ($\sim_{FL}$): Every latent component is distributionally equivalent to a transformation of a permuted ground-truth component, i.e., $\mathbf{z}_{\phi(i)} \overset{d}{=} h_i(\mathbf{z}^\star_i)$ for a permutation $\phi$ and measurable maps $h_i$.
  • SCM Isomorphism ($\sim_{SCM}$): In addition to full latent disentanglement, the causal graphs $\mathcal{G}^\star$ and $\mathcal{G}$ are isomorphic, with the permutation $\phi$ being a graph isomorphism.

3. Identifiability up to Abstraction:

This concept introduces a partial order $\preceq$ on causal models, comparing their granularity. $\theta^\star$ is identifiable up to abstraction $\theta$ if any $\theta'$ producing the same observable distribution as $\theta^\star$ is "more complex" than or equivalent to $\theta$ (i.e., $\theta \preceq \theta'$).

  • Latent Abstraction ($\preceq_L$): A latent component $\mathbf{w}$ in the abstracted model is distributionally equivalent to a sum of transformations of several latent components $(\mathbf{z}'_1, \ldots, \mathbf{z}'_k)$ of the more complex model: $\mathbf{w} \overset{d}{=} \sum_i h_i(\mathbf{z}'_i)$.
  • Full Latent Abstraction ($\preceq_{FL}$): Each latent component $\mathbf{z}_j$ of the abstracted model is a sum of transformations of a group of latent components $\{\mathbf{z}'_i\}_{i \in \phi^{-1}(j)}$ of the more complex model, where $\phi$ is a surjection.
  • SCM Homomorphism/Abstraction ($\preceq_{SCM}$): There exist a graph epimorphism (surjective homomorphism) $\phi: \mathcal{G}' \to \mathcal{G}$ and measurable functions $h_i$ such that full latent abstraction holds compatibly with the graph mapping; $\theta_{\text{SCM}}$ is then an abstraction of $\theta'_{\text{SCM}}$.

Identifiability Results

The main results rely on three assumptions:

  1. Faithfulness of the causal graph: The graph $\mathcal{G}$ perfectly represents all conditional independencies in $p(\mathbf{z})$.
  2. Absolute continuity of latent distributions: Exogenous variables and latent variables are in isomorphic vector spaces, causal mechanisms are continuously differentiable, and noise distributions are absolutely continuous.
  3. Smoothness of mixing function: $g$ is a diffeomorphism.

Theorem 1: Identifiable Model Abstraction

Any latent causal model with true parameters $\theta^\star$ is identifiable up to an SCM abstraction $\theta$ whose causal graph $\mathcal{G}$ is the quotient graph of the true graph $\mathcal{G}^\star$ with respect to the partition generated by the $\sigma$-algebra of the non-descendant sets of the intervention targets. Specifically, $\mathcal{G} = \mathcal{G}^\star / \mathcal{P}(\sigma(\mathbf{nd}(\mathcal{I}^\star)))$, where $\mathcal{I}^\star$ is the set of true intervention targets, $\mathbf{nd}(\mathcal{I}^\star)$ is the family of non-descendant sets of those targets, $\sigma(\cdot)$ denotes the generated $\sigma$-algebra, and $\mathcal{P}(\cdot)$ the partition it induces. Importantly, this quotient graph $\mathcal{G}$ is guaranteed to be acyclic.

  • Example: If $\mathcal{G}^\star$ has nodes $\{1, 2, 3, 4, 5\}$ and interventions $\mathcal{I}^\star = \{\{3\}, \{3,4\}, \{4,5\}\}$, then $\mathbf{nd}(\mathcal{I}^\star) = \{\mathrm{nd}(\{3\}), \mathrm{nd}(\{3,4\}), \mathrm{nd}(\{4,5\})\} = \{\{1,2,5\}, \{1,2\}, \{1,2\}\}$ (assuming the parent relationships implied by the paper's example figure). The unique non-descendant sets are $\{\{1,2,5\}, \{1,2\}\}$. The partition $\mathcal{P}(\sigma(\mathbf{nd}(\mathcal{I}^\star)))$ groups nodes with the same membership pattern across these sets: nodes 1 and 2 lie in both, node 5 only in the first, and nodes 3 and 4 in neither, giving $\{\{1,2\}, \{3,4\}, \{5\}\}$. The identifiable abstracted graph has these blocks as nodes.
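
The membership-pattern construction in this example takes only a few lines to sketch; the function name `partition_from_sets` is illustrative, not from the paper:

```python
def partition_from_sets(nodes, generating_sets):
    """Partition generated by the sigma-algebra of the given sets:
    nodes are grouped by their membership pattern across the sets."""
    signature = lambda v: tuple(v in S for S in generating_sets)
    blocks = {}
    for v in nodes:
        blocks.setdefault(signature(v), set()).add(v)
    return sorted(map(sorted, blocks.values()))

nodes = {1, 2, 3, 4, 5}
nd_sets = [{1, 2, 5}, {1, 2}]  # unique non-descendant sets from the example
print(partition_from_sets(nodes, nd_sets))
# → [[1, 2], [3, 4], [5]]
```

The abstracted graph of Theorem 1 then has these three blocks as its nodes.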

Theorem 2: Additional Identifiable Latents

For any non-descendant set $N \in \mathbf{nd}(\mathcal{I}^\star)$, let $\pi(N)$ be the intersection of all intervention targets $S \in \mathcal{I}^\star$ for which $\mathrm{nd}(S) = N$. If $\pi(N)$ is a singleton $\{i\}$ (i.e., node $i$ is the unique common element of all intervention targets that leave $N$ as non-descendants) and the latent space $\mathcal{Z}^\star_i$ is isomorphic to $\mathbb{R}$, then the latent variable $\mathbf{z}^\star_i$ can be identified up to disentanglement ($\sim_L$). This means a specific latent variable can be recovered even when its exact position in the abstracted causal graph is not resolved by Theorem 1.

  • Example: With $\mathcal{I}^\star = \{\{3\}, \{3,4\}, \{4,5\}\}$ and $\mathbf{nd}(\mathcal{I}^\star) = \{\{1,2,5\}, \{1,2\}\}$:
    • For $N = \{1,2,5\}$, the only intervention target $S$ with $\mathrm{nd}(S) = N$ is $\{3\}$, so $\pi(\{1,2,5\}) = \{3\}$. If $\mathcal{Z}^\star_3 \cong \mathbb{R}$, then $\mathbf{z}^\star_3$ is identifiable up to disentanglement.
    • For $N = \{1,2\}$, the intervention targets $S$ with $\mathrm{nd}(S) = N$ are $\{3,4\}$ and $\{4,5\}$, so $\pi(\{1,2\}) = \{3,4\} \cap \{4,5\} = \{4\}$. If $\mathcal{Z}^\star_4 \cong \mathbb{R}$, then $\mathbf{z}^\star_4$ is identifiable up to disentanglement. (The paper's example states $\pi(\{1,2\}) = \{4,5\}$, which appears inconsistent with the definition; by the definition it should be the intersection, $\{4\}$.)
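
The computation of $\pi(N)$ in this example is a plain set intersection; the sketch below (with the illustrative names `pi` and `nd_map`) reproduces both cases:

```python
def pi(N, targets, nd_map):
    """Intersection of all intervention targets S whose non-descendant set is N."""
    matching = [set(S) for S in targets if nd_map[S] == N]
    return set.intersection(*matching)

# Targets and their non-descendant sets, taken from the running example.
targets = [frozenset({3}), frozenset({3, 4}), frozenset({4, 5})]
nd_map = {frozenset({3}): {1, 2, 5},
          frozenset({3, 4}): {1, 2},
          frozenset({4, 5}): {1, 2}}

print(pi({1, 2, 5}, targets, nd_map))  # {3}
print(pi({1, 2}, targets, nd_map))     # {4}
```

Both results are singletons, so (granted real-valued latent spaces) both $\mathbf{z}^\star_3$ and $\mathbf{z}^\star_4$ are identifiable up to disentanglement.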

Proof Outlines

The proofs rely on:

  1. Finite mixtures: The joint latent distribution $p(\mathbf{z}, \tilde{\mathbf{z}})$ is a finite mixture whose components correspond to intervention targets $S \in \mathcal{I}$. These components can be separated according to equivalence classes defined by their non-descendant sets $\mathrm{nd}(S)$.
  2. Invariance of non-descendant variables: For an intervention on $S$, the latent variables in $\mathrm{nd}(S)$ are invariant ($\mathbf{z}_{\mathrm{nd}(S)} = \tilde{\mathbf{z}}_{\mathrm{nd}(S)}$). This allows distinguishing blocks of variables; the paper extends this by showing that every block in $\mathcal{P}(\sigma(\mathbf{nd}(\mathcal{I}^\star)))$ can be disentangled.
  3. Independence of intervention targets: Post-intervention latents $\tilde{\mathbf{z}}_S$ are statistically independent of the pre-intervention latents $\mathbf{z}$, given $\bm{\iota} = S$. This is used to disentangle $\mathbf{z}^\star_{\pi(N)}$ by showing that $\tilde{\mathbf{z}}_{\pi(N)}$ is independent of $\mathbf{z}$ conditioned on $\mathrm{nd}(\bm{\iota}) = N$.
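
Steps 1 and 2 can be illustrated empirically: sampling contrastive pairs from a toy chain SCM (made up here, not the paper's experiment) and recording which coordinates stay invariant recovers one invariance pattern per equivalence class of non-descendant sets:

```python
import numpy as np

rng = np.random.default_rng(1)

def contrastive_latents():
    """One contrastive pair from a hypothetical chain SCM z0 -> z1 -> z2,
    with a random perfect intervention on either {z1} or {z2}."""
    S = [{1}, {2}][rng.integers(2)]
    eps = rng.normal(size=3)
    z = np.array([eps[0], 0.0, 0.0])
    z[1] = 0.8 * z[0] + eps[1]
    z[2] = 0.5 * z[1] + eps[2]
    zt = z.copy()
    for i in sorted(S):
        zt[i] = rng.normal()              # severed mechanism, fresh noise
        if i == 1:                        # downstream node propagates change
            zt[2] = 0.5 * zt[1] + eps[2]
    return z, zt

# Separate the mixture by the pattern of invariant coordinates; each pattern
# corresponds to nd(S) for one class of intervention targets.
patterns = set()
for _ in range(200):
    z, zt = contrastive_latents()
    patterns.add(tuple(bool(b) for b in np.isclose(z, zt)))
print(sorted(patterns))
```

With targets $\{z_1\}$ and $\{z_2\}$, the patterns `(True, False, False)` and `(True, True, False)` emerge, matching the non-descendant sets $\{z_0\}$ and $\{z_0, z_1\}$.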

Discussion and Implications

  • A key insight is that identifying latent variables corresponding to a subset of nodes $S \subset V(\mathcal{G}^\star)$ up to abstraction does not require a direct intervention on $S$; it suffices that $S \in \sigma(\mathbf{nd}(\mathcal{I}^\star))$.
  • This allows learning causal structure up to the abstraction $\mathcal{G}$ even when the full graph $\mathcal{G}^\star$ is unrecoverable due to a limited set of interventions. Causal mechanisms between the aggregated blocks of variables in $\mathcal{G}$ can be recovered.
  • Theorem 2 allows disentangling even more latents than implied by the SCM abstraction of Theorem 1, albeit without fully clarifying their graphical relationships.

Downstream Applications and Limitations

Applications:

  • Causal effect estimation with respect to high-level (abstracted) variables.
  • Contrastive Learning: Data augmentations can be seen as interventions.
  • Temporal Data: Sequential observations can be viewed as counterfactual data from interventions (e.g., system dynamics).
  • Causal Interpretability: E.g., interchange intervention training for neural networks, where synthetic counterfactuals are generated.
  • Improved generalization, interpretability, and fairness in ML models.

Limitations:

  • The work provides theoretical identifiability results, demonstrating what can be learned if an estimator successfully maximizes the likelihood of the observed distribution.
  • It does not propose a scalable learning algorithm for achieving this identification in practice. The paper includes a small toy experiment with linear Gaussian models as a proof of concept.

Conclusion

The paper introduces a framework for understanding the identifiability of latent causal models up to a specific level of abstraction when interventions are not necessarily atomic or exhaustive. It shows that even with relaxed assumptions on interventions, a quotient graph structure and additional latent blocks can be identified, determined by the family of intervention targets. This provides a more realistic theoretical foundation for CRL in many practical settings.
