- The paper introduces a framework to identify latent causal models up to abstraction using interventions on arbitrary latent subsets.
- It relaxes conventional assumptions by leveraging invariance of non-descendant variables and quotient graph structures to achieve model identification.
- The results offer practical insights for improving causal effect estimation, contrastive learning, and model interpretability in real-world scenarios.
This paper, "On the Identifiability of Causal Abstractions" (2503.10834), addresses the problem of Causal Representation Learning (CRL), specifically focusing on identifying latent causal models when interventions are performed on arbitrary subsets of latent variables, rather than requiring individual interventions on all variables. The core contribution is a theoretical framework that determines the degree to which a causal model can be identified up to an abstraction, given a set of possible interventions. This work relaxes restrictive assumptions of prior research, making the setting more applicable to real-world scenarios where individual interventions are often infeasible.
The authors consider a setting where they have access to contrastive data pairs (x,x~) from an observable space, representing the system before and after a random, unknown intervention on a latent Structural Causal Model (SCM).
1. Data Generating Process:
The process involves:
- A pre-intervention SCM: each latent variable z_i is generated by a function f_i of its parents Pa_G(i) and exogenous noise ε_i, i.e., z_i = f_i(z_{Pa_G(i)}, ε_i). The full pre-intervention latent vector is z = f(ε).
- Interventions: A random variable ι (taking values in the power set of G's vertices) determines which subset S ⊆ V(G) of latent variables is intervened upon. Interventions are "perfect," meaning the causal mechanisms of nodes in S are severed and replaced by new mechanisms f~_i^(S)(ε~_i). Nodes outside S keep their original mechanisms but may receive changed parent values.
- Mixing Function: An invertible mixing function g maps latent variables to observables: x = g(z) and x~ = g(z~).
The goal is to identify the parameters θ = (θ_SCM, θ_intv, g) from the observed data distribution p_θ(x, x~).
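As a minimal sketch of this data-generating process, consider a hypothetical three-node linear Gaussian chain SCM with an illustrative linear mixing matrix (all weights, targets, and the mixing map below are made-up choices, not the paper's experimental setup). The key detail is that the pair is counterfactual: both worlds share the exogenous noise ε, and only the intervened nodes receive fresh noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear Gaussian chain SCM z1 -> z2 -> z3; all weights and
# the mixing matrix below are illustrative choices, not from the paper.
W = np.array([[0.0,  0.0, 0.0],
              [0.8,  0.0, 0.0],
              [0.0, -0.5, 0.0]])   # row i holds the parent weights of z_i

def propagate(eps, S, eps_new):
    """Ancestral sampling in topological order. Nodes in S are 'perfectly'
    intervened on: parent edges are severed and fresh noise is substituted."""
    z = np.zeros(3)
    for i in range(3):
        z[i] = eps_new[i] if i in S else W[i] @ z + eps[i]
    return z

A = rng.standard_normal((3, 3)) + 3.0 * np.eye(3)  # invertible mixing g

def contrastive_pair(targets=({0}, {0, 1}, {1, 2})):
    """Sample (x, x~): the same exogenous noise eps drives both worlds;
    only the intervened nodes in S receive new noise eps_new."""
    S = targets[rng.integers(len(targets))]   # random, unknown target iota
    eps, eps_new = rng.standard_normal(3), rng.standard_normal(3)
    z = propagate(eps, frozenset(), eps_new)  # pre-intervention latents
    z_tilde = propagate(eps, S, eps_new)      # post-intervention latents
    return A @ z, A @ z_tilde

x, x_tilde = contrastive_pair()
```

Because noise is shared outside S, the non-descendants of S coincide across the pair (z_{nd(S)} = z~_{nd(S)}), which is exactly the invariance the identifiability arguments later exploit.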
2. Identifiability up to Equivalence:
The paper defines identifiability such that if two parameter sets θ and θ⋆ produce the same observable distribution, then θ∼θ⋆ for some equivalence relation. Key equivalence relations include:
- Latent Disentanglement (∼L): For a specific latent subspace W⋆ ⊂ Z⋆, the corresponding learned latent w is distributionally equivalent to a transformation of w⋆, i.e., w =_d h(w⋆), where =_d denotes equality in distribution.
- Full Latent Disentanglement (∼FL): Every latent component is distributionally equivalent to a transformation h_i of a permuted ground-truth component, i.e., z_{ϕ(i)} =_d h_i(z_i⋆) for some permutation ϕ.
- SCM Isomorphism (∼SCM): In addition to full latent disentanglement, the causal graphs G⋆ and G are isomorphic, with the permutation ϕ being a graph isomorphism.
3. Identifiability up to Abstraction:
This concept introduces a partial order ⪯ on causal models, comparing their granularity. θ⋆ is identifiable up to abstraction θ if any θ′ producing the same observable distribution as θ⋆ is "more complex" than or equivalent to θ (i.e., θ⪯θ′).
- Latent Abstraction (⪯L): A latent component w in the abstracted model is distributionally equivalent to a sum of transformations of several latent components (z_1′, ..., z_k′) of the more complex model: w =_d Σ_i h_i(z_i′).
- Full Latent Abstraction (⪯FL): Each latent component z_j of the abstracted model is a sum of transformations of a group of latent components {z_i′}_{i ∈ ϕ⁻¹(j)} from the more complex model, where ϕ is a surjection of the component indices.
- SCM Homomorphism/Abstraction (⪯SCM): There exists a graph epimorphism (surjective homomorphism) ϕ: G′ → G and measurable functions h_i such that full latent abstraction holds compatibly with the graph mapping; θ_SCM is then an abstraction of θ_SCM′.
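To make the graph-epimorphism condition concrete, here is a small sketch that checks whether a node map ϕ is a surjective graph homomorphism, where edges inside one block are allowed to collapse onto that block. The five-node chain G′ and three-block quotient G below are made-up illustrations, not graphs from the paper.

```python
def is_graph_epimorphism(edges_src, nodes_dst, edges_dst, phi):
    """True iff phi maps the source nodes onto nodes_dst and every source
    edge (u, v) either collapses inside one block (phi[u] == phi[v]) or
    maps to an edge of the destination graph."""
    if set(phi.values()) != set(nodes_dst):
        return False  # not surjective
    return all(phi[u] == phi[v] or (phi[u], phi[v]) in edges_dst
               for u, v in edges_src)

# Illustrative example: chain 1 -> 2 -> 3 -> 4 -> 5 collapsed to a -> b -> c.
edges_src = {(1, 2), (2, 3), (3, 4), (4, 5)}
phi = {1: "a", 2: "a", 3: "b", 4: "b", 5: "c"}
print(is_graph_epimorphism(edges_src, {"a", "b", "c"},
                           {("a", "b"), ("b", "c")}, phi))  # -> True
```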
Identifiability Results
The main results rely on three assumptions:
- Faithfulness of the causal graph: every conditional independence in p(z) corresponds to a d-separation in G, so the graph exactly captures the independence structure of the latent distribution.
- Absolute continuity of latent distributions: Exogenous variables and latent variables are in isomorphic vector spaces, causal mechanisms are continuously differentiable, and noise distributions are absolutely continuous.
- Smoothness of mixing function: g is a diffeomorphism.
Theorem 1: Identifiable Model Abstraction
Any latent causal model with true parameters θ⋆ is identifiable up to an SCM abstraction θ whose causal graph G is the quotient graph of the true graph G⋆ with respect to a partition generated by the σ-algebra of the non-descendant sets of intervention targets.
Specifically, G=G⋆/P(σ(nd(I⋆))), where I⋆ is the set of true intervention targets, nd(I⋆) is the family of non-descendant sets for these targets, σ(⋅) is the generated σ-algebra, and P(⋅) is the partition generated by that σ-algebra. Importantly, this quotient graph G is guaranteed to be acyclic.
- Example: If G⋆ has nodes {1,2,3,4,5} and interventions I⋆={{3},{3,4},{4,5}}, then nd(I⋆)={nd({3}),nd({3,4}),nd({4,5})}={{1,2,5},{1,2},{1,2}} (assuming appropriate parent relationships not fully depicted but implied by the example figure). The unique non-descendant sets are {{1,2,5},{1,2}}. The partition P(σ(nd(I⋆))) would then group nodes that are indistinguishable based on these non-descendant sets. For instance, if this partition is {{1,2},{3,4},{5}}, then the identifiable abstracted graph has these blocks as nodes.
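The partition in this example can be computed mechanically: two nodes land in the same block of P(σ(nd(I⋆))) exactly when they belong to the same subset of the generating non-descendant sets, i.e., when the σ-algebra cannot separate them. A small sketch using the example's unique non-descendant sets:

```python
def partition_from_sets(V, generating_sets):
    """Partition of V generated by the sigma-algebra of the given sets:
    two nodes share a block iff they belong to exactly the same
    generating sets (the sigma-algebra cannot separate them)."""
    blocks = {}
    for v in V:
        signature = tuple(v in S for S in generating_sets)
        blocks.setdefault(signature, set()).add(v)
    return sorted(blocks.values(), key=min)

V = {1, 2, 3, 4, 5}
nd_sets = [{1, 2, 5}, {1, 2}]  # unique non-descendant sets from the example
print(partition_from_sets(V, nd_sets))  # -> [{1, 2}, {3, 4}, {5}]
```

Nodes 1 and 2 lie in both generating sets, node 5 only in {1,2,5}, and nodes 3 and 4 in neither, which reproduces the blocks {{1,2}, {3,4}, {5}} stated above.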
Theorem 2: Additional Identifiable Latents
For any non-descendant set N ∈ nd(I⋆), let π(N) be the intersection of all intervention targets S ∈ I⋆ with nd(S) = N. If π(N) is a singleton {i} (i.e., node i is the unique common element of all intervention targets that leave N as non-descendants) and the latent space Z_i⋆ is isomorphic to R, then the latent variable z_i⋆ (= z_{π(N)}⋆) can be identified up to disentanglement (∼L). A specific latent variable can thus be recovered even when its exact position in the abstracted causal graph is not resolved by Theorem 1.
- Example: With I⋆={{3},{3,4},{4,5}} and nd(I⋆)={{1,2,5},{1,2}}:
- For N={1,2,5}, the only intervention target S with nd(S)=N is {3}, so π({1,2,5})={3}. If Z_3⋆ ≅ R, then z_3⋆ is identifiable up to disentanglement.
- For N={1,2}, the intervention targets S with nd(S)=N are {3,4} and {4,5}, so π({1,2})={3,4}∩{4,5}={4}. If Z_4⋆ ≅ R, then z_4⋆ is identifiable up to disentanglement. (The paper's example states π({1,2})={4,5}, which appears inconsistent with the definition; taking the intersection gives {4}.)
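The computation of π(N) is a direct set intersection. In the sketch below, the nd mapping is hard-coded from the running example (the underlying graph is only implied in the paper, so these values are taken as given):

```python
from functools import reduce

# Non-descendant set of each intervention target, as given in the
# paper's running example (the underlying graph is only implied there).
nd = {
    frozenset({3}): frozenset({1, 2, 5}),
    frozenset({3, 4}): frozenset({1, 2}),
    frozenset({4, 5}): frozenset({1, 2}),
}

def pi(N):
    """pi(N): intersection of all intervention targets S with nd(S) == N."""
    targets = [S for S, nd_S in nd.items() if nd_S == frozenset(N)]
    return set(reduce(frozenset.intersection, targets))

print(pi({1, 2, 5}))  # -> {3}
print(pi({1, 2}))     # -> {4}, i.e. {3,4} ∩ {4,5}
```

Both results are singletons, so by Theorem 2 the corresponding latents z_3⋆ and z_4⋆ are identifiable up to disentanglement (given real-valued latent spaces).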
Proof Outlines
The proofs rely on:
- Finite mixtures: The observed distribution p(z,z~) is a finite mixture, where each component corresponds to an intervention target S∈I. These components can be separated based on equivalence classes defined by their non-descendant sets nd(S).
- Invariance of non-descendant variables: For an intervention on S, the latent variables in nd(S) are invariant (z_{nd(S)} = z~_{nd(S)}). This makes blocks of variables distinguishable. The paper extends this by showing that every block in P(σ(nd(I⋆))) can be disentangled.
- Independence of intervention targets: Post-intervention latents z~_S are statistically independent of the pre-intervention latents z given ι = S. This is used to disentangle z_{π(N)}⋆ by showing that z~_{π(N)} is independent of z conditioned on nd(ι) = N.
Discussion and Implications
- A key insight is that identifying latent variables corresponding to a subset of nodes S⊂V(G⋆) up to abstraction does not require direct intervention on S. It's sufficient if S∈σ(nd(I⋆)).
- This allows learning causal structures up to an abstraction G′ even when the full graph G⋆ is unrecoverable due to a limited set of interventions. Causal mechanisms between aggregated subsets of variables in G′ can be recovered.
- Theorem 2 allows disentangling even more latents than implied by the SCM abstraction of Theorem 1, albeit without fully clarifying their graphical relationships.
Downstream Applications and Limitations
Applications:
- Causal effect estimation with respect to high-level (abstracted) variables.
- Contrastive Learning: Data augmentations can be seen as interventions.
- Temporal Data: Sequential observations can be viewed as counterfactual data from interventions (e.g., system dynamics).
- Causal Interpretability: E.g., interchange intervention training for neural networks, where synthetic counterfactuals are generated.
- Improved generalization, interpretability, and fairness in ML models.
Limitations:
- The work provides theoretical identifiability results, demonstrating what can be learned if an estimator successfully maximizes the likelihood of the observed distribution.
- It does not propose a scalable learning algorithm for achieving this identification in practice. The paper includes a small toy experiment with linear Gaussian models as a proof of concept.
Conclusion
The paper introduces a framework for understanding the identifiability of latent causal models up to a specific level of abstraction when interventions are not necessarily atomic or exhaustive. It shows that even with relaxed assumptions on interventions, a quotient graph structure and additional latent blocks can be identified, determined by the family of intervention targets. This provides a more realistic theoretical foundation for CRL in many practical settings.