Latent Causal Diffusions for Single-Cell Perturbation Modeling

Published 20 Jan 2026 in q-bio.MN | (2601.15341v1)

Abstract: Perturbation screens hold the potential to systematically map regulatory processes at single-cell resolution, yet modeling and predicting transcriptome-wide responses to perturbations remains a major computational challenge. Existing methods often underperform simple baselines, fail to disentangle measurement noise from biological signal, and provide limited insight into the causal structure governing cellular responses. Here, we present the latent causal diffusion (LCD), a generative model that frames single-cell gene expression as a stationary diffusion process observed under measurement noise. LCD outperforms established approaches in predicting the distributional shifts of unseen perturbation combinations in single-cell RNA-sequencing screens while simultaneously learning a mechanistic dynamical system of gene regulation. To interpret these learned dynamics, we develop an approach we call causal linearization via perturbation responses (CLIPR), which yields an approximation of the direct causal effects between all genes modeled by the diffusion. CLIPR provably identifies causal effects under a linear drift assumption and recovers causal structure in both simulated systems and a genome-wide perturbation screen, where it clusters genes into coherent functional modules and resolves causal relationships that standard differential expression analysis cannot. The LCD-CLIPR framework bridges generative modeling with causal inference to predict unseen perturbation effects and map the underlying regulatory mechanisms of the transcriptome.

Summary

  • The paper presents the latent causal diffusion (LCD), a generative model that treats single-cell gene expression as a stationary diffusion process observed under measurement noise and predicts transcriptome-wide responses to perturbations.
  • It introduces causal linearization via perturbation responses (CLIPR), a method that approximates the direct causal effects between genes from the learned dynamics.
  • Benchmarking shows that LCD outperforms heuristic and deep learning models in predicting the effects of unseen perturbation combinations, including nonlinear genetic interactions.

Analysis of "Latent Causal Diffusions for Single-Cell Perturbation Modeling"

Introduction

The paper "Latent Causal Diffusions for Single-Cell Perturbation Modeling" (2601.15341) introduces an approach for modeling and predicting transcriptome-wide responses to genetic perturbations at the single-cell level. The authors address two challenges, disentangling biological signal from measurement noise and uncovering the causal structure behind cellular responses, by proposing the latent causal diffusion (LCD) model. This generative model frames single-cell gene expression as a stationary diffusion process observed under measurement noise, and it is used both to predict the effects of unseen perturbations and to map the underlying regulatory mechanisms.

LCD Model Framework

The LCD model treats gene expression as evolving under a stationary stochastic differential equation (SDE), which captures the inherent stochasticity of single-cell states, and treats the sequencing readout as a noisy measurement of that latent state. Because gene regulation is modeled through this diffusion process at equilibrium, LCD can learn functional causal mechanisms from perturbation data. This setup contrasts with previous models, which often conflate biological variability with measurement noise or offer little interpretability regarding causal structure.
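To make the idea of a stationary diffusion observed under measurement noise concrete, here is a minimal toy sketch. It is not the LCD model: the two-gene linear drift `A x + b`, the noise levels, the Euler-Maruyama discretization, and the choice to represent a knockdown as a smaller drift offset for gene 0 are all assumptions made for illustration. For a linear drift the stationary mean solves `A m + b = 0`, which gives `[1.0, 1.0]` for the control and `[0.2, 0.6]` for the knockdown below.

```python
import math
import random

def simulate_observed(A, b, sigma=0.3, obs_sd=0.2, dt=0.01,
                      n_steps=20000, burn_in=1000, seed=0):
    """Euler-Maruyama simulation of dx = (A x + b) dt + sigma dW.

    Returns noisy observations of the latent state after a burn-in,
    so the samples approximate draws from the stationary distribution.
    """
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    observations = []
    for step in range(burn_in + n_steps):
        drift = [sum(A[i][j] * x[j] for j in range(n)) + b[i] for i in range(n)]
        x = [x[i] + drift[i] * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
             for i in range(n)]
        if step >= burn_in:
            # Measurement noise is added on top of the latent state.
            observations.append([xi + rng.gauss(0.0, obs_sd) for xi in x])
    return observations

def column_means(rows):
    """Per-gene mean over all observations."""
    n = len(rows[0])
    return [sum(r[i] for r in rows) / len(rows) for i in range(n)]

# Hypothetical two-gene system: gene 0 activates gene 1, both decay.
A = [[-1.0, 0.0],
     [0.5, -1.0]]
b_control = [1.0, 0.5]
b_knockdown = [0.2, 0.5]  # assumed perturbation: reduced offset for gene 0

control_mean = column_means(simulate_observed(A, b_control))
knockdown_mean = column_means(simulate_observed(A, b_knockdown, seed=1))
print("control mean:  ", control_mean)
print("knockdown mean:", knockdown_mean)
```

Time points along one trajectory stand in here for cells sampled from the stationary distribution. The knockdown of gene 0 also lowers the mean of gene 1, because the shift propagates through the drift.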

LCD’s key innovation lies in predicting distributional changes due to perturbations with a dynamical system defined at the gene level. Causal linearization via perturbation responses (CLIPR) then interprets these learned dynamics by approximating the direct causal effects between all genes modeled by the diffusion. CLIPR provably identifies causal effects when the drift is linear, and it recovers causal structure in both simulated systems and a genome-wide perturbation screen.

Predictive Performance and Validation

LCD was benchmarked against existing methods and performed better, particularly in predicting the effects of unseen perturbation combinations. Its architecture captures nonlinear interactions between genes, and it outperforms heuristic and deep learning models such as SALT, PEPER, CPA, and GEARS in both mean and distributional accuracy across diverse genetic interaction types.

The model handles a range of genetic interaction types, from additive to unexpected neomorphic interactions, which highlights its flexibility in modeling perturbation responses. By representing perturbations as shifts in hidden states, LCD maps how these shifts propagate through genetic networks to induce transcriptional changes, a capability rarely found in existing models.
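The distinction between additive and non-additive interactions can be illustrated with a small worked calculation. The sketch below uses a common additive reference for a double perturbation (control plus the two single-perturbation shifts) and reports what the observed double perturbation adds beyond it. The function names and the expression values are hypothetical toy inputs, and this additive reference is not claimed to be the baseline or the interaction definition used in the paper.

```python
def additive_expectation(control, single_a, single_b):
    """Expected mean expression under A+B if the two shifts simply add."""
    return [c + (a - c) + (b - c)
            for c, a, b in zip(control, single_a, single_b)]

def interaction_residual(observed_double, control, single_a, single_b):
    """Per-gene difference between the observed double and the additive reference."""
    expected = additive_expectation(control, single_a, single_b)
    return [o - e for o, e in zip(observed_double, expected)]

# Toy mean expression for three genes (made-up numbers).
control = [1.0, 2.0, 0.5]
single_a = [1.5, 2.0, 0.5]         # perturbation A raises gene 0
single_b = [1.0, 1.0, 0.5]         # perturbation B lowers gene 1
observed_double = [1.5, 1.0, 2.0]  # gene 2 responds only to A+B together

expected = additive_expectation(control, single_a, single_b)
residual = interaction_residual(observed_double, control, single_a, single_b)
print("additive expectation:", expected)
print("interaction residual:", residual)
```

Genes 0 and 1 match the additive reference, so their residual is zero. Gene 2 changes only when both perturbations are applied, which is the kind of response an additive model cannot produce.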

Causal Inference with CLIPR

CLIPR recovers causal effects from the dynamics learned by LCD. Traditional differential expression analysis often fails to discriminate between direct and indirect genetic influences, whereas CLIPR targets direct regulatory relationships. The method identified causal structure in both synthetic and real-world settings.

On simulated linear systems, CLIPR accurately recovered underlying causal matrices, with performance improving with the number of perturbed genes. Applied to a genome-wide Perturb-seq dataset, it revealed direct dependencies often masked by conventional analysis, clustering genes into functionally coherent modules and demonstrating the practicality of disentangling causation from correlation in complex gene networks.
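Why a linear drift makes direct effects identifiable from perturbation responses can be shown with a noise-free toy calculation. This is not the CLIPR algorithm, whose details are not given here. The sketch assumes a drift `A x + b + c_k`, where `c_k` is a known constant shift applied by perturbation `k`, so the stationary mean shifts satisfy `A @ Delta = -C` with the shifts `c_k` as columns of `C`. When every gene is perturbed and `Delta` is invertible, `A = -C @ inv(Delta)`. The three-gene chain and all helper names are hypothetical.

```python
def solve(M, B):
    """Solve M X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def transpose(M):
    return [list(r) for r in zip(*M)]

def mean_shifts(A, C):
    """Stationary mean shift per perturbation (columns), from A @ Delta = -C."""
    return solve(A, [[-v for v in row] for row in C])

def recover_drift(Delta, C):
    """Recover A = -C @ inv(Delta) by solving Delta^T A^T = -C^T."""
    neg_Ct = [[-v for v in row] for row in transpose(C)]
    return transpose(solve(transpose(Delta), neg_Ct))

# Hypothetical chain: gene 0 -> gene 1 -> gene 2, with unit decay.
A_true = [[-1.0, 0.0, 0.0],
          [0.8, -1.0, 0.0],
          [0.0, 0.8, -1.0]]
# Assumed perturbations: a unit drift shift on each gene in turn.
C = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

Delta = mean_shifts(A_true, C)
A_hat = recover_drift(Delta, C)
print("response of gene 2 to perturbing gene 0:", Delta[2][0])
print("recovered direct effect of gene 0 on gene 2:", A_hat[2][0])
```

Gene 2 responds to a perturbation of gene 0 even though `A_true[2][0]` is zero, because the effect is relayed through gene 1. A response-based analysis sees that total effect, while the recovered drift matrix separates the direct edges from the indirect one.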

Implications and Future Directions

The LCD-CLIPR framework bridges generative modeling with causal inference, giving a way to study genetic regulatory networks in single-cell data. Its applications extend beyond predicting perturbation effects: it could inform experimental design and support work on cell reprogramming and combinatorial drug therapies.

Future research could explore integrating LCDs with additional modalities, such as chromatin accessibility or protein readouts, and refining the inference of state densities. The utility of stationary diffusions in single-cell perturbation studies shows promise for more accurate phenotype predictions and a deeper understanding of cellular mechanisms, fostering advancements in systems biology and therapeutic development.

Conclusion

The proposed LCD-CLIPR framework combines predictive performance that exceeds the benchmarked baselines with an interpretable account of direct causal effects between genes. It offers a single approach to both predicting perturbation responses and mapping gene regulatory networks, which makes it a useful contribution to computational biology and genomics.
