Causal inference of post-transcriptional regulation timelines from long-read sequencing in Arabidopsis thaliana
Abstract: We propose a novel framework for reconstructing the chronology of genetic regulation using causal inference based on Pearl's theory. The approach proceeds in three main stages: causal discovery, causal inference, and chronology construction. We apply it to the ndhB and ndhD genes of the chloroplast in Arabidopsis thaliana, generating four alternative maturation timeline models per gene, each derived from a different causal discovery algorithm (HC, PC, LiNGAM, or NOTEARS). Two methodological challenges are addressed: the presence of missing data, handled via an EM algorithm that jointly imputes missing values and estimates the Bayesian network, and the selection of the $\ell_1$-regularization parameter in NOTEARS, for which we introduce a stability selection strategy. The resulting causal models consistently outperform reference chronologies in terms of both reliability and model fit. Moreover, by combining causal reasoning with domain expertise, the framework enables the formulation of testable hypotheses and the design of targeted experimental interventions grounded in theoretical predictions.
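For orientation, the role of the $\ell_1$-regularization parameter can be sketched as follows. The first line is the standard NOTEARS program for a linear model with $n$ samples and $d$ variables. The second line is a generic stability-selection rule and is only an assumption about how such a strategy might look, not the procedure used in the paper. The symbols $B$ (number of subsamples), $\Lambda$ (grid of candidate penalties), and $\pi_{\mathrm{thr}}$ (frequency threshold) are hypothetical and introduced here for illustration.

$$
\begin{aligned}
\hat{W}(\lambda) &= \arg\min_{W \in \mathbb{R}^{d \times d}} \; \frac{1}{2n}\,\lVert X - XW \rVert_F^2 + \lambda \lVert W \rVert_1
\quad \text{subject to} \quad h(W) = \operatorname{tr}\!\left(e^{W \circ W}\right) - d = 0, \\
\hat{\pi}_{ij}(\lambda) &= \frac{1}{B} \sum_{b=1}^{B} \mathbf{1}\!\left[\hat{W}^{(b)}_{ij}(\lambda) \neq 0\right],
\qquad
\hat{E} = \Bigl\{ (i,j) : \max_{\lambda \in \Lambda} \hat{\pi}_{ij}(\lambda) \geq \pi_{\mathrm{thr}} \Bigr\}.
\end{aligned}
$$

Here $W \circ W$ is the elementwise square of the weighted adjacency matrix, $h(W) = 0$ enforces acyclicity, and $\hat{W}^{(b)}(\lambda)$ denotes the estimate obtained on the $b$-th random subsample. Under this sketch, an edge is retained only if it is selected in a large fraction of subsamples, which avoids committing to a single value of $\lambda$.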