UNADON: Transformer-based model to predict genome-wide chromosome spatial position
Abstract: The spatial positioning of chromosomes relative to functional nuclear bodies is intertwined with genome functions such as transcription. However, the sequence patterns and epigenomic features that collectively influence chromatin spatial positioning genome-wide are not well understood. Here, we develop a new transformer-based deep learning model, UNADON, which predicts the genome-wide cytological distance to a specific type of nuclear body, as measured by TSA-seq, using both sequence features and epigenomic signals. Evaluations of UNADON in four cell lines (K562, H1, HFFc6, HCT116) show high accuracy in predicting chromatin spatial positioning relative to nuclear bodies when trained on a single cell line. UNADON also performs well in an unseen cell type. Importantly, we reveal potential sequence and epigenomic factors that affect large-scale chromatin compartmentalization relative to nuclear bodies. Together, UNADON provides new insights into the relationship between sequence features and large-scale chromatin spatial localization, which has important implications for understanding nuclear structure and function.
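To make the idea of a transformer over genomic bins concrete, the following is a minimal sketch, not UNADON's actual architecture. It shows single-head scaled dot-product self-attention, the core operation of a transformer, applied to a short sequence of bins, followed by a linear readout that yields one scalar per bin (standing in for a TSA-seq-like signal). Everything here is an assumption made for illustration: the function names, the toy feature values, the identity query/key/value projections, and the `toy_readout` weights. Learned projections, multiple heads, positional encoding, and feed-forward layers are deliberately omitted.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(bins):
    """Single-head scaled dot-product self-attention with identity projections.

    `bins` is a list of per-bin feature vectors (hypothetical features).
    Returns (outputs, weights); weights[i][j] is how much bin i attends to bin j.
    """
    dim = len(bins[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in bins:
        # Similarity of this bin to every bin in the window, itself included.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in bins]
        row = softmax(scores)
        weights.append(row)
        # Each output is a weighted average of all bins' feature vectors.
        outputs.append(
            [sum(w * value[d] for w, value in zip(row, bins)) for d in range(dim)]
        )
    return outputs, weights

def predict_signal(bins, readout):
    """Hypothetical linear readout: one scalar per bin from attended features."""
    attended, _ = self_attention(bins)
    return [sum(w * x for w, x in zip(readout, vec)) for vec in attended]

# Toy input: 4 genomic bins with 3 made-up features each (not real data).
toy_bins = [
    [0.2, 1.0, 0.0],
    [0.1, 0.8, 0.1],
    [0.9, 0.0, 0.7],
    [0.8, 0.1, 0.9],
]
toy_readout = [0.5, -1.0, 0.25]  # illustrative weights, not learned
predictions = predict_signal(toy_bins, toy_readout)
```

Because each bin's output mixes information from every other bin in the window, the prediction for one locus can depend on its genomic context, which is the property that makes attention a natural fit for large-scale positioning signals.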