Chaining thoughts and LLMs to learn DNA structural biophysics
Abstract: The future development of an AI scientist, a tool capable of integrating a variety of experimental data and generating testable hypotheses, holds immense potential. So far, bespoke machine learning models have been built to specialize in single scientific tasks, but they lack the flexibility of a general-purpose model. Here, we show that a general-purpose LLM, ChatGPT 3.5-turbo, can be fine-tuned to learn the structural biophysics of DNA. We find that both fine-tuning models to return chain-of-thought responses and chaining together models fine-tuned for subtasks improve the models' ability to analyze and design DNA sequences and their structures.
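The abstract does not spell out which subtasks are chained. As a loosely hedged illustration of what "chaining together models fine-tuned for subtasks" can look like, the sketch below passes a shared state through a sequence of stages, with each stage's output feeding the next. Every stage here is a hypothetical deterministic stand-in (`stage_reverse_complement`, `stage_find_duplex`) for what would be a call to a fine-tuned model. The decomposition into "write the reverse complement, then locate the pairing region" is our assumption for illustration, not necessarily the paper's.

```python
from typing import Callable, Dict, List

# Shared state passed from one stage to the next (hypothetical schema).
State = Dict[str, str]
Stage = Callable[[State], State]

# Watson-Crick base pairing: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse (partner strand read 5'->3')."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous substring shared by a and b (O(len(a)*len(b)) DP)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def stage_reverse_complement(state: State) -> State:
    """Hypothetical subtask 1: stand-in for a model that writes the reverse complement."""
    return {**state, "rc_a": reverse_complement(state["strand_a"])}


def stage_find_duplex(state: State) -> State:
    """Hypothetical subtask 2: uses stage 1's output to find where strand_b pairs with strand_a."""
    # A stretch of strand_b that also occurs in rc(strand_a) is perfectly complementary
    # to strand_a, so the two strands can form an antiparallel duplex there.
    return {**state, "duplex": longest_common_substring(state["strand_b"], state["rc_a"])}


def run_chain(stages: List[Stage], state: State) -> State:
    """Feed each stage's output into the next, as in a chain of subtask models."""
    for stage in stages:
        state = stage(state)
    return state


result = run_chain(
    [stage_reverse_complement, stage_find_duplex],
    {"strand_a": "GGATCCAAT", "strand_b": "TTATTGGATC"},
)
print(result["rc_a"])    # ATTGGATCC
print(result["duplex"])  # ATTGGATC
```

Under this framing, replacing a stand-in with a model call only requires that the call accept and return the same state dictionary. The explicit intermediate (`rc_a`) is also the kind of step a chain-of-thought response would write out before giving its final answer.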