Chaining thoughts and LLMs to learn DNA structural biophysics

Published 2 Mar 2024 in q-bio.QM, cs.AI, and cs.LG | (2403.01332v1)

Abstract: The development of an AI scientist, a tool capable of integrating a variety of experimental data and generating testable hypotheses, holds immense potential. So far, bespoke machine learning models have been built to specialize in single scientific tasks, but they lack the flexibility of a general-purpose model. Here, we show that a general-purpose LLM, ChatGPT 3.5-turbo, can be fine-tuned to learn the structural biophysics of DNA. We find that both fine-tuning models to return chain-of-thought responses and chaining together models fine-tuned for subtasks enhance the ability to analyze and design DNA sequences and their structures.
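The idea of chaining models fine-tuned for subtasks can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the subtasks, so reverse complementation and GC fraction are hypothetical choices, and the deterministic stub functions stand in for fine-tuned LLM calls rather than reproducing the paper's pipeline.

```python
from typing import Callable, List

# A "model" here is any callable mapping input text to output text.
# In a real pipeline each one would wrap a call to a fine-tuned LLM.
Model = Callable[[str], str]

_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement_stub(sequence: str) -> str:
    """Hypothetical subtask 1: stand-in for a model that returns the reverse complement."""
    cleaned = sequence.strip().upper()
    return "".join(_COMPLEMENT[base] for base in reversed(cleaned))


def gc_fraction_stub(sequence: str) -> str:
    """Hypothetical subtask 2: stand-in for a model that reports GC fraction as text."""
    cleaned = sequence.strip().upper()
    gc_count = cleaned.count("G") + cleaned.count("C")
    return f"{gc_count / len(cleaned):.2f}"


def chain(models: List[Model]) -> Model:
    """Compose models so that each one receives the previous model's output."""
    def run(prompt: str) -> str:
        text = prompt
        for model in models:
            text = model(text)
        return text
    return run


pipeline = chain([reverse_complement_stub, gc_fraction_stub])
result = pipeline("ATGC")
print(result)
```

Keeping each stage as a plain text-to-text callable means a stub can later be swapped for a real model call without changing `chain`.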

Authors (2)