Single-Sequence-Based Protein Secondary Structure Prediction using One-Hot and Chemical Encodings of Amino Acids

Published 6 Jul 2024 in q-bio.BM (arXiv:2407.05173v1)

Abstract: In protein secondary structure prediction, each amino acid in a sequence is typically treated as a distinct category and represented by a one-hot vector. In this study, we developed two novel chemical representations for amino acids using molecular fingerprints and the dimensionality reduction algorithm FastMap. We demonstrate that the two new chemical encodings provide additional information about the interactions of amino acids in sequences that an LSTM-based model cannot capture with one-hot encoding alone. Compared to the latest LSTM-based model used in the single-sequence-based method SPOT-1D-Single, our ensemble model using one-hot and chemical encodings achieves better accuracy on most test sets while requiring approximately nine times fewer trainable parameters for each encoding model. Our single-sequence-based method is valuable for its simplicity, lower resource requirements, and independence from external sequence data. It is particularly useful when quick or preliminary predictions are needed or when data on homologous sequences are scarce.
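To make the two kinds of input concrete, here is a minimal sketch, not the authors' code, of a per-residue one-hot encoding and of the FastMap embedding step that turns pairwise distances between objects into low-dimensional coordinates. Several details are assumptions because the abstract does not specify them: the alphabet ordering in AMINO_ACIDS, mapping unknown residues to an all-zero row, representing a fingerprint as a frozenset of "on" bit indices, using Tanimoto (Jaccard) distance between fingerprints, and a deterministic pivot heuristic that starts from object 0. The names one_hot, tanimoto_distance and fastmap are hypothetical.

```python
import math
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

# Assumed alphabet: the 20 standard amino acids in one-letter code (order is arbitrary).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(sequence: str) -> List[List[int]]:
    """Encode each residue as a 20-dim one-hot row; unknown residues become all zeros (assumption)."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for residue in sequence.upper():
        row = [0] * len(AMINO_ACIDS)
        if residue in index:
            row[index[residue]] = 1
        rows.append(row)
    return rows


def tanimoto_distance(a: frozenset, b: frozenset) -> float:
    """1 - |a & b| / |a | b| for fingerprints given as sets of on-bit indices (assumed format)."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union


def fastmap(objects: Sequence[T], dist: Callable[[T, T], float], k: int) -> List[List[float]]:
    """Embed objects in k dimensions so Euclidean distances approximate `dist` (FastMap)."""
    n = len(objects)
    coords = [[0.0] * k for _ in range(n)]

    def residual_sq(i: int, j: int, upto: int) -> float:
        # Squared distance left over after removing the first `upto` assigned coordinates.
        d2 = dist(objects[i], objects[j]) ** 2
        for c in range(upto):
            d2 -= (coords[i][c] - coords[j][c]) ** 2
        return max(d2, 0.0)  # clamp small negative values caused by non-Euclidean input

    for axis in range(k):
        # Pivot heuristic: from object 0 go to its farthest object a, then to a's farthest b.
        a = max(range(n), key=lambda j: residual_sq(0, j, axis))
        b = max(range(n), key=lambda j: residual_sq(a, j, axis))
        d_ab2 = residual_sq(a, b, axis)
        if d_ab2 == 0.0:
            break  # nothing left to explain; remaining coordinates stay 0
        d_ab = math.sqrt(d_ab2)
        for i in range(n):
            # Law-of-cosines projection of object i onto the line through pivots a and b.
            coords[i][axis] = (
                residual_sq(a, i, axis) + d_ab2 - residual_sq(b, i, axis)
            ) / (2 * d_ab)
    return coords
```

As a sanity check, running fastmap on three points in the plane with math.dist and k=2 reproduces their pairwise Euclidean distances up to floating-point rounding. In a setting like the paper's, the objects would be the 20 amino acids, the distances would come from their molecular fingerprints (for example computed with a cheminformatics toolkit), and each residue would then be represented by its k FastMap coordinates; the fingerprint type, distance measure and k used by the authors are not stated in the abstract.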
