Papers
Topics
Authors
Recent
Search
2000 character limit reached

GraphBPE: Molecular Graphs Meet Byte-Pair Encoding

Published 26 Jul 2024 in cs.LG, cs.AI, physics.chem-ph, and q-bio.BM | (2407.19039v1)

Abstract: With the increasing attention to molecular machine learning, various innovations have been made in designing better models or proposing more comprehensive benchmarks. However, less is studied on the data preprocessing schedule for molecular graphs, where a different view of the molecular graph could potentially boost the model's performance. Inspired by the Byte-Pair Encoding (BPE) algorithm, a subword tokenization method popularly adopted in Natural Language Processing, we propose GraphBPE, which tokenizes a molecular graph into different substructures and acts as a preprocessing schedule independent of the model architectures. Our experiments on 3 graph-level classification and 3 graph-level regression datasets show that data preprocessing could boost the performance of models for molecular graphs, and GraphBPE is effective for small classification datasets and it performs on par with other tokenization methods across different model architectures.

Citations (1)

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (2)

Collections

Sign up for free to add this paper to one or more collections.