The Effectiveness of Graph Contrastive Learning on Mathematical Information Retrieval

Published 21 Feb 2024 in cs.IR | (2402.13444v1)

Abstract: This paper details an empirical investigation into using Graph Contrastive Learning (GCL) to generate mathematical equation representations, a critical aspect of Mathematical Information Retrieval (MIR). Our findings reveal that this simple approach consistently exceeds the performance of the current leading formula retrieval model, TangentCFT. To support ongoing research and development in this field, we have made our source code accessible to the public at https://github.com/WangPeiSyuan/GCL-Formula-Retrieval/.
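The abstract does not say which encoder, augmentations, or contrastive objective the authors use. As a rough illustration of the general GCL recipe only, the sketch below does four things. It encodes toy formula operator trees with a fixed, untrained one-layer mean-aggregation encoder. It builds two views of each formula by randomly dropping edges. It scores those views with the NT-Xent contrastive loss, the objective used by GraphCL and SimCLR. Finally, it ranks formulas by cosine similarity to a query. The tree encoding, the helper names `encode`, `drop_edges`, `nt_xent` and `rank`, the temperature 0.5, and the 0.2 drop rate are all hypothetical choices. None of them is taken from the authors' repository. A real system would also learn the encoder weights by minimizing this loss, and this sketch omits that step.

```python
import math
import random

DIM = 8  # hypothetical embedding size for this toy example


def label_vector(label, dim=DIM):
    """Deterministic pseudo-random features for a node label (assumption: no learned weights)."""
    rng = random.Random(label)  # str seeds are deterministic across runs
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def encode(labels, edges):
    """One round of mean-neighbour message passing, then mean pooling and L2 normalisation."""
    feats = [label_vector(l) for l in labels]
    neighbours = {i: [] for i in range(len(labels))}
    for u, v in edges:  # treat operator-tree edges as undirected
        neighbours[u].append(v)
        neighbours[v].append(u)
    hidden = []
    for i, x in enumerate(feats):
        nbrs = neighbours[i]
        if nbrs:
            agg = [sum(feats[j][d] for j in nbrs) / len(nbrs) for d in range(DIM)]
        else:
            agg = [0.0] * DIM
        hidden.append([x[d] + agg[d] for d in range(DIM)])
    pooled = [sum(h[d] for h in hidden) / len(hidden) for d in range(DIM)]
    norm = math.sqrt(sum(c * c for c in pooled)) or 1.0
    return [c / norm for c in pooled]


def drop_edges(edges, p, rng):
    """Augmentation: keep each edge independently with probability 1 - p."""
    return [e for e in edges if rng.random() >= p]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def nt_xent(z1, z2, tau=0.5):
    """NT-Xent loss: z1[i] and z2[i] are unit-norm embeddings of two views of graph i."""
    z = z1 + z2
    n = len(z1)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same graph
        pos_logit = dot(z[i], z[pos]) / tau
        others = [dot(z[i], z[k]) / tau for k in range(2 * n) if k != i]
        m = max(others)  # log-sum-exp with max subtraction for stability
        log_denom = m + math.log(sum(math.exp(s - m) for s in others))
        total += log_denom - pos_logit
    return total / (2 * n)


def rank(query_emb, corpus_embs):
    """Corpus indices sorted by cosine similarity to the query (unit vectors, so cosine = dot)."""
    return sorted(range(len(corpus_embs)), key=lambda k: -dot(query_emb, corpus_embs[k]))


# Hypothetical toy operator trees: node labels plus parent-child edges.
formulas = {
    "x^2+y": (["+", "^", "x", "2", "y"], [(0, 1), (1, 2), (1, 3), (0, 4)]),
    "x^2+1": (["+", "^", "x", "2", "1"], [(0, 1), (1, 2), (1, 3), (0, 4)]),
    "a*b": (["*", "a", "b"], [(0, 1), (0, 2)]),
}
names = list(formulas)

rng = random.Random(0)
view1 = [encode(labels, drop_edges(edges, 0.2, rng)) for labels, edges in formulas.values()]
view2 = [encode(labels, drop_edges(edges, 0.2, rng)) for labels, edges in formulas.values()]
loss = nt_xent(view1, view2)
print(f"NT-Xent loss on this batch: {loss:.4f}")

corpus = [encode(*formulas[k]) for k in names]
ranking = rank(encode(*formulas["x^2+y"]), corpus)
print([names[k] for k in ranking])
```

Minimizing this loss pulls the two augmented views of the same formula together and pushes the other formulas in the batch apart. At query time, only the trained encoder and the similarity ranking are needed.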
