
Typhon: Automatic Recommendation of Relevant Code Cells in Jupyter Notebooks

Published 15 May 2024 in cs.SE | (2405.09075v1)

Abstract: Code recommendation tools have become increasingly important to software developers across many areas of expertise. They improve productivity when developing software and make it easier for developers to find code examples and learn from them. This paper proposes Typhon, an approach that automatically recommends relevant code cells in Jupyter notebooks. Typhon tokenizes a developer's markdown description cell and searches a database for the most similar code cells using text similarity techniques such as the BM25 ranking function or CodeBERT, a machine-learning approach. The algorithm then computes the similarity distance between the tokenized query and the markdown cells and returns the most relevant code cells to the developer. We evaluated Typhon on Jupyter notebooks from Kaggle competitions and found that it can recommend code cells with moderate accuracy. The approach and results in this paper can lead to further improvements in code cell recommendation for Jupyter notebooks.
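The retrieval step described in the abstract can be illustrated with a small sketch. This is not Typhon's implementation: it is a minimal, standard-library version of Okapi BM25 ranking over markdown descriptions, under the assumption that each stored code cell is paired with the markdown cell that describes it. The names `tokenize`, `bm25_scores`, `recommend`, and `CELL_PAIRS`, the sample cells, and the parameter values `k1=1.5` and `b=0.75` are all hypothetical choices made for illustration. The CodeBERT variant is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score every tokenized document against the query with Okapi BM25."""
    n_docs = len(docs_tokens)
    # Average document length; fall back to 1 to avoid dividing by zero.
    avg_len = (sum(len(d) for d in docs_tokens) / n_docs) or 1
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for doc in docs_tokens:
        df.update(set(doc))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def recommend(query, cell_pairs, top_k=1):
    """Return the code cells whose markdown descriptions best match the query."""
    docs = [tokenize(markdown) for markdown, _ in cell_pairs]
    scores = bm25_scores(tokenize(query), docs)
    ranked = sorted(zip(scores, cell_pairs), key=lambda item: item[0], reverse=True)
    # Drop cells that share no term with the query.
    return [code for score, (_, code) in ranked[:top_k] if score > 0]


# Hypothetical database of (markdown description, code cell) pairs.
CELL_PAIRS = [
    ("Load the training data", "df = pd.read_csv('train.csv')"),
    ("Plot a histogram of passenger age", "df['Age'].hist(bins=20)"),
    ("Fill missing values with the median", "df = df.fillna(df.median(numeric_only=True))"),
]

if __name__ == "__main__":
    print(recommend("plot histogram of age", CELL_PAIRS))
```

In this sketch the query plays the role of the developer's markdown cell, and the returned strings are the recommended code cells. A real system would likely add stop-word removal and a much larger corpus, both of which are outside what the abstract specifies.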

