
Which Syntactic Capabilities Are Statistically Learned by Masked Language Models for Code?

Published 3 Jan 2024 in cs.SE (arXiv:2401.01512v2)

Abstract: This paper discusses the limitations of evaluating Masked Language Models (MLMs) on code completion tasks. We highlight that relying on accuracy-based measurements can overestimate a model's capabilities because such measurements ignore the syntax rules of programming languages. To address this, we introduce SyntaxEval, a technique that uses syntactic capabilities to enhance the evaluation of MLMs. SyntaxEval automates the masking of elements in the model input based on their Abstract Syntax Trees (ASTs). We conducted a case study on two popular MLMs using data from GitHub repositories. Our results show negative causal effects between node types and the MLMs' accuracy, and we conclude that the MLMs under study fail to predict some syntactic capabilities.
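The core idea of AST-guided masking can be sketched with Python's built-in `ast` module: parse a snippet, locate a node of a chosen syntactic type, and replace its source span with a mask token before feeding the input to an MLM. This is a minimal illustration of the general technique, not the paper's SyntaxEval implementation; the function name `mask_node` and the `<mask>` token are assumptions made here for the example.

```python
import ast


def mask_node(source: str, node_type: type) -> tuple[str, str]:
    """Replace the first AST node of the given type with a <mask> token.

    Returns (masked_source, original_span). A simplified sketch of
    AST-guided masking; real evaluations would iterate over many node
    types and typically mask single sub-tokens.
    """
    tree = ast.parse(source)
    lines = source.splitlines(keepends=True)
    for node in ast.walk(tree):
        # Module (and a few other nodes) carry no position info; skip them.
        if isinstance(node, node_type) and hasattr(node, "col_offset"):
            # Convert (line, column) positions to absolute string offsets.
            start = sum(len(l) for l in lines[: node.lineno - 1]) + node.col_offset
            end = sum(len(l) for l in lines[: node.end_lineno - 1]) + node.end_col_offset
            return source[:start] + "<mask>" + source[end:], source[start:end]
    raise ValueError("no node of the requested type found")


code = "def add(a, b):\n    return a + b\n"
masked, original = mask_node(code, ast.Return)
# masked   -> "def add(a, b):\n    <mask>\n"
# original -> "return a + b"
```

An evaluation loop would then ask the MLM to fill the masked span and compare its prediction against `original`, grouping accuracy by AST node type rather than reporting a single aggregate number.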

Citations (3)