Which Syntactic Capabilities Are Statistically Learned by Masked Language Models for Code?
Abstract: This paper discusses the limitations of accuracy-based evaluation of Masked Language Models (MLMs) on code completion tasks. We highlight that relying solely on accuracy-based measurements may overestimate models' capabilities by neglecting the syntax rules of programming languages. To address this issue, we introduce SyntaxEval, a technique that uses syntactic capabilities to enhance the evaluation of MLMs. SyntaxEval automates the masking of elements in the model input based on their Abstract Syntax Trees (ASTs). We conducted a case study on two popular MLMs using data from GitHub repositories. Our results show negative causal effects between AST node types and the MLMs' accuracy. We conclude that the MLMs under study fail to capture some syntactic capabilities.
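A minimal sketch of the AST-driven masking idea the abstract describes, assuming Python's built-in ast module in place of a full parsing toolkit and microsoft/codebert-base-mlm as an example MLM; the node selection, helper names, and accuracy tally are illustrative, not the authors' implementation.

```python
"""Sketch: mask AST nodes of a chosen type and ask a masked LM to fill them."""
import ast
from transformers import pipeline


def mask_nodes(code: str, node_type: type) -> list[tuple[str, str]]:
    """Replace each single-token AST node of `node_type` with <mask>.

    Returns (masked_source, ground_truth_token) pairs, one per masked node.
    """
    samples = []
    lines = code.splitlines(keepends=True)
    for node in ast.walk(ast.parse(code)):
        if not isinstance(node, node_type):
            continue
        segment = ast.get_source_segment(code, node)
        if segment is None or len(segment.split()) != 1:
            continue  # keep the sketch to single-token masks
        # Absolute character offset of the node within the source.
        start = sum(len(l) for l in lines[: node.lineno - 1]) + node.col_offset
        masked = code[:start] + "<mask>" + code[start + len(segment):]
        samples.append((masked, segment))
    return samples


if __name__ == "__main__":
    code = "def add(a, b):\n    return a + b\n"
    fill = pipeline("fill-mask", model="microsoft/codebert-base-mlm")
    hits, total = 0, 0
    for masked_source, truth in mask_nodes(code, ast.Name):
        prediction = fill(masked_source)[0]["token_str"].strip()
        hits += prediction == truth
        total += 1
    print(f"accuracy on ast.Name nodes: {hits}/{total}")
```

Grouping the masked samples by node type, as above, is what allows accuracy to be broken down per syntactic category rather than reported as a single aggregate number.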