ChangeGuard: Validating Code Changes via Pairwise Learning-Guided Execution
Abstract: Code changes are an integral part of the software development process. Many code changes are meant to improve the code without changing its functional behavior, e.g., refactorings and performance improvements. Unfortunately, validating whether a code change preserves the behavior is non-trivial, particularly when the code change is performed deep inside a complex project. This paper presents ChangeGuard, an approach that uses learning-guided execution to compare the runtime behavior of a function before and after it is modified. The approach is enabled by the novel concept of pairwise learning-guided execution and by a set of techniques that improve the robustness and coverage of the state-of-the-art learning-guided execution technique. Our evaluation applies ChangeGuard to a dataset of 224 manually annotated code changes from popular Python open-source projects and to three datasets of code changes obtained by applying automated code transformations. Our results show that the approach identifies semantics-changing code changes with a precision of 77.1% and a recall of 69.5%, and that it detects unexpected behavioral changes introduced by automatic code refactoring tools. In contrast, the existing regression tests of the analyzed projects miss the vast majority of semantics-changing code changes, with a recall of only 7.6%. We envision our approach being useful for detecting unintended behavioral changes early in the development process and for improving the quality of automated code transformations.
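The core idea of comparing the old and new version of a function on the same inputs can be illustrated with a minimal Python sketch. This is not ChangeGuard's implementation: it omits learning-guided execution entirely (no predicted values are injected for missing names or arguments), and the helper names `observe` and `behavior_differs` as well as the example functions `mean_old` and `mean_new` are hypothetical, chosen only for illustration.

```python
from typing import Any, Callable, Iterable, List, Tuple


def observe(func: Callable[..., Any], args: Tuple[Any, ...]) -> Tuple[str, str]:
    """Run func on args and summarize the outcome as a comparable tuple."""
    try:
        return ("returned", repr(func(*args)))
    except Exception as exc:  # an exception also counts as observable behavior
        return ("raised", type(exc).__name__)


def behavior_differs(
    old: Callable[..., Any],
    new: Callable[..., Any],
    inputs: Iterable[Tuple[Any, ...]],
) -> List[Tuple[Any, ...]]:
    """Return the inputs on which old and new behave observably differently."""
    return [args for args in inputs if observe(old, args) != observe(new, args)]


# Hypothetical code change: a rewrite that looks like a pure refactoring.
def mean_old(values):
    total = 0
    for v in values:
        total += v
    return total / len(values)  # raises ZeroDivisionError on an empty list


def mean_new(values):
    return sum(values) / len(values) if values else 0.0  # silently returns 0.0


inputs = [([1, 2, 3],), ([],), ([4],)]
diverging = behavior_differs(mean_old, mean_new, inputs)
print(diverging)  # only the empty-list input exposes the behavioral change
```

In this sketch the inputs are written by hand. The difficulty the abstract points to is that functions deep inside a complex project cannot easily be called in isolation like this, which is the gap learning-guided execution is meant to address.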
- 2024. Our replication package. https://anonymous.4open.science/r/changeGuard-7669/README.md