When Neural Code Completion Models Size up the Situation: Attaining Cheaper and Faster Completion through Dynamic Model Inference
Abstract: Leveraging recent advances in LLMs, modern neural code completion models can generate highly accurate code suggestions. However, their massive size incurs substantial computational cost and environmental impact, hindering widespread adoption in practice. Dynamic inference is a promising remedy: it allocates only as much computation as each inference step needs while preserving the model's performance. In this work, we explore dynamic inference in the context of code completion. We first conduct an empirical study on GPT-2, examining how well its intermediate layers can already predict tokens for code completion. We find that 54.4% of tokens can be generated correctly using just the first layer, indicating substantial potential for computational savings. Moreover, even when all layers are used, the model still mispredicts 14.5% of tokens, and the completions continued from these errors are rarely considered helpful, with an acceptance rate of only 4.2%. These findings motivate our exploration of dynamic inference for code completion and inspire us to augment it with a decision-making mechanism that stops the generation of incorrect code. We therefore propose a novel dynamic inference method tailored to code completion models, which not only produces correct predictions with greatly reduced computation but also proactively prevents incorrect predictions. Extensive evaluation shows that our method skips an average of 1.7 of the models' 16 layers, yielding an 11.2% speedup with only a marginal 1.1% reduction in ROUGE-L.
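The two ideas described above, exiting at an intermediate layer once the prediction is confident enough, and stopping the completion entirely when even the full model remains unconfident, can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual algorithm; the function names, the toy layer/head callables, and the threshold values are all hypothetical.

```python
# Hedged sketch of confidence-based dynamic inference for token prediction.
# Assumptions (not from the paper): each layer's hidden state is fed to a
# shared prediction head returning a token-probability dict; we exit at the
# first layer whose top-token probability clears `exit_threshold`, and we
# abort the completion if even the full stack stays below `stop_threshold`.

def early_exit_predict(hidden, layers, head,
                       exit_threshold=0.9, stop_threshold=0.2):
    """Return (token, layers_used); token is None when generation is stopped."""
    probs = None
    for used, layer in enumerate(layers, start=1):
        hidden = layer(hidden)
        probs = head(hidden)                     # token -> probability
        if max(probs.values()) >= exit_threshold:
            # Confident enough: skip the remaining layers.
            return max(probs, key=probs.get), used
    # All layers consumed: emit the token only if the model is not
    # too uncertain; otherwise stop the (likely incorrect) completion.
    if max(probs.values()) < stop_threshold:
        return None, len(layers)
    return max(probs, key=probs.get), len(layers)


# Toy usage: layers gradually raise confidence in the token "print".
layers = [lambda h: h + 0.3] * 4
head = lambda h: {"print": min(h, 1.0), "other": max(1.0 - h, 0.0)}
token, used = early_exit_predict(0.4, layers, head)
# Here the prediction becomes confident after 2 of the 4 layers.
```

In a real model the confidence signal would come from the softmax over the vocabulary at each layer's internal classifier, and the thresholds would be tuned to trade speedup against the ROUGE-L drop reported in the abstract.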