Large Language Models for Unit Test Generation: Achievements, Challenges, and Opportunities
Abstract: Automated unit test generation is critical for software quality, but traditional structure-driven methods often lack the semantic understanding required to produce realistic inputs and oracles. LLMs address this limitation by drawing on knowledge of code semantics and programming patterns learned from large code corpora. To analyze the state of the art in this domain, we conducted a systematic literature review of 115 publications that appeared between May 2021 and August 2025. We propose a taxonomy based on the unit test generation lifecycle that divides the process into a generative phase, which creates test artifacts, and a quality assurance phase, which refines them. Our analysis reveals that prompt engineering has emerged as the dominant utilization approach, accounting for 89% of the studies, owing to its flexibility. We find that iterative validation and repair loops have become the standard mechanism for making generated tests usable, as they significantly improve compilation and execution pass rates. However, critical challenges remain, notably the weak fault detection capabilities of generated tests and the lack of standardized benchmarks. We conclude with a roadmap for future research that emphasizes the progression toward autonomous testing agents and hybrid systems that combine LLMs with traditional software engineering tools.
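To make the "iterative validation and repair loop" concrete, the following is a minimal Python sketch of the general idea, not the method of any surveyed study. The names `validate`, `generate_with_repair`, `fake_generate`, and `fake_repair` are hypothetical, and the two `fake_*` functions are stand-ins for LLM calls. Validation is assumed here to mean "the test compiles and executes without raising"; real pipelines typically run a full build and test framework instead.

```python
from typing import Callable, Dict, Optional, Tuple


def validate(test_source: str, env: Dict[str, object]) -> Optional[str]:
    """Return None if the test compiles and runs cleanly, else an error message."""
    try:
        code = compile(test_source, "<generated_test>", "exec")
    except SyntaxError as exc:
        return f"compile error: {exc.msg}"
    try:
        # Run in a copy of the environment so one candidate cannot pollute the next.
        exec(code, dict(env))
    except Exception as exc:
        return f"runtime error: {type(exc).__name__}: {exc}"
    return None


def generate_with_repair(
    generate: Callable[[], str],
    repair: Callable[[str, str], str],
    env: Dict[str, object],
    max_rounds: int = 3,
) -> Tuple[Optional[str], int]:
    """Generate a test, then feed validation errors back for up to max_rounds repairs.

    Returns (accepted_test_or_None, number_of_repair_rounds_used).
    """
    candidate = generate()
    for attempt in range(max_rounds + 1):
        error = validate(candidate, env)
        if error is None:
            return candidate, attempt
        if attempt < max_rounds:
            candidate = repair(candidate, error)
    return None, max_rounds


# --- Usage with stubbed "model" calls (assumptions, for illustration only) ---
def add(a, b):
    return a + b


def fake_generate() -> str:
    # First draft has an unbalanced parenthesis, so it fails to compile.
    return "assert add(1, 2 == 3"


def fake_repair(source: str, error: str) -> str:
    # A real system would send `source` and `error` back to the model.
    return "assert add(1, 2) == 3"


result, rounds = generate_with_repair(fake_generate, fake_repair, {"add": add})
```

Note that a loop of this kind only improves compilation and execution pass rates. A test that passes may still assert the wrong behavior, which is one reason fault detection remains a separate concern.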