Megadiff: A Dataset of 600k Java Source Code Changes Categorized by Diff Size
Published 10 Aug 2021 in cs.SE (arXiv:2108.04631v1)
Abstract: This paper presents Megadiff, a dataset of source code diffs. It focuses on Java, with strict inclusion criteria based on commit message and diff size. Megadiff contains 663 029 Java diffs that can be used for research on commit comprehension, fault localization, automated program repair, and machine learning on code changes.