Ecosystem-wide influences on pull request decisions: insights from NPM
Abstract: The pull-based development model facilitates global collaboration within open-source software projects. However, although software increasingly depends on other projects in its ecosystem, most research on the pull request decision-making process has explored factors within individual projects rather than the broader software ecosystem they belong to. We uncover ecosystem-wide factors that influence pull request acceptance decisions. We collected a dataset of approximately 1.8 million pull requests and 2.1 million issues from 20,052 GitHub projects within the NPM ecosystem. Of these projects, 98% depend on another project in the dataset, which makes it possible to study collaboration across dependent projects. We employed social network analysis to create a collaboration network in the ecosystem, and mixed effects logistic regression and random forest techniques to measure the impact and predictive strength of the tested features. We find that gaining experience within the software ecosystem through active participation in issue-tracking systems, submitting pull requests, and collaborating with pull request integrators and experienced developers benefits all open-source contributors, especially project newcomers. These results are complemented with an exploratory qualitative analysis of 538 pull requests. We find that developers with ecosystem experience make different contributions than developers without it. Zooming in on a subset of 111 pull requests with clear ecosystem involvement, we find 3 overarching and 10 specific reasons why developers involve ecosystem projects in their pull requests. Combining ecosystem-wide factors with features studied in previous work to predict the outcome of pull requests yields an overall F1 score of 0.92. However, the outcomes of pull requests submitted by newcomers are harder to predict.
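As a point of reference for the reported F1 score, the following is a minimal sketch of how F1 is computed for a binary merged/rejected prediction. The function name `f1_score` and the toy labels are hypothetical illustrations, not code or data from the study. The abstract does not say how the "overall" score is averaged across classes, so the sketch covers only the single-class case.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels (assumption): 1 = pull request merged, 0 = rejected.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
score = f1_score(y_true, y_pred)
```

With these toy labels there are three true positives, one false positive, and one false negative, so precision and recall are both 0.75 and the F1 score is 0.75.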