Revealing the value of Repository Centrality in lifespan prediction of Open Source Software Projects
Abstract: Background: Open source software (OSS) is a building block of modern software. However, the prevalence of project deprecation in the open source world weakens the integrity of downstream systems and the broader ecosystem. This calls for efforts to monitor and predict project deprecation, empowering stakeholders to take proactive measures. Challenge: Existing techniques mainly rely on static features captured at a single point in time, which limits their predictive power. Goal: We propose a novel metric derived from the user-repository network, use it to fit project deprecation predictors, and demonstrate its real-life implications. Method: We establish a comprehensive dataset of 103,354 non-fork GitHub OSS projects spanning 2011 to 2023. We propose repository centrality, a family of HITS weights that captures shifts in the popularity of a repository in the repository-user star network. Using this metric, we apply recent advances in gradient boosting and deep learning to fit survival analysis models that predict a project's lifespan or its survival hazard. Results: Our study reveals a correlation between the HITS centrality metrics and repository deprecation risk. A drop in the HITS weights of a repository indicates a decline in its centrality and prevalence, which is associated with a higher deprecation risk and a shorter expected lifespan. Our predictive models, powered by repository centrality and other repository features, achieve satisfactory accuracy on the test set, with repository centrality being the most significant feature of all. Implications: This research offers a novel perspective on how prevalence affects the deprecation of OSS repositories. Our approach to predicting repository deprecation helps stakeholders detect the health status of projects and act in advance, fostering a more resilient OSS ecosystem.
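As an illustration of the kind of metric the abstract describes, the following is a minimal sketch of plain HITS on a user-repository star network, where users who star repositories act as hubs and repositories act as authorities. It is not the authors' implementation: the abstract does not specify how the family of HITS weights is computed or how time windows are handled, and the function name `hits_star_network`, the iteration count, and the toy star data are all assumptions made for this example.

```python
from math import sqrt


def hits_star_network(stars, iterations=50):
    """Run basic HITS on a bipartite star network (illustrative sketch).

    stars: iterable of (user, repo) pairs, one per star event.
    Returns (hub, auth): hub weights for users, authority weights for repos.
    """
    stars = list(stars)
    users = {user for user, _ in stars}
    repos = {repo for _, repo in stars}
    hub = {user: 1.0 for user in users}
    auth = {repo: 1.0 for repo in repos}

    for _ in range(iterations):
        # Authority of a repo: sum of hub weights of the users who starred it.
        new_auth = {repo: 0.0 for repo in repos}
        for user, repo in stars:
            new_auth[repo] += hub[user]
        norm = sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {repo: v / norm for repo, v in new_auth.items()}

        # Hub weight of a user: sum of authority weights of the repos they starred.
        new_hub = {user: 0.0 for user in users}
        for user, repo in stars:
            new_hub[user] += auth[repo]
        norm = sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {user: v / norm for user, v in new_hub.items()}

    return hub, auth


# Hypothetical toy data: three users star repo_a, only one of them stars repo_b.
example_stars = [
    ("alice", "repo_a"),
    ("bob", "repo_a"),
    ("carol", "repo_a"),
    ("alice", "repo_b"),
]
hub, auth = hits_star_network(example_stars)
```

In this toy network `repo_a` receives a larger authority weight than `repo_b`. Recomputing the weights on snapshots of the star network from successive time windows would give a per-repository series whose decline could then be fed to a survival model as a feature; that windowing step is an assumption here, not a detail taken from the abstract.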