
Revealing the value of Repository Centrality in lifespan prediction of Open Source Software Projects

Published 13 May 2024 in cs.SE | (2405.07508v1)

Abstract: Background: Open Source Software is the building block of modern software. However, the prevalence of project deprecation in the open source world weakens the integrity of downstream systems and the broader ecosystem. This calls for efforts to monitor and predict project deprecations, empowering stakeholders to take proactive measures. Challenge: Existing techniques mainly rely on static features captured at a single point in time to make predictions, which limits their effectiveness. Goal: We propose a novel metric derived from the user-repository network, use it to fit project deprecation predictors, and demonstrate its real-life implications. Method: We establish a comprehensive dataset containing 103,354 non-fork GitHub OSS projects spanning 2011 to 2023. We propose repository centrality, a family of HITS weights that captures shifts in the popularity of a repository in the repository-user star network. Building on this metric, we use advances in gradient boosting and deep learning to fit survival analysis models that predict a project's lifespan or its survival hazard. Results: Our study reveals a correlation between the HITS centrality metrics and repository deprecation risk. A drop in the HITS weights of a repository indicates a decline in its centrality and prevalence, leading to an increase in its deprecation risk and a decrease in its expected lifespan. Our predictive models, powered by repository centrality and other repository features, achieve satisfactory accuracy on the test set, with repository centrality being the most significant feature of all. Implications: This research offers a novel perspective on how prevalence affects the deprecation of OSS repositories. Our approach to predicting repository deprecation helps stakeholders detect the health status of a project and take action in advance, fostering a more resilient OSS ecosystem.
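To make the notion of HITS weights on a repository-user star network concrete, the following is a minimal sketch, not the authors' implementation. It assumes stars are given as (user, repository) pairs, treats users as hubs and repositories as authorities, and runs a plain power iteration with L2 normalization. The function name `hits_weights`, the toy users and repositories, and the two-snapshot comparison are all hypothetical; the paper's actual time windows, normalization, and feature construction are not stated in the abstract.

```python
from math import sqrt


def hits_weights(stars, iterations=50):
    """Compute HITS hub and authority weights on a bipartite star network.

    `stars` is an iterable of (user, repo) pairs (a hypothetical input format).
    Users act as hubs, repositories act as authorities.
    """
    edges = set(stars)
    users = {u for u, _ in edges}
    repos = {r for _, r in edges}
    hub = {u: 1.0 for u in users}
    auth = {r: 1.0 for r in repos}

    for _ in range(iterations):
        # Authority update: sum the hub weights of the users who starred the repo.
        new_auth = {r: 0.0 for r in repos}
        for u, r in edges:
            new_auth[r] += hub[u]
        norm = sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {r: v / norm for r, v in new_auth.items()}

        # Hub update: sum the authority weights of the repos the user starred.
        new_hub = {u: 0.0 for u in users}
        for u, r in edges:
            new_hub[u] += auth[r]
        norm = sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {u: v / norm for u, v in new_hub.items()}

    return hub, auth


# Toy snapshots (made-up data): repo_a dominates early, repo_b gains stars later.
early = [("alice", "repo_a"), ("bob", "repo_a"),
         ("carol", "repo_a"), ("carol", "repo_b")]
late = early + [("dave", "repo_b"), ("erin", "repo_b"), ("frank", "repo_b")]

_, auth_early = hits_weights(early)
_, auth_late = hits_weights(late)

# A negative shift means the repository lost relative centrality between snapshots.
shift_repo_a = auth_late["repo_a"] - auth_early["repo_a"]
print(f"repo_a authority: early={auth_early['repo_a']:.3f}, "
      f"late={auth_late['repo_a']:.3f}, shift={shift_repo_a:.3f}")
```

In this toy setup repo_a keeps all of its stars yet its authority weight falls once repo_b attracts more users, which illustrates why a shift in HITS weight can signal declining prevalence even when raw star counts do not drop.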

Authors (3)