Reproducibility of Build Environments through Space and Time

Published 1 Feb 2024 in cs.SE | (2402.00424v1)

Abstract: Modern software engineering builds upon the composability of software components, which rely on an ever-growing number of direct and transitive dependencies to provide their functionality. This reusability, however, makes it harder to reproduce a project's build environment, even though reproducible build environments are essential for collaboration, maintenance, and the lifetime of components. In this work, we argue that functional package managers provide the tooling needed to make build environments reproducible in space and time, and we present a preliminary evaluation supporting this claim. Using historical data, we show that we can reproduce the build environments of about 7 million Nix packages and rebuild 99.94% of the 14 thousand packages from a 6-year-old Nixpkgs revision.
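The central claim rests on how a functional package manager such as Nix names packages: each package is identified by a hash computed over its build inputs, including the identifiers of its dependencies, so a pinned package-set revision determines the whole transitive build environment. The Python sketch below is a deliberately simplified model of that idea, not Nix's actual algorithm. The `Derivation` class, `input_hash`, and `store_path` are hypothetical names; the JSON-plus-SHA-256 scheme is an assumption for illustration (real Nix hashes a serialized derivation and uses its own base-32 encoding); and the package names and build commands are placeholders.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    """Hypothetical, simplified build recipe: a name, a build command, direct deps."""
    name: str
    builder: str
    deps: tuple[Derivation, ...] = ()


def input_hash(drv: Derivation) -> str:
    """Hash a recipe together with the recursively computed hashes of its deps.

    Any change anywhere in the transitive closure changes this value.
    """
    payload = {
        "name": drv.name,
        "builder": drv.builder,
        # Simplification: sort so the result ignores the order deps were listed in.
        "deps": sorted(input_hash(d) for d in drv.deps),
    }
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:32]


def store_path(drv: Derivation) -> str:
    """Hypothetical store path derived only from the recipe's inputs."""
    return f"/nix/store/{input_hash(drv)}-{drv.name}"


if __name__ == "__main__":
    libc = Derivation("libc-1.0", "build-libc.sh")
    zlib = Derivation("zlib-1.0", "build-zlib.sh", (libc,))

    # Re-declaring the same inputs later yields the same identifier.
    zlib_again = Derivation("zlib-1.0", "build-zlib.sh",
                            (Derivation("libc-1.0", "build-libc.sh"),))
    print(store_path(zlib) == store_path(zlib_again))  # True

    # Changing only a transitive dependency changes the identifier.
    libc_patched = Derivation("libc-1.0", "build-libc.sh --patched")
    zlib_patched = Derivation("zlib-1.0", "build-zlib.sh", (libc_patched,))
    print(store_path(zlib) == store_path(zlib_patched))  # False
```

Two recipes declared from identical inputs receive the same identifier regardless of when or where they are evaluated, while altering a single transitive dependency (here, the hypothetical patched `libc` build command) changes the identifier of every package that depends on it. Under the paper's argument, this is the property that lets an old package-set revision describe the same build environment years later.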
