yProv4ML: Effortless Provenance Tracking for Machine Learning Systems
Abstract: The rapid growth of interest in LLMs reflects their potential for flexibility and generalization, and attracted the attention of a diverse range of researchers. However, the advent of these techniques has also brought to light the lack of transparency and rigor with which development is pursued. In particular, the inability to determine the number of epochs and other hyperparameters in advance presents challenges in identifying the best model. To address this challenge, machine learning frameworks such as MLFlow can automate the collection of this type of information. However, these tools capture data using proprietary formats and pose little attention to lineage. This paper proposes yProv4ML, a framework to capture provenance information generated during machine learning processes in PROV-JSON format, with minimal code modifications.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.