ML-Schema: Exposing the Semantics of Machine Learning with Schemas and Ontologies

Published 14 Jul 2018 in cs.LG, cs.DB, cs.IR, and stat.ML | (1807.05351v1)

Abstract: The ML-Schema, proposed by the W3C Machine Learning Schema Community Group, is a top-level ontology that provides a set of classes, properties, and restrictions for representing and interchanging information on machine learning algorithms, datasets, and experiments. It can be easily extended and specialized and it is also mapped to other more domain-specific ontologies developed in the area of machine learning and data mining. In this paper we overview existing state-of-the-art machine learning interchange formats and present the first release of ML-Schema, a canonical format resulting from more than seven years of experience among different research institutions. We argue that exposing semantics of machine learning algorithms, models, and experiments through a canonical format may pave the way to better interpretability and to realistically achieve the full interoperability of experiments regardless of platform or adopted workflow solution.

Authors (8)

Citations (55)

Summary

The paper introduces ML-Schema, a top-level ontology that standardizes how machine learning algorithms, datasets, and experiments are represented and makes model semantics explicit.
The paper maps ML-Schema to existing ontologies and vocabularies (OntoDM-core, Exposé, DMOP, and MEX), so that ML experiments described in any of them can be related to one another.
The paper argues that adopting ML-Schema can improve reproducibility and interoperability across different ML platforms and workflows.

Introduction

Machine learning (ML) models have achieved significant successes in multiple predictive tasks, yet this success is often accompanied by concerns about the transparency and comprehension of these models. One particular challenge in the field is achieving interoperability across different ML platforms and ensuring that experiments can be reproduced regardless of the specific tools used. Existing ML platforms each have their unique ways of representing data and metadata, leading to difficulties in sharing and understanding machine learning algorithms, models, and experiments across different systems.

The ML-Schema

The ML-Schema is a shared ontology developed by the W3C Machine Learning Schema Community Group. It provides a standardized set of classes, properties, and restrictions for representing and exchanging information on ML algorithms, datasets, and experiments. The schema serves as a backbone from which more specialized classes and properties can be derived, and it is mapped to more specific ontologies in the ML field. By giving the steps of the ML process clear, canonical descriptors, it is intended to make ML models easier to interpret.
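As a rough illustration of what describing an experiment with such classes and properties could look like, the sketch below builds a handful of subject-predicate-object triples for one run and prints them in an N-Triples-like form. It uses only the Python standard library. The namespace `http://www.w3.org/ns/mls#` and the term names (`Run`, `Algorithm`, `Dataset`, `ModelEvaluation`, `realizes`, `hasInput`, `hasOutput`, `specifiedBy`, `hasValue`) are assumptions recalled from the ML-Schema vocabulary and are not listed in this summary, so they should be checked against the published specification. The `http://example.org/` identifiers and the helper functions are hypothetical.

```python
# Minimal sketch: describe one ML run as RDF-style triples (stdlib only).
# ASSUMPTION: namespace and term names below are recalled from ML-Schema,
# not taken from this summary; verify them against the specification.
MLS = "http://www.w3.org/ns/mls#"
EX = "http://example.org/"  # hypothetical namespace for our own resources
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def describe_run(run_id, algorithm, dataset, measure, value):
    """Return triples linking a run to its algorithm, input data, and score."""
    run = EX + run_id
    algo = EX + algorithm
    data = EX + dataset
    evaluation = EX + run_id + "-evaluation"
    return [
        (run, RDF_TYPE, MLS + "Run"),
        (algo, RDF_TYPE, MLS + "Algorithm"),
        (data, RDF_TYPE, MLS + "Dataset"),
        (evaluation, RDF_TYPE, MLS + "ModelEvaluation"),
        (run, MLS + "realizes", algo),
        (run, MLS + "hasInput", data),
        (run, MLS + "hasOutput", evaluation),
        (evaluation, MLS + "specifiedBy", EX + measure),
        (evaluation, MLS + "hasValue", value),  # numeric literal
    ]


def to_ntriples(triples):
    """Serialize triples; string objects are IRIs, numbers become literals."""
    lines = []
    for subject, predicate, obj in triples:
        if isinstance(obj, str):
            rendered = f"<{obj}>"
        else:
            rendered = f'"{obj}"^^<{XSD_DOUBLE}>'
        lines.append(f"<{subject}> <{predicate}> {rendered} .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_ntriples(describe_run("run1", "decision-tree", "iris", "accuracy", 0.95)))
```

Because every statement uses terms from one shared vocabulary, a second platform that understands the same terms could read this description without knowing anything about the tool that produced it.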

Machine Learning Ontologies

The ML-Schema draws from a range of existing ontologies and vocabularies to create a comprehensive framework for ML semantics. This includes the OntoDM-core ontology that provides generic representations of data mining entities; the Exposé ontology which describes ML experiments; the DMOP ontology focused on meta-mining and meta-learning processes; and the MEX vocabulary, which is tailored for exchanging basic ML metadata. The paper details how the ML-Schema provides mappings between these ontologies and vocabularies, enabling better interoperability and shared understanding in the machine learning domain.
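The idea of using ML-Schema as a hub between vocabularies can be sketched as a small lookup: each ML-Schema term is aligned with a counterpart in other vocabularies, and a term from one vocabulary is translated to another by going through the shared ML-Schema term. The alignment table below is purely illustrative. Its term names (for example `mexcore:Execution` or `expose:Run`) are placeholders and are not copied from the paper's mapping tables, and the functions `build_index` and `translate` are hypothetical helpers.

```python
# Minimal sketch: ML-Schema as a pivot between vocabularies (stdlib only).
# ASSUMPTION: every term name in ALIGNMENTS is an illustrative placeholder,
# not the mapping published in the paper.
ALIGNMENTS = {
    "mls:Run": {"mex": "mexcore:Execution", "expose": "expose:Run"},
    "mls:Algorithm": {"mex": "mexalgo:Algorithm", "expose": "expose:Algorithm"},
    "mls:Dataset": {"mex": "mexcore:Dataset", "expose": "expose:Dataset"},
}


def build_index(alignments, vocabulary):
    """Map each term of one vocabulary back to its ML-Schema term."""
    return {
        terms[vocabulary]: mls_term
        for mls_term, terms in alignments.items()
        if vocabulary in terms
    }


def translate(term, source, target, alignments=ALIGNMENTS):
    """Translate a term between two vocabularies via the ML-Schema pivot.

    Returns None when the term has no alignment in the table.
    """
    mls_term = build_index(alignments, source).get(term)
    if mls_term is None:
        return None
    return alignments[mls_term].get(target)


if __name__ == "__main__":
    print(translate("mexcore:Execution", "mex", "expose"))
```

With a pivot, each vocabulary needs only one alignment to the shared schema rather than a separate mapping to every other vocabulary.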

Conclusions

ML-Schema is positioned as a step towards reducing the fragmentation of representations in the machine learning community. By aligning more detailed ontologies and vocabularies, it helps make the semantics of ML models explicit and interpretable. A shared, unified way of describing ML processes and their outcomes lets researchers and practitioners exchange information more transparently, and it could benefit ML platforms, ecosystems, and metadata repositories. The authors present the schema as a significant step towards full interoperability and reproducibility of ML experiments across platforms and workflows.
