MAPIE: an open-source library for distribution-free uncertainty quantification

Published 25 Jul 2022 in stat.ML and cs.LG | (2207.12274v1)

Abstract: Estimating uncertainties associated with the predictions of Machine Learning (ML) models is of crucial importance to assess their robustness and predictive power. In this submission, we introduce MAPIE (Model Agnostic Prediction Interval Estimator), an open-source Python library that quantifies the uncertainties of ML models for single-output regression and multi-class classification tasks. MAPIE implements conformal prediction methods, allowing the user to easily compute uncertainties with strong theoretical guarantees on the marginal coverages and with mild assumptions on the model or on the underlying data distribution. MAPIE is hosted on scikit-learn-contrib and is fully "scikit-learn-compatible". As such, it accepts any type of regressor or classifier coming with a scikit-learn API. The library is available at: https://github.com/scikit-learn-contrib/MAPIE/.




Summary

The paper introduces MAPIE, an open-source library that quantifies uncertainty in ML predictions using conformal methods.
It describes the implemented methods: Jackknife+, CV+, and Jackknife+-after-Bootstrap for regression, EnbPI for time series, and LABEL, APS, and Top-K for classification, all of which target a user-specified coverage level.
MAPIE follows the scikit-learn API, so it can wrap any compatible regressor or classifier in both regression and classification tasks.

Overview of MAPIE: An Open-Source Library for Distribution-Free Uncertainty Quantification

The paper introduces MAPIE (Model Agnostic Prediction Interval Estimator), an open-source Python library for uncertainty quantification (UQ) of ML models. The library estimates uncertainties for single-output regression and multi-class classification tasks using conformal prediction methods. MAPIE is hosted on scikit-learn-contrib and accepts any regressor or classifier that follows the scikit-learn API.

Importance and Need for Uncertainty Quantification

Quantifying uncertainty in ML predictions is imperative for assessing model robustness and predictive capabilities. The paper underscores four key stakeholders who benefit from UQ:

Model Designers: Gain insights into model prediction reliability on new data.
Business Stakeholders: Optimize decision-making based on ML predictions.
Regulators: Assess regulatory compliance of AI systems.
Impacted Individuals: Enhance transparency and trustworthiness of AI systems.

Key Requirements for UQ Libraries

The paper outlines three essential pillars for UQ libraries:

Model and Use-Case Agnosticism: Applicable across various domains using advanced ML models.
Theoretical Guarantees: Strong assurances on marginal (and possibly conditional) coverage with minimal assumptions.
Open-Source Nature: Adherence to state-of-the-art programming standards to promote trust.
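For reference, the two notions of coverage mentioned in the second pillar are usually written as below. The notation is an assumption made here for illustration, not quoted from the paper: \hat{C}_{n,\alpha} denotes the prediction interval or set built from n samples, (X_{n+1}, Y_{n+1}) is a new test point, and \alpha is the target miscoverage rate. Marginal coverage averages over test points, whereas conditional coverage must hold at every input x and is a much stronger requirement.

```latex
% Marginal coverage: holds on average over the test point
\mathbb{P}\left( Y_{n+1} \in \hat{C}_{n,\alpha}(X_{n+1}) \right) \ge 1 - \alpha

% Conditional coverage: holds for every value x of the input
\mathbb{P}\left( Y_{n+1} \in \hat{C}_{n,\alpha}(X_{n+1}) \mid X_{n+1} = x \right) \ge 1 - \alpha
\quad \text{for all } x
```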

Methods Implemented in MAPIE

General Framework

MAPIE uses conformal prediction to construct prediction intervals (for regression) or prediction sets (for classification) that meet a desired coverage probability. The process consists of choosing a conformity score, training the model on a training set, computing the conformity scores on a separate calibration set, and constructing intervals or sets from a quantile of those scores.
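The four steps above can be sketched in plain Python as follows. This is a minimal illustration of split conformal prediction under stated assumptions, not MAPIE's implementation: the helper names (fit_line, conformal_quantile, split_conformal_interval) are hypothetical, the conformity score is assumed to be the absolute residual, and a one-dimensional least-squares line stands in for an arbitrary regressor.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x (a stand-in for any regressor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest score, or inf if n is too small."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(scores)[rank - 1]


def split_conformal_interval(predict, x_cal, y_cal, alpha):
    """Calibrate absolute-residual scores and return a function x -> (lower, upper)."""
    scores = [abs(y - predict(x)) for x, y in zip(x_cal, y_cal)]
    q_hat = conformal_quantile(scores, alpha)
    return lambda x: (predict(x) - q_hat, predict(x) + q_hat)


# Synthetic data (an assumption for the demo): y = 2x + 1 plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(600)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]

# Disjoint training, calibration, and test sets.
x_train, y_train = xs[:200], ys[:200]
x_cal, y_cal = xs[200:400], ys[200:400]
x_test, y_test = xs[400:], ys[400:]

predict = fit_line(x_train, y_train)
interval = split_conformal_interval(predict, x_cal, y_cal, alpha=0.1)

# Empirical coverage on held-out data should be close to 1 - alpha = 0.9.
covered = 0
for x, y in zip(x_test, y_test):
    lower, upper = interval(x)
    covered += lower <= y <= upper
coverage = covered / len(x_test)
print(f"empirical coverage: {coverage:.2f}")
```

The finite-sample correction in conformal_quantile, which uses n + 1 rather than n, is what allows the marginal coverage guarantee to hold for any calibration set size when the data are exchangeable.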

Conformal Prediction Methods for Regression

The paper describes three methods for tabular regression:

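Jackknife+: Builds upon the jackknife by combining each leave-one-out residual with the prediction of the corresponding leave-one-out model, which avoids the overfitting that affects intervals built from training residuals. Under exchangeable data it guarantees coverage of at least 1 - 2α for a target level of 1 - α.
CV+: Replaces the n leave-one-out models with K cross-validation fold models, which lowers the computational cost while keeping a coverage guarantee close to that of Jackknife+.
Jackknife+-after-Bootstrap: Fits models on bootstrap resamples and, for each training point, aggregates the predictions of the models that did not see it, which yields Jackknife+-style intervals without refitting one model per point.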
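The relationship between Jackknife+ and CV+ can be sketched with a single function that takes a fold assignment: one fold per sample gives Jackknife+, and K shared folds give CV+. This is an illustrative sketch and not MAPIE's code; the names fit_line and plus_interval are hypothetical, the conformity score is assumed to be the absolute residual, and a least-squares line stands in for the base regressor.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x (a stand-in for any regressor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def plus_interval(x_new, xs, ys, folds, alpha):
    """Jackknife+/CV+ interval at x_new; folds[i] is the fold index of sample i."""
    n = len(xs)
    lower_vals, upper_vals = [], []
    for k in sorted(set(folds)):
        # Fit on everything outside fold k, then score the held-out points.
        train = [i for i in range(n) if folds[i] != k]
        held_out = [i for i in range(n) if folds[i] == k]
        predict = fit_line([xs[i] for i in train], [ys[i] for i in train])
        pred_new = predict(x_new)
        for i in held_out:
            residual = abs(ys[i] - predict(xs[i]))
            lower_vals.append(pred_new - residual)
            upper_vals.append(pred_new + residual)
    lower_vals.sort()
    upper_vals.sort()
    # Lower bound: floor(alpha * (n + 1))-th smallest of (prediction - residual).
    # Upper bound: ceil((1 - alpha) * (n + 1))-th smallest of (prediction + residual).
    lo_rank = math.floor(alpha * (n + 1))
    hi_rank = math.ceil((1 - alpha) * (n + 1))
    lower = lower_vals[lo_rank - 1] if lo_rank >= 1 else -math.inf
    upper = upper_vals[hi_rank - 1] if hi_rank <= n else math.inf
    return lower, upper


# Synthetic data (an assumption for the demo): y = 2x + 1 plus Gaussian noise.
rng = random.Random(1)
n = 60
xs = [rng.uniform(0, 10) for _ in range(n)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]

jackknife_folds = list(range(n))       # one fold per sample: n model fits
cv_folds = [i % 5 for i in range(n)]   # five folds: 5 model fits

jk_lower, jk_upper = plus_interval(5.0, xs, ys, jackknife_folds, alpha=0.1)
cv_lower, cv_upper = plus_interval(5.0, xs, ys, cv_folds, alpha=0.1)
print(f"Jackknife+ interval at x=5: [{jk_lower:.2f}, {jk_upper:.2f}]")
print(f"CV+ interval at x=5:        [{cv_lower:.2f}, {cv_upper:.2f}]")
```

In this sketch the only difference between the two methods is the number of model fits (60 versus 5), which is the computational saving the CV+ item refers to.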

Time Series and Classification

For time series, MAPIE incorporates the EnbPI method, which does not require exchangeable data: it relies on a bootstrap ensemble and updates its residuals as new observations become available. For classification tasks, MAPIE offers the LABEL, APS, and Top-K methods, which differ in how they compute conformity scores and construct prediction sets.
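To make the difference between classification scores concrete, the sketch below builds prediction sets with two of the methods named above. It is an illustration under stated assumptions and not MAPIE's implementation: LABEL is assumed to score a sample by one minus the predicted probability of the true class, APS is shown in its non-randomized form (cumulative probability of the classes ranked at or above the true one), and the function names and the hand-written calibration probabilities are hypothetical.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest score, or inf if n is too small."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(scores)[rank - 1]


def label_scores(probs, labels):
    """LABEL-style score: one minus the probability assigned to the true class."""
    return [1.0 - p[y] for p, y in zip(probs, labels)]


def label_set(p, q_hat):
    """Keep every class whose score 1 - p[k] does not exceed the threshold."""
    return [k for k, pk in enumerate(p) if 1.0 - pk <= q_hat]


def aps_scores(probs, labels):
    """APS-style score: total probability of classes ranked at or above the true one."""
    return [sum(pk for pk in p if pk >= p[y]) for p, y in zip(probs, labels)]


def aps_set(p, q_hat):
    """Add classes by decreasing probability until their cumulative mass reaches q_hat."""
    order = sorted(range(len(p)), key=lambda k: p[k], reverse=True)
    chosen, cumulative = [], 0.0
    for k in order:
        chosen.append(k)
        cumulative += p[k]
        if cumulative >= q_hat:
            break
    return sorted(chosen)


# Hypothetical calibration set: predicted class probabilities and true labels.
cal_probs = [
    [0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.8, 0.1], [0.2, 0.5, 0.3],
    [0.1, 0.1, 0.8], [0.3, 0.1, 0.6], [0.5, 0.4, 0.1], [0.9, 0.05, 0.05],
    [0.3, 0.6, 0.1], [0.2, 0.2, 0.6],
]
cal_labels = [0, 0, 1, 1, 2, 2, 1, 0, 0, 2]
alpha = 0.2

q_label = conformal_quantile(label_scores(cal_probs, cal_labels), alpha)
q_aps = conformal_quantile(aps_scores(cal_probs, cal_labels), alpha)

# Three test points: confident, two plausible classes, and nearly uniform.
for p in ([0.96, 0.03, 0.01], [0.5, 0.45, 0.05], [0.34, 0.33, 0.33]):
    print(p, "LABEL:", label_set(p, q_label), "APS:", aps_set(p, q_aps))
```

On the nearly uniform test point, the LABEL-style rule can return an empty set while the APS-style rule returns a large one, which illustrates how the choice of conformity score changes the shape of the prediction sets.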

Practical Application

MAPIE provides user-friendly Pythonic classes, MapieRegressor and MapieClassifier, for intuitive integration with existing scikit-learn workflows. The paper includes illustrative examples for time series forecasting and image classification, demonstrating MAPIE's efficacy in real-world scenarios.

Future Directions and Methodological Expansion

The authors outline several avenues for future work, including:

Implementation Updates: Incorporating methods like conformalized quantile regression and addressing covariate shifts.
Framework Optimization: Enhancing base classes for streamlined integration of new methods by external contributors.
New Application Areas: Extending MAPIE to complex settings such as image segmentation and object detection.

Conclusion

The paper presents a toolkit for ML practitioners who need uncertainty estimates alongside their predictions. MAPIE combines implementations of conformal prediction methods with full compatibility with the scikit-learn ecosystem, which makes it usable with existing models and workflows. The planned developments would extend it to further methods, such as conformalized quantile regression, and to new application areas, such as image segmentation and object detection.