- The paper's main contribution is a novel few-shot personalization method that integrates pre-trained models into nonparametric regression while achieving minimax optimality.
- It corrects the pre-trained model's bias with a θ-local-smoothing technique and selects the smoothing parameters by cross-validation, using only a limited budget of labeled samples from the target domain.
- Empirical results on synthetic and real datasets demonstrate that the personalized model significantly outperforms conventional estimators in estimation accuracy.
Personalizing Black-Box Models for Nonparametric Regression with Minimax Optimality
Introduction
The paper "Personalizing Black-Box Models for Nonparametric Regression with Minimax Optimality" (2601.01432) addresses the challenge of adapting pre-trained black-box models to specific tasks with limited data. This process, known as few-shot personalization, is crucial in situations where data collection is costly or impractical. By developing a theoretical framework for few-shot personalization in nonparametric regression, the authors propose a method to integrate pre-trained models effectively, achieving minimax optimality in the process.
Methodology
Few-Shot Personalization Framework
The authors introduce a comprehensive framework for few-shot personalization. The method involves several key steps:
- Sample Retrieval: A limited budget of labeled samples is strategically selected from the target domain.
- Smoothed Bias Correction: The pre-trained model's bias is estimated and corrected using a θ-local-smoothing technique, which stabilizes the estimate by controlling the variance of the correction.
- Adaptation: Cross-validation is employed to select the best parameters for the θ-local-smoothing, ensuring the robust integration of pre-trained models.
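The steps above can be sketched as a simple pipeline. The following is a minimal illustration, not the authors' algorithm: it treats the pre-trained model as a black box, estimates its residual bias on the small labeled budget with a Nadaraya-Watson local smoother (standing in for the paper's θ-local-smoothing), and selects the bandwidth by cross-validation. All function names and the Gaussian-kernel choice are assumptions made for this sketch.

```python
import numpy as np

def nw_smooth(x_train, r_train, x_query, h):
    """Nadaraya-Watson smoothing of residuals with a Gaussian kernel (1-d inputs)."""
    d2 = (x_query[:, None] - x_train[None, :]) ** 2
    w = np.exp(-d2 / (2 * h ** 2))
    w_sum = w.sum(axis=1, keepdims=True)
    w_sum[w_sum == 0] = 1.0  # guard against empty neighborhoods
    return (w * r_train[None, :]).sum(axis=1) / w_sum.ravel()

def personalize(f_pretrained, x_lab, y_lab, bandwidths, n_folds=5):
    """Return a corrected predictor f(x) + smoothed residual, with h chosen by CV."""
    resid = y_lab - f_pretrained(x_lab)  # bias of the black box on the labeled budget
    n = len(x_lab)
    folds = np.array_split(np.random.permutation(n), n_folds)

    def cv_error(h):
        err = 0.0
        for fold in folds:
            mask = np.ones(n, dtype=bool)
            mask[fold] = False
            pred = nw_smooth(x_lab[mask], resid[mask], x_lab[fold], h)
            err += np.sum((resid[fold] - pred) ** 2)
        return err / n

    h_best = min(bandwidths, key=cv_error)  # cross-validated bandwidth
    return lambda x: f_pretrained(x) + nw_smooth(x_lab, resid, x, h_best)
```

For example, if the black box carries a constant offset, the smoothed residual absorbs it and the corrected predictor tracks the target function far more closely than the raw pre-trained model.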
Theoretical Insights
Underpinning the framework is a robust theoretical analysis. The authors establish minimax optimal rates for the problem, demonstrating that their method is statistically efficient. Theoretical results reveal that leveraging a pre-trained model can effectively reduce the complexity of the estimation problem under sample scarcity. Specifically, integrating a pre-trained model allows the bias of the regression estimate to remain small on the target distribution even when very few labeled samples are available.
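The paper's specific rates are not reproduced here; as standard background, the intuition can be stated as follows. The Hölder classes, the smoothness indices, and the residual-smoothness assumption below are illustrative, not taken from the paper.

```latex
% Classical minimax rate for estimating a \beta-H\"older regression
% function f on [0,1]^d from n labeled samples:
\inf_{\hat f} \sup_{f \in \mathcal{H}(\beta)}
  \mathbb{E}\,\| \hat f - f \|_2^2 \;\asymp\; n^{-2\beta/(2\beta + d)}.

% If a pre-trained model f_0 is available and the residual
% r = f - f_0 lies in a smoother class \mathcal{H}(\gamma), \gamma > \beta,
% then estimating r instead of f attains
\mathbb{E}\,\| \hat r - r \|_2^2 \;\asymp\; n^{-2\gamma/(2\gamma + d)},
% which converges faster: correcting a good black box can be
% statistically easier than estimating f from scratch.
```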
Results
The research provides empirical validation through simulations and application to real data. Evaluations on synthetic datasets and the California housing dataset show that the few-shot personalized model outperforms both conventional single-task nonparametric estimators and the baseline pre-trained models, with substantial gains in estimation accuracy.
Implications and Future Work
The proposed framework pioneers a path for personalizing large pre-trained models with theoretical backing, effectively addressing the challenge of adapting models to new contexts with limited data. This approach has significant implications for fields that rely on large models but face data constraints, and it is particularly relevant to practical machine-learning applications where labeled data are scarce.
Future research directions could explore extending the framework to other statistical settings, such as parametric models or high-dimensional scenarios. Additionally, personalization methods that go beyond regression, for example classification tasks with minimal user-specific data, represent a promising avenue for further investigation.
Conclusion
This paper provides a rigorous and effective approach to personalizing black-box models for regression tasks, achieving minimax optimality. By integrating theoretical insights with practical algorithms, the research sets a foundation for future exploration in personalization strategies across various statistical frameworks in artificial intelligence.