Generalization of optimistic/pessimistic bilevel solutions

Determine how solutions produced by optimistic/pessimistic bilevel optimization formulations—where the outer-level objective is optimized jointly over the outer parameter and the inner-level variable subject to the inner-level optimality constraint—perform on unseen data in machine learning tasks. Specifically, assess the out-of-sample behavior and generalization properties of these solutions when applied to new data, in contrast to their performance on training data.
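In standard bilevel notation (the symbols below are illustrative, not taken from the paper), the two variants described above can be written as:

```latex
% Inner-level solution set for a given outer parameter x
S(x) = \operatorname*{arg\,min}_{y} \; g(x, y)

% Optimistic: the outer objective is minimized jointly over x
% and over which inner minimizer y \in S(x) is selected
\min_{x,\, y \in S(x)} \; F(x, y)

% Pessimistic: hedge against the worst inner minimizer
\min_{x} \; \max_{y \in S(x)} \; F(x, y)
```

When the inner problem has a unique minimizer, the two variants coincide; the distinction matters exactly when S(x) contains multiple solutions, as with over-parameterized inner models.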

Background

The paper discusses the optimistic and pessimistic variants of bilevel formulations commonly used in mathematical optimization, in which the outer-level objective is optimized over both the outer and inner variables subject to the optimality constraint on the inner-level variable. Recent works have proposed tractable methods to solve these formulations.

However, in machine learning contexts, models are trained on finite datasets and then evaluated on unseen data. When the inner-level incorporates over-parameterized models, their parameters may be further optimized for the outer objective, raising concerns about overfitting. The authors explicitly point out that it is unclear how well solutions from optimistic/pessimistic bilevel formulations generalize to unseen data, motivating the need to rigorously characterize their out-of-sample behavior.
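To make the overfitting concern concrete, the following is a minimal sketch (the data, dimensions, and least-squares setup are invented for illustration, not taken from the paper). The inner problem is an over-parameterized least-squares fit with many global minimizers (all interpolators of the training data); optimistic selection then picks, among those minimizers, the one that also minimizes an outer objective evaluated on a small validation set. Because the inner solution set has more degrees of freedom than there are validation points, the optimistic solution can drive the outer loss to nearly zero on the validation data it was selected on, which is exactly the kind of out-of-sample behavior the question asks about:

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
n_tr, n_val, n_te, d = 20, 10, 200, 60  # over-parameterized: d > n_tr
w_true = rng.normal(size=d) / np.sqrt(d)

def make(n):
    X = rng.normal(size=(n, d))
    return X, X @ w_true + 0.1 * rng.normal(size=n)

X_tr, y_tr = make(n_tr)
X_val, y_val = make(n_val)
X_te, y_te = make(n_te)

# Inner problem: min_w ||X_tr w - y_tr||^2. With d > n_tr it has many
# global minima; every interpolator w0 + N z is an inner-level optimum,
# where w0 is the min-norm solution and N spans the null space of X_tr.
w0 = np.linalg.pinv(X_tr) @ y_tr
N = null_space(X_tr)

# Optimistic selection: among the inner minimizers, pick the one that
# minimizes the outer (validation) loss. Since dim(null space) = 40 > n_val,
# this extra optimization can fit the validation set almost exactly.
z, *_ = np.linalg.lstsq(X_val @ N, y_val - X_val @ w0, rcond=None)
w_opt = w0 + N @ z

mse = lambda X, y, w: float(np.mean((X @ w - y) ** 2))
print("val  MSE  min-norm:", mse(X_val, y_val, w0),
      " optimistic:", mse(X_val, y_val, w_opt))
# Test MSE reveals whether the optimistic choice actually generalized;
# with this few validation points it often does not.
print("test MSE  min-norm:", mse(X_te, y_te, w0),
      " optimistic:", mse(X_te, y_te, w_opt))
```

The sketch uses a validation split as the outer objective only to keep the example self-contained; the same non-uniqueness of inner solutions arises whenever the inner-level model is over-parameterized.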

References

While tractable methods were recently proposed to solve them, it is unclear how well the resulting solutions would behave on unseen data in the context of machine learning.

Functional Bilevel Optimization for Machine Learning  (2403.20233 - Petrulionyte et al., 2024) in Section 1 (Introduction)