Extend mixture reuse theory beyond parametric regression models

Extend the theoretical analysis of the FullMixtureReuse mechanism for data mixture recomputation—currently developed under log-linear parametric regression models—to non-parametric regression models (e.g., tree-based or Gaussian process regressors) that induce non-convex and non-differentiable mixture optimization objectives. Explicitly formulate conditions and guarantees analogous to those proved under the log-linear assumption, or characterize fundamental obstacles if such guarantees cannot hold.

Background

The paper develops FullMixtureReuse, a mechanism for efficiently recomputing data mixture ratios after domain-set updates, and provides a theoretical analysis under a log-linear regression mixing law. The log-linear assumption yields convex surrogate objectives and enables clean performance-gap bounds when the domain set changes.
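The convex-surrogate-plus-reuse idea can be sketched numerically. The snippet below is a minimal illustration, not the paper's actual method: the surrogate form `sum(c_i * exp(-k_i * w_i))` and the warm-start heuristic are assumptions chosen only because they give a convex objective over the simplex; the paper's log-linear family and reuse mechanism may differ in detail.

```python
import numpy as np

def surrogate_loss(w, c, k):
    # Illustrative convex surrogate: each domain's predicted loss decays
    # exponentially in its mixture weight (c, k play the role of fitted
    # regression constants; this form is an assumption for illustration).
    return float(np.sum(c * np.exp(-k * w)))

def optimize_mixture(c, k, w0=None, steps=3000, lr=0.2):
    # Exponentiated-gradient (mirror) descent over the probability simplex.
    # Convexity of the surrogate is what makes a global optimum reachable
    # and performance-gap bounds tractable.
    w = np.full(len(c), 1.0 / len(c)) if w0 is None else np.asarray(w0, float).copy()
    for _ in range(steps):
        g = -c * k * np.exp(-k * w)   # gradient of the surrogate w.r.t. w
        w = w * np.exp(-lr * g)       # multiplicative update ...
        w = w / w.sum()               # ... stays on the simplex
    return w

# Initial domain set: optimize the mixture from scratch.
c, k = np.array([1.0, 1.0, 1.0]), np.array([1.0, 2.0, 3.0])
w_opt = optimize_mixture(c, k)

# Domain-set update: a fourth domain arrives. Instead of restarting from
# uniform, warm-start from the previous mixture (the "reuse" intuition):
# keep most of the old mass and seed the new domain with a small share.
c2, k2 = np.append(c, 1.0), np.append(k, 1.5)
w_warm = np.append(w_opt * 0.9, 0.1)
w_new = optimize_mixture(c2, k2, w0=w_warm)
```

Because the surrogate is convex, the warm start only affects how fast the optimizer converges, not where it ends up; it is exactly this property that a non-parametric surrogate would lose.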

While the authors argue that the analysis should carry over to other parametric mixing laws (e.g., AutoScale and BiMix), they note uncertainty about non-parametric regression models, which typically produce non-convex, non-differentiable mixture objectives. Establishing analogous guarantees in this broader setting remains unresolved, and doing so is necessary before mixture reuse theory can cover methods such as tree ensembles or Gaussian processes that have appeared in prior data-mixing work.
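The obstacle can be made concrete with a toy non-parametric surrogate. The sketch below (all names and the ground-truth loss are hypothetical, purely for illustration) fits a 1-nearest-neighbour regressor to sampled (mixture, loss) pairs: its prediction surface is piecewise constant, so the gradient is zero almost everywhere and gradient-based reuse arguments give no traction, forcing a fallback to zeroth-order search.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_loss(w):
    # Hypothetical ground-truth mixing behaviour, used only to
    # generate training data for the non-parametric surrogate.
    return np.exp(-w[0]) + np.exp(-2 * w[1]) + np.exp(-3 * w[2])

# A minimal non-parametric regressor: 1-nearest neighbour over 200
# observed mixtures. Tree ensembles share the same pathology: the
# fitted surface is piecewise constant in the mixture weights.
W_train = rng.dirichlet(np.ones(3), size=200)
y_train = np.array([true_loss(w) for w in W_train])

def predict(w):
    i = np.argmin(np.linalg.norm(W_train - w, axis=1))
    return y_train[i]

# The surrogate is non-differentiable (gradient is 0 almost everywhere,
# undefined on cell boundaries), so convex-optimization machinery does
# not apply; here we resort to random search over the simplex.
candidates = rng.dirichlet(np.ones(3), size=2000)
best = min(candidates, key=predict)
```

Random search returns a mixture, but none of the structure used in the log-linear analysis survives: there is no unique optimum, no gradient-based stability under domain updates, and hence no obvious analogue of the paper's performance-gap bounds.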

References

Second, our theoretical analysis of mixture reuse assumes log-linear regression models. The analysis should extend naturally to other parametric models like AutoScale and BiMix, but it is less clear how to extend it to non-parametric models that yield non-convex and non-differentiable mixing objectives.

Olmix: A Framework for Data Mixing Throughout LM Development  (2602.12237 - Chen et al., 12 Feb 2026) in Discussion, Limitations (end of paper)