Improving the number of pretraining batches required for regret guarantees

Ascertain whether the sufficient condition on the number of pretraining batches M in the permutation‑invariant ERM setup—namely M ≥ exp(C((log^2 n)/(log log n) + B_n)) for universal prior‑on‑priors with rate B_n—can be improved while retaining the same regret bound, potentially by imposing additional structural assumptions on the estimator \widehat{\theta}^n.

Background

The paper relaxes the infinite‑batch assumption by proving that the same regret bound holds for a permutation‑invariant ERM estimator when the number of batches M is at least exp(C((log^2 n)/(log log n) + B_n)). This scaling is super‑polynomial in n.
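
To see why this condition is super‑polynomial, it helps to take logarithms of the stated bound (a short check using only the condition itself, with no assumptions beyond it):

$$\log M \;\ge\; C\left(\frac{\log^2 n}{\log\log n} + B_n\right), \qquad \frac{\log^2 n}{\log\log n} \;=\; \log n \cdot \frac{\log n}{\log\log n}.$$

Since $\log n / \log\log n \to \infty$, the right‑hand side eventually exceeds $k \log n$ for every fixed $k$, so M must eventually exceed $n^k$ for any polynomial exponent $k$.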

The authors note this agrees with practical intuition about large data requirements but explicitly point out that tightening the dependence on M might be possible if one leverages additional structural assumptions on the estimator, leaving this as a direction for future study.
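
For a rough sense of scale, the sketch below evaluates the stated lower bound on M for a few values of n. The constants C = 1 and B_n = 0 are purely illustrative assumptions (the paper's constant and rate are not pinned down here), so the numbers only convey the growth rate, not actual data requirements.

import math

def batch_lower_bound(n, C=1.0, B_n=0.0):
    # Evaluates exp(C * ((log n)^2 / log(log n) + B_n)), the stated sufficient
    # number of pretraining batches M; C and B_n are illustrative placeholders.
    return math.exp(C * (math.log(n) ** 2 / math.log(math.log(n)) + B_n))

for n in (10, 100, 1_000, 10_000):
    print(f"n = {n:>6}: M >= {batch_lower_bound(n):.3e}")

Even with C = 1 and B_n = 0, the bound grows from a few hundred batches at n = 10 to beyond 10^16 at n = 10,000, consistent with the paper's remark that pretraining typically requires a huge amount of training data.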

References

This is superpolynomial in $n$, but it agrees with the practical intuition that pretraining typically requires a huge amount of training data. We leave it to future study if a better condition on $M$ can be obtained by assuming different structures of $\widehat{\theta}^n$.

Universal priors: solving empirical Bayes via Bayesian inference and pretraining (2602.15136 - Cannella et al., 16 Feb 2026) in Section 4.1 (Finite number of batches)