Minimax-optimal regret for single-index bandits when α>1 in increasing dimensions

Establish the minimax-optimal regret rate for contextual bandits with single-index rewards in the increasing-dimensional regime under the generalized margin condition with margin exponent α>1; specifically, determine whether the known upper bound Reg(π) = Õ(d) is optimal, either by proving a matching minimax lower bound or by identifying the correct rate.

Background

The paper analyzes single-index contextual bandits and derives regret bounds that exhibit a phase transition between nonparametric and parametric regimes as the ambient dimension d grows. For β≥1 and α≤1, the authors present matching upper and lower bounds (up to logarithmic factors).
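For concreteness, margin conditions of this type are typically stated as a bound on the probability of a small optimality gap; a common formulation (the exact form used in the paper may differ) is:

```latex
% Generalized margin condition with exponent \alpha:
% the probability of a small but nonzero optimality gap
% decays polynomially in the gap size t.
\mathbb{P}\bigl( 0 < \Delta(X) \le t \bigr) \;\le\; C\, t^{\alpha}
\qquad \text{for all } t > 0,
```

where Δ(X) denotes the gap between the best and second-best expected rewards at context X. A larger α means near-ties between arms are rarer, which is why α>1 is associated with faster, parametric-type rates.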

For α>1, the regret upper bound becomes Õ(d), reflecting the unavoidable cost of learning a d-dimensional index, but the paper's lower bound does not cover this regime. The authors note that optimality for α>1 remains unresolved even in the linear bandit literature, so their single-index lower bound likewise leaves α>1 open.
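The state of play can be summarized as follows (rates as stated in the paper; Õ hides logarithmic factors):

```latex
% Known results for single-index contextual bandits as dimension d grows:
%   beta >= 1, alpha <= 1 : matching upper and lower bounds (up to logs).
%   alpha > 1             : upper bound only,
\mathrm{Reg}(\pi) \;=\; \tilde{O}(d) \qquad (\alpha > 1),
```

and the open question is whether a matching Ω̃(d) minimax lower bound holds for α>1, or whether a faster rate is achievable.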

References

Similar to the linear bandit setting, our lower bound does not cover the regime α > 1, as optimality in this case remains an open question in the linear bandit literature.

Nonparametric Bandits with Single-Index Rewards: Optimality and Adaptivity  (2512.24669 - Ma et al., 31 Dec 2025) in Section 6.1 (Phase transition in increasing dimensions)