Minimax-optimal regret for single-index bandits when α>1 in increasing dimensions
Establish the minimax-optimal regret rate for contextual bandits with single-index rewards in the increasing-dimensional regime, under the generalized margin condition with margin exponent α > 1. Specifically, determine whether the known upper bound Reg(π) = Õ(d) is optimal, either by proving a matching minimax lower bound or by identifying the correct rate.
References
Similar to the linear bandit setting, our lower bound does not cover the regime α > 1 as the optimality in this case remains an open question in the linear bandit literature.
— Nonparametric Bandits with Single-Index Rewards: Optimality and Adaptivity
(2512.24669 - Ma et al., 31 Dec 2025) in Section 6.1 (Phase transition in increasing dimensions)