Learning efficient exploration strategies from experience for Best Arm Identification

Determine whether efficient exploration strategies for Best Arm Identification in multi-armed bandit problems can be learned directly from experience, thereby avoiding the explicit design of instance-dependent Best Arm Identification algorithms.

Background

The paper highlights that designing Best Arm Identification (BAI) algorithms is highly problem-specific and sensitive to modeling assumptions, making it difficult to develop efficient techniques for complex settings such as Markov Decision Processes. In response, the authors introduce an in-context learning approach using Transformers to discover exploration strategies directly from experience, hypothesizing that such models can autonomously exploit shared latent structure across tasks.

This open question concerns the feasibility of learning BAI strategies end-to-end, without hand-crafted algorithmic design. A positive answer would offer a practical and adaptable alternative to classical, assumption-heavy methods. The authors position their approach as an attempt to address this question empirically across deterministic, stochastic, and structured environments.
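To make the contrast concrete, the classical, hand-designed approach the paper seeks to avoid can be illustrated with Successive Elimination, a standard fixed-confidence BAI algorithm. The sketch below is an illustrative baseline, not the paper's method; the Bernoulli arm means and the confidence-radius constants are assumptions chosen for the example.

```python
import math
import random

def successive_elimination(means, delta=0.05, budget=20000, seed=0):
    """Classical fixed-confidence BAI baseline (Successive Elimination).

    `means` are the true Bernoulli arm means of a hypothetical instance;
    the algorithm only ever observes sampled 0/1 rewards. Arms whose
    upper confidence bound falls below the leader's lower confidence
    bound are eliminated until one arm remains or the budget runs out.
    """
    rng = random.Random(seed)
    k = len(means)
    active = list(range(k))
    counts = [0] * k
    sums = [0.0] * k
    pulls = 0
    while len(active) > 1 and pulls < budget:
        # Pull every active arm once per round (uniform exploration).
        for a in active:
            sums[a] += 1.0 if rng.random() < means[a] else 0.0
            counts[a] += 1
            pulls += 1
        # Hoeffding-style radius with a union bound over arms and rounds.
        rad = {a: math.sqrt(math.log(4 * k * counts[a] ** 2 / delta)
                            / (2 * counts[a]))
               for a in active}
        leader = max(active, key=lambda a: sums[a] / counts[a])
        leader_lcb = sums[leader] / counts[leader] - rad[leader]
        # Keep only arms that could still plausibly be the best.
        active = [a for a in active
                  if sums[a] / counts[a] + rad[a] >= leader_lcb]
    # Return the empirically best surviving arm.
    return max(active, key=lambda a: sums[a] / counts[a])
```

Note how instance-dependent the design is: the elimination rule and confidence radius are tailored to bandits with bounded rewards, which is precisely the kind of per-setting algorithm design the paper's in-context learning approach aims to sidestep.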

References

"Therefore, in this work we address the open question of whether it is possible to learn efficient exploration strategies directly from experience, avoiding the process of designing a BAI algorithm."

Learning to Explore: An In-Context Learning Approach for Pure Exploration (Russo et al., arXiv:2506.01876, 2 Jun 2025), Section 2, "Learning to Explore: In-Context Pure Exploration" (paragraph on Best-Arm Identification)