Minimum mean-squared error estimation with bandit feedback
Abstract: We consider the problem of sequentially learning to estimate, in the mean squared error (MSE) sense, a Gaussian $K$-vector of unknown covariance by observing only $m < K$ of its entries in each round. We propose two MSE estimators and analyze their concentration properties. The first estimator is non-adaptive: it is tied to a predetermined $m$-subset and cannot switch to alternative subsets. The second estimator, derived using a regression framework, is adaptive and exhibits better concentration bounds than the first. We frame MSE estimation as a problem with bandit feedback, where the objective is to find the MSE-optimal subset with high confidence. We propose a variant of the successive elimination algorithm to solve this problem. We also derive a minimax lower bound that characterizes the fundamental limit on the sample complexity of this problem.
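To make the notion of an "MSE-optimal subset" concrete, here is a minimal sketch, not the paper's algorithm. It assumes a zero-mean Gaussian vector whose covariance is known (the paper treats it as unknown and learns it from bandit feedback), and it finds the best subset by brute-force enumeration. The function names `subset_mse` and `best_subset` and the example covariance are illustrative assumptions. Under these assumptions, the MSE of the optimal estimate of the unobserved entries is the trace of the Schur complement $\Sigma_{HH} - \Sigma_{HO}\Sigma_{OO}^{-1}\Sigma_{OH}$, where $O$ indexes the observed entries and $H$ the hidden ones.

```python
from itertools import combinations

import numpy as np


def subset_mse(cov, observed):
    """MSE of the MMSE estimate of the hidden entries given the observed ones.

    Assumes a zero-mean Gaussian vector with known covariance `cov`.
    """
    k = cov.shape[0]
    obs = list(observed)
    hid = [i for i in range(k) if i not in obs]
    c_oo = cov[np.ix_(obs, obs)]  # covariance among observed entries
    c_ho = cov[np.ix_(hid, obs)]  # cross-covariance hidden vs. observed
    c_hh = cov[np.ix_(hid, hid)]  # covariance among hidden entries
    # Schur complement: conditional covariance of hidden given observed.
    cond_cov = c_hh - c_ho @ np.linalg.solve(c_oo, c_ho.T)
    return float(np.trace(cond_cov))


def best_subset(cov, m):
    """Brute-force search for the m-subset with the smallest MSE."""
    k = cov.shape[0]
    return min(combinations(range(k), m), key=lambda s: subset_mse(cov, s))


# Hypothetical 3x3 covariance: entries 0 and 1 are correlated, entry 2 is independent.
cov = np.array([
    [2.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

for m in (1, 2):
    s = best_subset(cov, m)
    print(m, s, subset_mse(cov, s))
```

With an unknown covariance, each subset's MSE would have to be estimated from samples, which is where an elimination-style procedure over candidate subsets comes in; the enumeration above only illustrates the quantity being optimized.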