Papers
Topics
Authors
Recent
Search
2000 character limit reached

Prior-Aligned Meta-RL: Thompson Sampling with Learned Priors and Guarantees in Finite-Horizon MDPs

Published 6 Oct 2025 in cs.LG and stat.ML | (2510.05446v1)

Abstract: We study meta-reinforcement learning in finite-horizon MDPs where related tasks share similar structures in their optimal action-value functions. Specifically, we posit a linear representation $Q*_h(s,a)=\Phi_h(s,a)\,\theta{(k)}_h$ and place a Gaussian meta-prior $ \mathcal{N}(\theta_h,\Sigma^_h)$ over the task-specific parameters $\theta{(k)}_h$. Building on randomized value functions, we propose two Thompson-style algorithms: (i) MTSRL, which learns only the prior mean and performs posterior sampling with the learned mean and known covariance; and (ii) $\text{MTSRL}{+}$, which additionally estimates the covariance and employs prior widening to control finite-sample estimation error. Further, we develop a prior-alignment technique that couples the posterior under the learned prior with a meta-oracle that knows the true prior, yielding meta-regret guarantees: we match prior-independent Thompson sampling in the small-task regime and strictly improve with more tasks once the prior is learned. Concretely, for known covariance we obtain $\tilde{O}(H{4}S{3/2}\sqrt{ANK})$ meta-regret, and with learned covariance $\tilde{O}(H{4}S{3/2}\sqrt{AN3K})$; both recover a better behavior than prior-independent after $K \gtrsim \tilde{O}(H2)$ and $K \gtrsim \tilde{O}(N2H2)$, respectively. Simulations on a stateful recommendation environment (with feature and prior misspecification) show that after brief exploration, MTSRL/MTSRL(+) track the meta-oracle and substantially outperform prior-independent RL and bandit-only meta-baselines. Our results give the first meta-regret guarantees for Thompson-style RL with learned Q-priors, and provide practical recipes (warm-start via RLSVI, OLS aggregation, covariance widening) for experiment-rich settings.

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 2 tweets with 1 like about this paper.