Regret profile of PSRL with policy iteration (PSRL-PI)
Characterize the regret profile of the policy-iteration variant of Posterior Sampling for Reinforcement Learning (PSRL-PI).
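For orientation, the quantity in question can at least be measured empirically. The following is a minimal sketch, not the setup used in the cited paper: it assumes a small tabular MDP with known rewards, a Dirichlet(1, ..., 1) prior over each transition row, a discounted criterion with exact policy iteration as the planner, and fixed-length interaction episodes. All function names (random_mdp, policy_iteration, evaluate, psrl_pi) are hypothetical and introduced only for illustration.

```python
import numpy as np


def random_mdp(S, A, seed=0):
    """Hypothetical helper: random transitions P (S, A, S) and rewards R (S, A)."""
    rng = np.random.default_rng(seed)
    P = rng.dirichlet(np.ones(S), size=(S, A))  # each P[s, a] sums to 1
    R = rng.uniform(size=(S, A))
    return P, R


def evaluate(P, R, policy, gamma):
    """Exact policy evaluation: solve (I - gamma * P_pi) V = R_pi."""
    S = P.shape[0]
    idx = np.arange(S)
    return np.linalg.solve(np.eye(S) - gamma * P[idx, policy], R[idx, policy])


def policy_iteration(P, R, gamma=0.95, max_iters=100):
    """Alternate exact evaluation and greedy improvement until the policy is stable."""
    S = P.shape[0]
    policy = np.zeros(S, dtype=int)
    V = evaluate(P, R, policy, gamma)
    for _ in range(max_iters):
        Q = R + gamma * (P @ V)           # (S, A) action values
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
        V = evaluate(P, R, policy, gamma)
    return policy, V


def psrl_pi(P_true, R, episodes=200, horizon=20, gamma=0.95, seed=0):
    """PSRL with policy iteration as planner; returns cumulative regret per episode.

    Assumptions (not from the source): rewards are known, only transitions are
    learned, and regret is measured as V*(s0) - V^{pi_k}(s0) on the true MDP.
    """
    rng = np.random.default_rng(seed)
    S, A, _ = P_true.shape
    counts = np.ones((S, A, S))           # Dirichlet(1, ..., 1) prior counts
    _, V_star = policy_iteration(P_true, R, gamma)
    regret = np.zeros(episodes)
    for k in range(episodes):
        # Sample one MDP from the posterior and plan in it.
        P_hat = np.array([[rng.dirichlet(counts[s, a]) for a in range(A)]
                          for s in range(S)])
        policy, _ = policy_iteration(P_hat, R, gamma)
        # Per-episode regret of the sampled-MDP policy, evaluated on the true MDP.
        regret[k] = V_star[0] - evaluate(P_true, R, policy, gamma)[0]
        # Act in the true MDP and update the posterior counts.
        s = 0
        for _ in range(horizon):
            a = policy[s]
            s_next = rng.choice(S, p=P_true[s, a])
            counts[s, a, s_next] += 1
            s = s_next
    return np.cumsum(regret)
```

Calling psrl_pi(*random_mdp(5, 3)) returns a nondecreasing cumulative-regret curve. Such a curve only describes behavior on one sampled problem; characterizing the regret profile would require bounds that hold across problem classes.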
References
We choose the policy iteration variant of the Posterior Sampling for Reinforcement Learning (PSRL) due to its competitive empirical performance, although its regret profile is not yet fully characterized.
Source: Distributional Active Inference (Akgül et al., arXiv:2601.20985, 28 Jan 2026), Section 6: Experiments (footnote)