Regret profile of PSRL with policy iteration (PSRL-PI)

Characterize the regret profile of the policy-iteration variant of Posterior Sampling for Reinforcement Learning (PSRL-PI).

Background

In the tabular experiments, the authors employ a policy-iteration variant of PSRL due to its strong empirical performance, referencing prior work. However, they point out that formal regret guarantees for this specific variant are not yet fully developed.

Clarifying the regret behavior of PSRL-PI would strengthen the theoretical foundations of the chosen baseline and inform comparisons with distributional methods such as DAIF.
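To make the object of study concrete, the following is a minimal sketch of PSRL with policy iteration for a tabular MDP. It assumes a Dirichlet posterior over transition probabilities and, for simplicity, known rewards; all function names and the prior choice are illustrative, not taken from the paper.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9, max_iters=100):
    """Exact policy iteration on a tabular MDP.
    P: (S, A, S) transition probabilities; R: (S, A) rewards."""
    S, A = R.shape
    pi = np.zeros(S, dtype=int)
    for _ in range(max_iters):
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = P[np.arange(S), pi]            # (S, S) transitions under pi
        R_pi = R[np.arange(S), pi]            # (S,) rewards under pi
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily w.r.t. the one-step lookahead.
        Q = R + gamma * (P @ V)               # (S, A)
        new_pi = Q.argmax(axis=1)
        if np.array_equal(new_pi, pi):
            break                             # policy is stable: optimal
        pi = new_pi
    return pi, V

def psrl_pi(true_P, true_R, episodes=50, horizon=20, gamma=0.9, seed=0):
    """PSRL-PI sketch: each episode, sample an MDP from the posterior,
    solve it with policy iteration, act greedily, update the posterior."""
    rng = np.random.default_rng(seed)
    S, A = true_R.shape
    counts = np.ones((S, A, S))               # Dirichlet(1) prior pseudo-counts
    total_reward = 0.0
    for _ in range(episodes):
        # Posterior sampling: one Dirichlet draw per (state, action) pair.
        sampled_P = np.empty((S, A, S))
        for s in range(S):
            for a in range(A):
                sampled_P[s, a] = rng.dirichlet(counts[s, a])
        pi, _ = policy_iteration(sampled_P, true_R, gamma)
        # Execute the sampled-MDP-optimal policy in the true environment.
        s = 0
        for _ in range(horizon):
            a = pi[s]
            s_next = rng.choice(S, p=true_P[s, a])
            total_reward += true_R[s, a]
            counts[s, a, s_next] += 1         # conjugate posterior update
            s = s_next
    return total_reward, counts
```

The open question concerns exactly this loop: existing PSRL regret bounds are typically stated for value-iteration or episodic planning oracles, and it is the substitution of policy iteration as the per-episode solver whose effect on the regret profile remains to be characterized.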

References

"We choose the policy iteration variant of the Posterior Sampling for Reinforcement Learning (PSRL) due to its competitive empirical performance, although its regret profile is not yet fully characterized."

Distributional Active Inference (2601.20985 - Akgül et al., 28 Jan 2026) in Section 6: Experiments (footnote)