Convergence of the FB algorithm to ground-truth representations
Determine whether the forward–backward (FB) representation learning algorithm of Touati and Ollivier (2021), which jointly learns a forward representation F(s,a,z) and a backward representation B(s_f,a_f) to factorize the successor-measure ratio M^π(s_f,a_f | s,a,z)/ρ(s_f,a_f) and then derives a latent-conditioned policy π(a | s,z)=argmax_a F(s,a,z)^T z, converges to any ground-truth forward–backward representations F* and B* that exactly factorize this ratio and enable optimal policy adaptation for arbitrary reward functions.
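For concreteness, the ground-truth condition can be sketched as a coupled system in the notation of the statement above. The symbols $\pi_z$ and $z_r$, and the expectation form of the reward embedding, follow the usual FB presentation and are assumptions here, since this section does not define them.

\[
\frac{M^{\pi_z}(s_f,a_f \mid s,a)}{\rho(s_f,a_f)} = F^*(s,a,z)^\top B^*(s_f,a_f),
\qquad
\pi_z(s) = \arg\max_a F^*(s,a,z)^\top z .
\]

Under this condition, adapting to a reward function $r$ would amount to setting $z_r = \mathbb{E}_{(s,a)\sim\rho}\left[ r(s,a)\, B^*(s,a) \right]$ and acting with $\pi_{z_r}$, whose action-value function is $F^*(s,a,z_r)^\top z_r$. Because $\pi_z$ is itself defined through $F^*$, the condition is self-referential, which is why the convergence question is naturally a fixed-point problem.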
References
Therefore, whether the FB algorithm converges to any ground-truth representations remains an open problem. Answering this question might require tools such as the Lefschetz fixed-point theorem~\citep{lefschetz1926intersections} or Lyapunov stability theory~\citep{lyapunov1992general}, which we leave for future theoretical analysis.