Convergence of the FB algorithm to ground-truth representations

Determine whether the forward–backward (FB) representation learning algorithm of Touati and Ollivier (2021) converges to ground-truth representations. The algorithm jointly learns a forward representation F(s,a,z) and a backward representation B(s_f,a_f) that factorize the successor-measure ratio M^{π_z}(s_f,a_f | s,a)/ρ(s_f,a_f), and it derives a latent-conditioned policy π_z(s) = argmax_a F(s,a,z)^T z. The question is whether this procedure converges to any ground-truth pair (F*, B*) that exactly factorizes the ratio and thereby enables optimal policy adaptation for arbitrary reward functions.
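
For reference, the core FB identities from Touati and Ollivier (2021) can be sketched as follows (notation as above; γ is the discount factor and ρ the data distribution over state-action pairs):

\[ \frac{M^{\pi_z}(s,a,\mathrm{d}s_f,\mathrm{d}a_f)}{\rho(\mathrm{d}s_f,\mathrm{d}a_f)} \approx F(s,a,z)^\top B(s_f,a_f), \qquad \pi_z(s) = \operatorname*{arg\,max}_a F(s,a,z)^\top z. \]

For a new reward function r, zero-shot adaptation sets

\[ z_R = \mathbb{E}_{(s,a)\sim\rho}\big[ r(s,a)\, B(s,a) \big], \qquad Q^{\pi_{z_R}}(s,a) \approx F(s,a,z_R)^\top z_R, \]

and π_{z_R} is optimal for r exactly when the factorization is exact, which is what makes a ground-truth pair (F*, B*) the natural convergence target.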

Background

The paper introduces an FB Bellman operator associated with the forward–backward representation learning procedure and shows that it is not a γ-contraction, so the standard Banach fixed-point argument does not apply. The non-contractive behavior stems from a circular dependency: the policy π_z is greedy with respect to F, while F is trained against the successor measure of π_z.
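
To make the circular dependency concrete, recall the Bellman-style recursion satisfied by the successor measure (standard in the successor-measure literature; the exact FB Bellman operator in the paper may differ in details):

\[ M^{\pi_z}(s,a,X) = P(X \mid s,a) + \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s,a)} \big[ M^{\pi_z}(s', \pi_z(s'), X) \big], \qquad \pi_z(s') = \operatorname*{arg\,max}_{a'} F(s',a',z)^\top z. \]

Substituting the factorized model M^{\pi_z} \approx (F^\top B)\,\rho yields an operator on (F,B) whose right-hand side depends on the greedy policy induced by F itself: updating the representations moves the policy, which in turn changes the successor measure being represented.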

The authors note that the FB Bellman operator admits multiple fixed points, and standard analysis therefore does not establish whether the practical FB algorithm converges to ground-truth representations that both factorize the successor-measure ratio and yield optimal control for any reward.
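
One elementary illustration of this non-uniqueness (a standard observation about low-rank factorizations, offered as a sketch rather than a characterization of the operator's fixed points): the product F^T B is invariant under F -> C F, B -> C^{-T} B for any invertible matrix C, so infinitely many representation pairs model exactly the same successor-measure ratio. A minimal numerical check, with F and B as random stand-ins for learned features:

import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 100  # latent dimension, number of sampled state-action pairs

# Stand-ins for learned features at sampled points (illustrative only):
# rows of F play the role of F(s, a, z); rows of B play the role of B(s_f, a_f).
F = rng.normal(size=(n, d))
B = rng.normal(size=(n, d))

# For any invertible C, (C F)^T (C^{-T} B) = F^T B: the modeled ratio is
# unchanged even though the individual representations are not.
C = rng.normal(size=(d, d))      # a generic d x d Gaussian matrix is invertible
F_new = F @ C.T                  # rows become C F(s, a, z)
B_new = B @ np.linalg.inv(C)     # rows become C^{-T} B(s_f, a_f)

assert np.allclose(F @ B.T, F_new @ B_new.T)
print("max deviation:", np.abs(F @ B.T - F_new @ B_new.T).max())

Ground-truth representations are therefore identifiable at best up to such invertible reparameterizations, consistent with the operator admitting multiple fixed points.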

They suggest that alternative analytical tools (e.g., Lefschetz fixed-point theory or Lyapunov stability theory) might be needed to resolve the convergence question; this difficulty motivates their simpler one-step FB method, which converges better empirically.

References

Therefore, whether the FB algorithm converges to any ground-truth representations remains an open problem. Answering this question might require tools such as the Lefschetz fixed-point theorem (Lefschetz, 1926) or Lyapunov stability theory (Lyapunov, 1992), which we leave for future theoretical analysis.

Can We Really Learn One Representation to Optimize All Rewards? (2602.11399 - Zheng et al., 11 Feb 2026), Section 3.3 (Does the Practical FB Algorithm Converge to Ground-Truth Representations?)