Develop practical learning dynamics for the forward–backward TD-JEPA variant under relaxed assumptions

Develop and analyze practical, off-policy training dynamics for the forward–backward-in-time latent-predictive variant of TD-JEPA that relies on adjoint transition kernels, so that its theoretically sound objective can be optimized under relaxed assumptions despite the need for backward-in-time sampling.

Background

To relax its symmetry and covariance assumptions, the paper derives a theoretically sound TD-JEPA formulation that requires both forward- and backward-in-time sampling via adjoint kernels. While this variant satisfies the gradient-matching conditions that ensure optimization of a density-based objective, it is not straightforward to optimize off-policy, because no action-conditioned backward Bellman equation is available.
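For concreteness, the standard Markov-chain time-reversal construction, which we assume is what the adjoint kernel refers to here, defines the backward kernel from the forward kernel and the stationary state distribution $\rho$ of the data-generating process:

$$
P^{\dagger}(s \mid s') \;=\; \frac{\rho(s)\,P(s' \mid s)}{\rho(s')}.
$$

Sampling $s \sim P^{\dagger}(\cdot \mid s')$ draws a predecessor state backward in time. In an MDP, the analogous reversal over the stationary state-action distribution, $P^{\dagger}(s, a \mid s') \propto \rho(s, a)\,P(s' \mid s, a)$, yields predecessor pairs whose actions are distributed according to the behavior policy; there is no clean way to condition this reversal on the action of an arbitrary target policy, which is the source of the off-policy difficulty noted above.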

The authors explicitly defer the development of practical learning procedures for this variant, identifying the need for implementable algorithms that preserve the theoretical guarantees without requiring impractical sampling operations.
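To make the missing piece concrete, the sketch below shows what a naive forward-backward latent TD update could look like. It is a minimal illustration under stated assumptions, not the paper's algorithm: the module names, dimensions, and loss composition are all hypothetical. The backward term is only an unbiased sample of the adjoint-kernel objective when the transitions come from the behavior policy, which is exactly the limitation a practical off-policy procedure would need to overcome.

```python
# Hypothetical forward-backward latent TD sketch (not the paper's method).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

S_DIM, A_DIM, Z_DIM = 8, 2, 16          # illustrative dimensions

phi = nn.Linear(S_DIM, Z_DIM)           # state encoder (hypothetical)
fwd = nn.Linear(Z_DIM + A_DIM, Z_DIM)   # forward-in-time latent predictor
bwd = nn.Linear(Z_DIM, Z_DIM)           # backward-in-time latent predictor
tgt = copy.deepcopy(phi)                # frozen target encoder for TD-style targets
for p in tgt.parameters():
    p.requires_grad_(False)

opt = torch.optim.Adam(
    list(phi.parameters()) + list(fwd.parameters()) + list(bwd.parameters()),
    lr=3e-4,
)

def update(s, a, s_next):
    z, z_next = phi(s), phi(s_next)
    with torch.no_grad():
        fwd_target = tgt(s_next)  # a full TD variant would bootstrap by
                                  # chaining a frozen copy of fwd here
        bwd_target = tgt(s)
    # Forward term: predict the successor latent from (latent, action).
    loss_fwd = F.mse_loss(fwd(torch.cat([z, a], dim=-1)), fwd_target)
    # Backward term: predict the predecessor latent from the successor latent.
    # This only samples the adjoint-kernel objective correctly if (s, s_next)
    # was generated by the behavior policy; no action-conditioned backward
    # Bellman equation exists to correct it off-policy.
    loss_bwd = F.mse_loss(bwd(z_next), bwd_target)
    loss = loss_fwd + loss_bwd
    opt.zero_grad()
    loss.backward()
    opt.step()
    return float(loss)

# Synthetic stand-in for a replay-buffer batch of (s, a, s') transitions.
s = torch.randn(32, S_DIM)
a = torch.randn(32, A_DIM)
s_next = torch.randn(32, S_DIM)
print(update(s, a, s_next))
```

The frozen target encoder is one common device against representation collapse in latent-predictive objectives; whether it, or any reweighting of the backward term, preserves the variant's gradient-matching guarantees is precisely the open question.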

References

"We leave the study of practical learning dynamics for this theoretically sound variant of TD-JEPA for future work."

TD-JEPA: Latent-predictive Representations for Zero-Shot Reinforcement Learning (Bagatella et al., arXiv:2510.00739, 1 Oct 2025), Appendix: TD-JEPA with forward-backward-in-time sampling.