- The paper introduces a redundancy-reduced world model that eliminates decoders and data augmentation by using a Barlow Twins-inspired self-supervised loss.
- It achieves robust performance and computational speedups across benchmarks like DMC and Meta-World, excelling at tasks with subtle visual cues.
- Detailed ablations demonstrate that internal redundancy reduction prevents representation collapse and keeps representations focused on precisely the task-relevant features.
R2-Dreamer: Redundancy-Reduced World Models without Decoders or Augmentation
Motivation and Problem Statement
R2-Dreamer addresses the critical inefficiency in representation learning for Model-Based Reinforcement Learning (MBRL) from high-dimensional visual observations. Existing world models predominantly utilize pixel-level reconstruction objectives (as in DreamerV3) that force latent representations to capture all visual information, including large, task-irrelevant regions. This not only misallocates representational capacity but also creates a computational bottleneck due to the need for an expensive decoder.
Decoder-free methods such as DreamerPro and TD-MPC2 circumvent reconstruction but introduce a dependency on Data Augmentation (DA) as an external regularizer to avoid representation collapse. However, DA is a heuristic that can degrade performance when augmentations (e.g., random shifts, color jittering) inadvertently remove or distort task-relevant features, especially when precise observation of small objects is required.
Method: Internal Redundancy-Reduction for Representation Learning
R2-Dreamer introduces a new theoretical and practical paradigm for representation learning in RSSM-based world models. Its key innovation is replacing both the decoder and DA with a strictly internal redundancy-reduction objective, inspired by the Barlow Twins self-supervised loss. This removes the expensive reconstruction step entirely and requires no DA-generated positive pairs; instead, it leverages naturally arising pairs: the latent state (after a linear projection) and the corresponding image encoder embedding.
The world model's core loss comprises prediction and Kullback-Leibler regularization terms (retained from DreamerV3), with the reconstruction term replaced by the Barlow Twins objective:
- Invariance term: Encourages the projected state and image embedding to be maximally correlated along matching dimensions.
- Redundancy-reduction term: Penalizes cross-correlation between different dimensions, enforcing decorrelation (a tractable proxy for statistical independence) and preventing collapse.
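The two terms above can be sketched as a standard Barlow Twins-style loss on the batch cross-correlation matrix. The function name, tensor shapes, and the off-diagonal weight `lam` below are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def barlow_twins_loss(z_state, z_image, lam=5e-3, eps=1e-8):
    """Barlow Twins-style loss between two embedding views.

    z_state: projected latent states, shape (B, D)
    z_image: image encoder embeddings, shape (B, D)
    lam: weight on the redundancy-reduction (off-diagonal) term
    """
    B, _ = z_state.shape
    # Standardize each embedding dimension across the batch.
    z1 = (z_state - z_state.mean(axis=0)) / (z_state.std(axis=0) + eps)
    z2 = (z_image - z_image.mean(axis=0)) / (z_image.std(axis=0) + eps)

    # Cross-correlation matrix between the two views, shape (D, D).
    c = z1.T @ z2 / B

    # Invariance: pull diagonal entries toward 1 (matching dimensions agree).
    on_diag = np.sum((np.diag(c) - 1.0) ** 2)
    # Redundancy reduction: pull off-diagonal entries toward 0 (decorrelation).
    off_diag = np.sum((c - np.diag(np.diag(c))) ** 2)
    return on_diag + lam * off_diag
```

Because both views already exist inside the world model, no augmented copies of the observation are needed; the off-diagonal penalty alone is what prevents collapse to a constant embedding.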
All other architectural components and the actor-critic training regime from DreamerV3 are maintained, enabling controlled, fair comparisons.
Experimental Evaluation
R2-Dreamer is extensively benchmarked across:
- DeepMind Control Suite (DMC): 20 continuous control tasks with diverse visual and dynamic characteristics.
- Meta-World MT1: 50 fine-grained robotic manipulation tasks.
- DMC-Subtle: A curated benchmark targeting representation sensitivity by dramatically shrinking task-relevant objects, explicitly stress-testing the limits of auxiliary and regularization-based approaches.
- On DMC and Meta-World, R2-Dreamer matches or exceeds DreamerV3 and leading decoder-free, DA-based methods (TD-MPC2, DreamerPro) in both mean and median scores.
- On DMC-Subtle, R2-Dreamer demonstrates substantial performance gains, maintaining precise focus on tiny targets that other methods, especially those relying on DA, fail to isolate. This supports the hypothesis that heuristic augmentation can systematically destroy finely localized, task-critical cues.
Representation Analysis
Saliency-based visualization shows that R2-Dreamer's learned policies attend directly to task-relevant objects, in contrast to baselines, whose focus is diffuse due to background overfitting or DA-induced feature loss.
Ablations
- R2-Dreamer is robust to batch size reduction and does not benefit from DA; in fact, explicit augmentation degrades performance on tasks where spatial details determine success.
- Competing methods without DA experience sharp performance collapse, underscoring R2-Dreamerโs unique internal regularization capability.
Computational Efficiency
Removing the decoder and augmentation pipelines yields systematic speedups (1.59× vs DreamerV3; 2.36× vs DreamerPro), enabling more rapid experimentation and scaling.
Theoretical Connections
The redundancy-reduction loss implements a variational approximation to an extended Sequential Information Bottleneck (SIB) objective. The invariance and redundancy-reduction components correspond to maximizing mutual information between the latent space and observations (fidelity), and minimizing total correlation among latent features (spatial compression), respectively. This formulation implicitly prioritizes informationally efficient and disentangled representations, in line with principled objectives from information theory.
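Concretely, assuming the standard Barlow Twins form (the notation below is illustrative; the paper's symbols may differ), the objective on the cross-correlation matrix between the projected latent state and the image embedding reads:

```latex
% C_{ij}: cross-correlation between dimension i of the projected latent
% state and dimension j of the image embedding, estimated over a batch.
\mathcal{L} \;=\;
  \underbrace{\sum_{i} \bigl(1 - C_{ii}\bigr)^{2}}_{\text{invariance}}
  \;+\; \lambda
  \underbrace{\sum_{i} \sum_{j \neq i} C_{ij}^{2}}_{\text{redundancy reduction}}
```

Under the SIB reading sketched above, the invariance term acts as a surrogate for maximizing the mutual information between latents and observations, while the redundancy-reduction term approximately penalizes the total correlation among latent dimensions.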
Implications and Future Directions
R2-Dreamer demonstrates that internal, information-theoretic regularization can supplant both supervised (reconstruction-based) and augmentation-dependent objectives in visual MBRL. This has broad ramifications:
- Practical: It removes the necessity for domain-specific DA heuristics, streamlines training pipelines, and supports deployment in real-world environments where the preservation of all visual details may be critical and external augmentations are a liability.
- Theoretical: Offers a generalizable, principled mechanism for regularization, directly aligned with compression and disentanglement goals, and is readily compatible with scaling to complex tasks.
- Future work: Extending to robustness assessment under dynamic distractors (e.g., Distracting Control Suite), application to large-scale and high-dimensional visuomotor tasks, and integrating with more expressive sequence or transformer-based dynamics models.
Conclusion
R2-Dreamer establishes that redundancy-reduction, as a self-supervised, DA-free, and decoder-free objective, is sufficient for robust, scalable, and information-efficient representation learning in world models. This approach advances MBRL towards architectures that are less reliant on heuristic regularization and more aligned with the intrinsic structure of visuomotor decision making, offering immediate practical benefits and opening new theoretical lines in the efficient use of visual information for control (2603.18202).