- The paper introduces a redundancy-reduced world model that eliminates decoders and data augmentation by using a Barlow Twins-inspired self-supervised loss.
- It achieves robust performance and computational speedups across benchmarks like DMC and Meta-World, excelling at tasks with subtle visual cues.
- Detailed ablations demonstrate that internal redundancy reduction prevents representation collapse and keeps representations focused on precisely the task-relevant features.
R2-Dreamer: Redundancy-Reduced World Models without Decoders or Augmentation
Motivation and Problem Statement
R2-Dreamer addresses the critical inefficiency in representation learning for Model-Based Reinforcement Learning (MBRL) from high-dimensional visual observations. Existing world models predominantly utilize pixel-level reconstruction objectives (as in DreamerV3) that force latent representations to capture all visual information, including large, task-irrelevant regions. This not only misallocates representational capacity but also creates a computational bottleneck due to the need for an expensive decoder.
Decoder-free methods such as DreamerPro and TD-MPC2 circumvent reconstruction but introduce a dependency on Data Augmentation (DA) as an external regularizer to avoid representation collapse. However, DA is a heuristic that can degrade performance when augmentations (e.g., random shifts, color jittering) inadvertently remove or distort task-relevant features, especially when precise observation of small objects is required.
Method: Internal Redundancy-Reduction for Representation Learning
R2-Dreamer introduces a new theoretical and practical paradigm for representation learning in RSSM-based world models. Its key innovation is replacing both the decoder and DA with a strictly internal redundancy-reduction objective, inspired by the Barlow Twins self-supervised loss. This removes the expensive reconstruction step entirely and requires no DA-generated positive pairs; instead, it leverages naturally arising pairs: the latent state (after a linear projection) and the corresponding image encoder embedding.
The world model's core loss comprises prediction and Kullback-Leibler regularization terms (retained from DreamerV3), with the reconstruction term replaced by the Barlow Twins objective:
- Invariance term: Encourages the projected state and image embedding to be maximally correlated along matching dimensions.
- Redundancy-reduction term: Penalizes cross-correlation between different dimensions, enforcing decorrelation (a tractable proxy for statistical independence) and preventing collapse.
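The two terms above can be sketched as a standard Barlow Twins-style loss on the batch cross-correlation matrix. The function name, tensor shapes, and the off-diagonal weight `lam` below are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def barlow_twins_loss(z_state, z_image, lam=5e-3, eps=1e-8):
    """Barlow Twins-style loss between two embedding views.

    z_state: projected latent states, shape (B, D)
    z_image: image encoder embeddings, shape (B, D)
    lam: weight on the redundancy-reduction (off-diagonal) term
    """
    B, _ = z_state.shape
    # Standardize each embedding dimension across the batch.
    z1 = (z_state - z_state.mean(axis=0)) / (z_state.std(axis=0) + eps)
    z2 = (z_image - z_image.mean(axis=0)) / (z_image.std(axis=0) + eps)

    # Cross-correlation matrix between the two views, shape (D, D).
    c = z1.T @ z2 / B

    # Invariance: pull diagonal entries toward 1 (matching dimensions agree).
    on_diag = np.sum((np.diag(c) - 1.0) ** 2)
    # Redundancy reduction: pull off-diagonal entries toward 0 (decorrelation).
    off_diag = np.sum((c - np.diag(np.diag(c))) ** 2)
    return on_diag + lam * off_diag
```

Because both views already exist inside the world model, no augmented copies of the observation are needed; the off-diagonal penalty alone is what prevents collapse to a constant embedding.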
All other architectural components and the actor-critic training regime from DreamerV3 are maintained, enabling controlled, fair comparisons.
Experimental Evaluation
R2-Dreamer is extensively benchmarked across:
- DeepMind Control Suite (DMC): 20 continuous control tasks with diverse visual and dynamic characteristics.
- Meta-World MT1: 50 fine-grained robotic manipulation tasks.
- DMC-Subtle: A curated benchmark targeting representation sensitivity by dramatically shrinking task-relevant objects, explicitly stress-testing the limits of auxiliary and regularization-based approaches.
- On DMC and Meta-World, R2-Dreamer matches or exceeds DreamerV3 and leading decoder-free, DA-based methods (TD-MPC2, DreamerPro) in both mean and median scores.
- On DMC-Subtle, R2-Dreamer demonstrates substantial performance gains, maintaining precise focus on tiny targets that other methods, especially those relying on DA, fail to isolate. This supports the hypothesis that heuristic augmentation can systematically destroy finely localized, task-critical cues.
Representation Analysis
Saliency-based visualization shows that R2-Dreamer's learned policies attend directly to task-relevant objects, in contrast to baselines, whose focus is diffuse due to background overfitting or DA-induced feature loss.
Ablations
- R2-Dreamer is robust to batch size reduction and does not benefit from DA; in fact, explicit augmentation degrades performance on tasks where spatial details determine success.
- Competing methods without DA experience sharp performance collapse, underscoring R2-Dreamerโs unique internal regularization capability.
Computational Efficiency
Removing the decoder and augmentation pipelines yields systematic speedups (1.59× vs DreamerV3; 2.36× vs DreamerPro), enabling more rapid experimentation and scaling.
Theoretical Connections
The redundancy-reduction loss implements a variational approximation to an extended Sequential Information Bottleneck (SIB) objective. The invariance and redundancy-reduction components correspond to maximizing mutual information between the latent space and observations (fidelity), and minimizing total correlation among latent features (spatial compression), respectively. This formulation implicitly prioritizes informationally efficient and disentangled representations, in line with principled objectives from information theory.
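Concretely, assuming the standard Barlow Twins form (the notation below is illustrative; the paper's symbols may differ), the objective on the cross-correlation matrix between the projected latent state and the image embedding reads:

```latex
% C_{ij}: cross-correlation between dimension i of the projected latent
% state and dimension j of the image embedding, estimated over a batch.
\mathcal{L} \;=\;
  \underbrace{\sum_{i} \bigl(1 - C_{ii}\bigr)^{2}}_{\text{invariance}}
  \;+\; \lambda
  \underbrace{\sum_{i} \sum_{j \neq i} C_{ij}^{2}}_{\text{redundancy reduction}}
```

Under the SIB reading sketched above, the invariance term acts as a surrogate for maximizing the mutual information between latents and observations, while the redundancy-reduction term approximately penalizes the total correlation among latent dimensions.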
Implications and Future Directions
R2-Dreamer demonstrates that internal, information-theoretic regularization can supplant both supervised (reconstruction-based) and augmentation-dependent objectives in visual MBRL. This has broad ramifications:
- Practical: It removes the necessity for domain-specific DA heuristics, streamlines training pipelines, and supports deployment in real-world environments where the preservation of all visual details may be critical and external augmentations are a liability.
- Theoretical: Offers a generalizable, principled mechanism for regularization, directly aligned with compression and disentanglement goals, and is readily compatible with scaling to complex tasks.
- Future work: Extending to robustness assessment under dynamic distractors (e.g., Distracting Control Suite), application to large-scale and high-dimensional visuomotor tasks, and integrating with more expressive sequence or transformer-based dynamics models.
Conclusion
R2-Dreamer establishes that redundancy-reduction, as a self-supervised, DA-free, and decoder-free objective, is sufficient for robust, scalable, and information-efficient representation learning in world models. This approach advances MBRL towards architectures that are less reliant on heuristic regularization and more aligned with the intrinsic structure of visuomotor decision making, offering immediate practical benefits and opening new theoretical lines in the efficient use of visual information for control (2603.18202).