INFOrmation Prioritization through EmPOWERment in Visual Model-Based RL

Published 18 Apr 2022 in cs.RO, cs.AI, and cs.LG | (2204.08585v1)

Abstract: Model-based reinforcement learning (RL) algorithms designed for handling complex visual observations typically learn some sort of latent state representation, either explicitly or implicitly. Standard methods of this sort do not distinguish between functionally relevant aspects of the state and irrelevant distractors, instead aiming to represent all available information equally. We propose a modified objective for model-based RL that, in combination with mutual information maximization, allows us to learn representations and dynamics for visual model-based RL without reconstruction in a way that explicitly prioritizes functionally relevant factors. The key principle behind our design is to integrate a term inspired by variational empowerment into a state-space model based on mutual information. This term prioritizes information that is correlated with action, thus ensuring that functionally relevant factors are captured first. Furthermore, the same empowerment term also promotes faster exploration during the RL process, especially for sparse-reward tasks where the reward signal is insufficient to drive exploration in the early stages of learning. We evaluate the approach on a suite of vision-based robot control tasks with natural video backgrounds, and show that the proposed prioritized information objective outperforms state-of-the-art model based RL approaches with higher sample efficiency and episodic returns. https://sites.google.com/view/information-empowerment

Summary

The paper presents InfoPower, a method that integrates variational empowerment with mutual information maximization to enhance exploration in visual model-based RL.
The approach leverages contrastive learning and a primal-dual optimization framework to robustly capture latent state representations and prioritize controllable factors.
Experimental evaluations demonstrate that InfoPower outperforms state-of-the-art baselines in distractor-heavy settings by effectively filtering irrelevant information.

Information Prioritization Through Empowerment in Visual Model-Based RL

The paper presents InfoPower, an approach to model-based reinforcement learning (MBRL) that integrates variational empowerment with mutual information maximization to improve visual MBRL performance. The method explicitly prioritizes functionally relevant information by adding an empowerment term to a non-reconstructive state-space model based on mutual information. This objective enables efficient learning in environments with visual distractors and also improves exploration, especially under sparse rewards.

Model Architecture and Objectives

InfoPower distinguishes itself by incorporating empowerment into both representation and policy learning objectives. A contrastive learning approach captures latent state representations without reconstructing high-dimensional observations, ensuring robustness to irrelevant distractors. Empowerment prioritizes controllable factors, and the representation learning objective enforces this prioritization to learn latent state-space models effectively.

Figure 1: Overview of InfoPower. $I(O_t;Z_t)$ is the contrastive learning objective for learning an encoder to map from image $O$ to latent $Z$.

Learning Controllable Factors: The empowerment term $I(A_{t-1};Z_t|Z_{t-1})$ measures how much the previous action $A_{t-1}$ tells us about the next latent state $Z_t$, given the previous latent state $Z_{t-1}$. Maximizing it encourages the representation $Z$ to capture first the parts of the state that the agent's actions influence, and it steers the agent towards controllable state configurations, which promotes exploration in sparse-reward environments.
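
For reference, a standard variational lower bound on this conditional mutual information can be sketched as follows. The inverse model $q_\psi$ and the policy $\pi$ are notation assumed here for illustration, and the paper's exact estimator may differ.

$$
I(A_{t-1};Z_t \mid Z_{t-1}) = H(A_{t-1} \mid Z_{t-1}) - H(A_{t-1} \mid Z_t, Z_{t-1}) \;\ge\; \mathbb{E}\left[\log q_\psi(a_{t-1} \mid z_t, z_{t-1}) - \log \pi(a_{t-1} \mid z_{t-1})\right]
$$

The inequality holds because the cross-entropy under any $q_\psi$ upper-bounds the true conditional entropy $H(A_{t-1} \mid Z_t, Z_{t-1})$. The bound is tight when $q_\psi$ matches the true posterior over actions.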

Implementation Strategy

The implementation of InfoPower optimizes several objectives through lower bounds on mutual information (MI). The core MI terms for contrastive learning are estimated with either an InfoNCE or an NWJ lower bound; NWJ is favored because it performs slightly better in practice.
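
As a rough illustration of the two estimators named above, the sketch below computes the InfoNCE and NWJ lower bounds from a square matrix of critic scores, where entry `[i][j]` scores pairing observation `i` with latent `j` and the diagonal holds the true pairs. The function names and the toy score matrices are assumptions made for this example; they are not taken from the paper's code.

```python
import math


def _logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def infonce_lower_bound(scores):
    """InfoNCE bound: mean_i[f(i,i) - logsumexp_j f(i,j)] + log K."""
    k = len(scores)
    per_row = [scores[i][i] - _logsumexp(scores[i]) for i in range(k)]
    return sum(per_row) / k + math.log(k)


def nwj_lower_bound(scores):
    """NWJ bound: E_joint[f] - exp(-1) * E_marginals[exp(f)].

    The joint expectation uses the diagonal (matched pairs); the product
    of marginals is approximated by the off-diagonal (mismatched) pairs,
    so the batch must contain at least two items.
    """
    k = len(scores)
    joint = sum(scores[i][i] for i in range(k)) / k
    off_diag = [scores[i][j] for i in range(k) for j in range(k) if i != j]
    marginal = sum(math.exp(s) for s in off_diag) / len(off_diag)
    return joint - math.exp(-1.0) * marginal


# Toy critics (hypothetical values): one that cannot tell pairs apart,
# and one that scores matched pairs higher than mismatched ones.
uninformative = [[0.0] * 4 for _ in range(4)]
informative = [[5.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

print(infonce_lower_bound(uninformative), infonce_lower_bound(informative))
print(nwj_lower_bound(uninformative), nwj_lower_bound(informative))
```

One visible difference between the two: the InfoNCE estimate can never exceed `log K` for a batch of size `K`, while the NWJ estimate has no such cap but tends to have higher variance.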

Primal-Dual Optimization

The constrained optimization captures a hierarchical learning structure where the MI between observations and latents is maximized, subject to constraints emphasizing controllability. The Lagrangian method optimizes primal and dual variables for efficient convergence:

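# Pseudocode: alternate primal and dual updates of the Lagrangian
initialize_parameters()
while not_converged:
    update_primal_parameters()   # model parameters (primal variables)
    update_dual_variables()      # Lagrange multipliers (dual variables)
    perform_policy_update()      # improve the policy
    interact_with_environment()  # collect new experience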
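
To make the primal-dual pattern concrete, here is a minimal sketch on a toy problem that is unrelated to InfoPower's actual objectives: maximize `-x**2` subject to `x >= 1`. The Lagrangian is `L(x, lam) = -x**2 + lam * (x - 1)` with `lam >= 0`; the primal variable takes gradient ascent steps and the multiplier takes projected gradient descent steps. The function name, step sizes, and the toy problem itself are assumptions chosen for illustration.

```python
def primal_dual_ascent(steps=2000, lr_primal=0.05, lr_dual=0.05):
    """Alternate primal ascent and projected dual descent on a toy Lagrangian."""
    x, lam = 0.0, 0.0
    for _ in range(steps):
        # Primal step: dL/dx = -2x + lam, ascend to maximize the Lagrangian.
        x += lr_primal * (-2.0 * x + lam)
        # Dual step: dL/dlam = x - 1, descend and project onto lam >= 0.
        # The multiplier grows while the constraint x >= 1 is violated.
        lam = max(0.0, lam - lr_dual * (x - 1.0))
    return x, lam


x_star, lam_star = primal_dual_ascent()
print(x_star, lam_star)
```

The unconstrained maximizer is `x = 0`, which violates the constraint, so the multiplier rises until the iterate settles at the constrained optimum `x = 1` with `lam = 2`. In a full agent the same alternation would be interleaved with policy updates and environment interaction, as in the loop above.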

Experimental Evaluation

Performance on Distractor-Heavy Environments

InfoPower is evaluated on DeepMind Control tasks with natural video distractors in the background, where it outperforms state-of-the-art baselines such as Dreamer and TIA (Figure 2).

Figure 2: Evaluation of InfoPower and baselines in a suite of DeepMind Control tasks with natural video distractors in the background.

Behavioral Similarity and Exploration

The evaluation also measures the behavioral similarity between the learned latents and the true simulator states, using a metric proposed in the paper together with t-SNE visualizations. InfoPower's latent representations are highly correlated with the true states, which suggests that they retain crucial task information while discarding irrelevant distractors.

Ablation Studies

Ablation analysis highlights the role of the empowerment objective in both representation learning and policy learning. Variants that exclude empowerment perform significantly worse, particularly in the early phases of training, where exploration is critical (Figure 3).

Figure 3: Evaluation of InfoPower and ablated variants in a suite of DeepMind Control tasks with natural video distractors in the background.

Conclusion

InfoPower is a promising approach to visual model-based reinforcement learning that prioritizes functionally relevant information. Its empowerment-driven framework supports efficient exploration in complex visual environments and outperforms existing model-based RL methods under challenging distractor settings. The results point to potential applications in broader RL scenarios, where identifying and prioritizing relevant information can significantly affect the quality of learned policies.