
Latent Go-Explore (LGE) in Reinforcement Learning

Updated 15 February 2026
  • Latent Go-Explore (LGE) is a reinforcement learning approach that employs learned latent representations to overcome the limitations of hand-designed state discretizations.
  • It integrates density estimation, geometric goal sampling, and subgoal trajectory trimming to efficiently explore high-dimensional and sparse-reward environments.
  • Variants like Cell-Free LGE, LEAF, and Time-Myopic Go-Explore demonstrate improved sample efficiency, temporal separation, and scalability across complex domains.

Latent Go-Explore (LGE) refers to a class of reinforcement learning (RL) approaches that instantiate the Go-Explore paradigm within a learned latent space, addressing the cell partitioning bottleneck and enabling robust exploration in environments with sparse or deceptive rewards. LGE methods dispense with hand-defined state aggregation, instead leveraging learned state representations to characterize, return to, and expand the frontier of explored behavior, thereby generalizing Go-Explore to complex, high-dimensional domains.

1. Foundations and Motivation

The original Go-Explore algorithm [Ecoffet et al., 2019/2021] achieved state-of-the-art exploration—most notably on Montezuma’s Revenge—by (a) repeatedly returning to promising "cells" (discretizations of state space) before (b) robustly exploring outward from those frontier cells. However, hand-crafted cells are fundamentally limited: they depend on domain knowledge, risk conflating distinct states if too coarse, and can cause exploration failure if key factors of variation are omitted. These shortcomings motivate Latent Go-Explore, which eliminates explicit cell partitions in favor of learned latent representations, thereby making Go-Explore generalizable and robust across domains, including those with image observations and complex, high-dimensional state spaces (Gallouédec et al., 2022).

2. Latent State Representations and Encoders

LGE replaces explicit cell construction with embeddings learned from raw observations $\mathcal{S}$ into a latent space $\mathcal{Z} \subset \mathbb{R}^d$, where exploration and trajectory management are conducted. Several encoder architectures are deployed depending on the task and desired inductive bias:

  • Inverse-Dynamics Encoders: Train $\phi_\theta$ jointly with an inverse model $\mathcal{P}^{\text{inv}}_\theta$ that predicts the action $a_t$ from $\phi_\theta(s_t)$ and $\phi_\theta(s_{t+1})$, with loss

$$L_{\text{inv}}(\theta) = \mathbb{E}_{(s_t, a_t, s_{t+1}) \sim D}\,\frac{1}{2}\left\| a_t - \mathcal{P}^{\text{inv}}_\theta\big(\phi_\theta(s_t), \phi_\theta(s_{t+1})\big) \right\|_2^2$$

  • Forward-Dynamics Encoders: Pair $\phi_\theta$ with a dynamics predictor $\mathcal{P}^{\text{fwd}}_\theta$, optimized via

$$L_{\text{fwd}}(\theta) = -\,\mathbb{E}_{(s_t, a_t, s_{t+1})}\left[\log \mathcal{P}^{\text{fwd}}_\theta\big(\phi_\theta(s_{t+1}) \mid \phi_\theta(s_t), a_t\big)\right]$$

  • VQ-VAE Encoders: Employ a vector-quantized variational autoencoder, with discrete code indices defining $\phi(s)$.

This flexibility allows the latent space to continuously adapt its topology to the controllable and novel aspects of the environment, a critical property for maintaining a meaningful exploration frontier (Gallouédec et al., 2022).
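As a concrete illustration of the inverse-dynamics loss above, the following sketch uses linear stand-ins for the encoder and inverse model (in practice both are neural networks, and all dimensions and weights here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 6-D observations, 2-D latent codes, 2-D actions.
obs_dim, latent_dim, act_dim = 6, 2, 2

# Linear stand-ins for the encoder phi_theta and inverse model P^inv_theta.
W_enc = 0.1 * rng.normal(size=(latent_dim, obs_dim))
W_inv = 0.1 * rng.normal(size=(act_dim, 2 * latent_dim))

def phi(s):
    """Encoder phi_theta: observation(s) -> latent code(s)."""
    return s @ W_enc.T

def inverse_loss(s_t, a_t, s_t1):
    """L_inv = E[ 1/2 * || a_t - P^inv(phi(s_t), phi(s_{t+1})) ||_2^2 ]."""
    z_pair = np.concatenate([phi(s_t), phi(s_t1)], axis=-1)
    a_hat = z_pair @ W_inv.T
    return 0.5 * float(np.mean(np.sum((a_t - a_hat) ** 2, axis=-1)))

# A fake minibatch of transitions (s_t, a_t, s_{t+1}).
s_t = rng.normal(size=(32, obs_dim))
a_t = rng.normal(size=(32, act_dim))
s_t1 = rng.normal(size=(32, obs_dim))
print(inverse_loss(s_t, a_t, s_t1))  # non-negative scalar
```

Because the loss depends on states only through $\phi_\theta$, minimizing it shapes the latent space around action-relevant (controllable) features of the observation.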

3. Exploration Workflow and Goal Selection

The central LGE workflow conducts exploration as follows:

  1. Density Estimation in Latent Space: Maintain a buffer $D$ of all visited states. Compute a $k$-NN estimate of latent density

$$\hat f(z_i) = \frac{k}{n C_d} \left[ D_{(k)}(z_i) \right]^{-d}$$

where $D_{(k)}(z_i)$ is the $k$-th nearest neighbor distance and $C_d$ is the volume of the unit $d$-ball.

  2. Geometric Goal Sampling: For each stored $z_i$, compute its rarity rank $R_i$. Sample a final goal $G = s_i$ with probability

$$\Pr(G = s_i) = (1-p)^{R_i - 1}\, p$$

where $p$ is the geometric parameter favoring rare/novel states.

  3. Subgoal-Trajectory Trimming: For long trajectories, extract subgoals $(g_0, \ldots, g_L)$ based on latent $\ell_2$ distance exceeding a threshold $d$ to ensure feasibility.
  4. Goal-Conditioned Rollouts and Exploration: Sequentially reach each subgoal, using a sparse goal-conditioned reward:

$$r_t = \begin{cases} 0 & \text{if } \|\phi(s_t) - \phi(g_i)\| < d \\ -1 & \text{otherwise} \end{cases}$$

After reaching the final subgoal, random or heuristic exploratory actions are executed to further expand the frontier.

This procedure enables LGE to focus exploration at the boundary of current competence, extending the search efficiently and avoiding the inefficiencies of uniform or random goal selection (Gallouédec et al., 2022).
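A minimal numerical sketch of steps 1, 2, and 4 above, assuming a toy latent buffer with a Euclidean latent metric (the dimensions, thresholds, and the value of $p$ are all illustrative):

```python
import math
import numpy as np

rng = np.random.default_rng(1)

# Toy latent buffer: n visited states embedded in a d-dimensional latent space.
n, d, k = 200, 2, 10
Z = rng.normal(size=(n, d))

# Step 1: k-NN density estimate f_hat(z_i) = k / (n * C_d) * [D_(k)(z_i)]^(-d),
# where C_d is the volume of the unit d-ball and D_(k) the k-th NN distance.
dists = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
kth_nn = np.sort(dists, axis=1)[:, k]  # column 0 is the point itself
C_d = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
f_hat = k / (n * C_d) * kth_nn ** (-d)

# Step 2: geometric goal sampling over rarity ranks (order[0] = rarest state),
# with P(rank R) proportional to (1 - p)^(R - 1) * p.
p = 0.03
order = np.argsort(f_hat)
weights = (1 - p) ** np.arange(n) * p
weights /= weights.sum()
goal_idx = int(order[rng.choice(n, p=weights)])

# Step 4: sparse goal-conditioned reward, 0 within latent distance d_thr of
# the current (sub)goal and -1 otherwise.
def reward(z_s, z_g, d_thr=0.5):
    return 0.0 if np.linalg.norm(z_s - z_g) < d_thr else -1.0

print(f_hat[goal_idx], reward(Z[goal_idx], Z[goal_idx]))
```

The pairwise distance matrix makes this sketch $O(n^2)$; as noted later in the text, approximate nearest-neighbor search is the practical remedy as the buffer grows.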

4. Variations and Extensions

Multiple operationalizations of Latent Go-Explore have been developed:

  • Cell-Free Latent Go-Explore (Gallouédec et al., 2022): Focuses on density-based sampling in latent space without reliance on cell abstraction.
  • LEAF (Latent Exploration Along the Frontier) (Bharadhwaj et al., 2020): Augments LGE with a learned, dynamics-aware reachability manifold and a binary reachability classifier $R_\omega(z_i, z_j, \delta)$, supporting precise frontier detection and a two-phase commit–explore cycle.
  • Time-Myopic Go-Explore (Höftmann et al., 2023): Utilizes a Siamese encoder $\Phi_\theta$ and a time-prediction head $\Psi_\theta$ to define novelty via predicted temporal distance. New-candidate archiving is governed by a threshold $T_d$ on $\min_{c \in A} \Psi_\theta(z_c, z_K)$, ensuring that discovered states are temporally distinct in the learned latent metric.
| Variant | Key Mechanism | Notable Features |
|---|---|---|
| Cell-Free LGE (Gallouédec et al., 2022) | k-NN density, geometric goal sampling, subgoal trimming | No hand-crafted cells, adaptable encoders |
| LEAF (Bharadhwaj et al., 2020) | Latent reachability, curriculum sampling, 2-phase planning | Dynamics-aware manifold, deterministic frontier commitment |
| Time-Myopic Go-Explore (Höftmann et al., 2023) | Temporal distance metric via learned time-predictor | Novelty via time, resolves detachment/conflict |

These variants preserve the Go-Explore intuition while eliminating its cell-design bottleneck and leveraging powerful latent abstractions.
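The Time-Myopic archiving rule can be sketched as follows. Latent Euclidean distance stands in here for the learned time predictor $\Psi_\theta$, and the sketch assumes a candidate is archived when its minimum predicted temporal distance to every archived state exceeds $T_d$:

```python
import numpy as np

def time_myopic_insert(archive, z_new, psi, T_d):
    """Insertion-only archiving: keep z_new only if its minimum predicted
    temporal distance to all archived states exceeds the threshold T_d."""
    if not archive or min(psi(z_c, z_new) for z_c in archive) > T_d:
        archive.append(z_new)
        return True
    return False

# Stand-in for the learned time predictor Psi_theta: latent Euclidean
# distance (the real predictor estimates elapsed timesteps between states).
psi = lambda a, b: float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

archive = []
for z in [[0.0, 0.0], [0.1, 0.0], [3.0, 0.0], [3.1, 0.1]]:
    time_myopic_insert(archive, z, psi, T_d=1.0)
print(len(archive))  # 2: near-duplicate candidates are rejected
```

Because entries are only ever inserted, promising branches are never overwritten, which is how this scheme addresses the detachment failure mode discussed below.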

5. Theoretical Analysis and Empirical Results

No formal sample complexity theorems or exhaustive coverage guarantees are provided, but empirical evidence demonstrates robust and scalable exploration. Key experimental findings include:

  • Coverage: LGE achieves near-complete exploration in challenging 2D mazes, matching or exceeding Go-Explore's cell-based implementation, and far surpassing random, intrinsic curiosity (ICM), and goal-based baselines on both robotic and Atari domains (Gallouédec et al., 2022).
  • Sample Efficiency: LEAF achieves 90% success on visual block-pushing in 1.2M steps (cf. 2.5–3.1M for top baselines), 85% success in door-opening by a 7-DoF arm in 1.8M steps, and full Ant-Maze coverage in 500k steps, outperforming established methods (Bharadhwaj et al., 2020).
  • Temporal Separation and Archive Management: Time-Myopic Go-Explore produces archive structures where states are uniformly temporally separated, preventing collision between semantically distinct states and resolving detachment (the loss of promising branches) via insertion-only archiving (Höftmann et al., 2023).
  • Ablations: Removing frontier mechanisms, reachability models, or non-uniform goal sampling degrades exploration speed and coverage, confirming the necessity of each element (Gallouédec et al., 2022, Bharadhwaj et al., 2020).

6. Implementation Guidelines and Challenges

Implementation of LGE frameworks includes:

  • Encoder Training: Periodic (e.g., every 5k or 500k steps) minibatch updates using chosen representation losses; e.g., inverse/forward dynamics, VQ-VAE, or time-prediction MSE.
  • Exploration Inertia: For high-entropy exploration, a high probability (e.g., 90%) of repeating previous actions during the random exploration phase.
  • Off-Policy Backup: Policies are typically updated using SAC or QR-DQN, often with Hindsight Experience Replay for efficient goal relabeling (Gallouédec et al., 2022).
  • Hyperparameters: Latent dimension (8–16), geometric-sampling $p$ (0.01–0.05), and trimming threshold $d$ are tuned per domain.

Noted implementation challenges include the linear scaling of novelty-query runtime with archive size and the need for robust, continuously updated encoders to avoid representation collapse. Potential remedies include approximate nearest neighbor search for archive lookup and joint training of policies and encoders (Höftmann et al., 2023).
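The exploration-inertia heuristic above can be sketched as a sticky-action sampler; the 90% repeat probability follows the text, while the 4-action discrete space is illustrative:

```python
import numpy as np

def inertial_actions(n_steps, n_actions, repeat_prob=0.9, seed=0):
    """Random exploration phase with action inertia: with probability
    `repeat_prob` repeat the previous action, otherwise sample a fresh one."""
    rng = np.random.default_rng(seed)
    actions = [int(rng.integers(n_actions))]
    for _ in range(n_steps - 1):
        if rng.random() < repeat_prob:
            actions.append(actions[-1])
        else:
            actions.append(int(rng.integers(n_actions)))
    return actions

acts = inertial_actions(1000, 4)
repeats = sum(a == b for a, b in zip(acts, acts[1:]))
print(repeats / (len(acts) - 1))  # roughly 0.9 (slightly higher: fresh draws can coincide)
```

Correlating consecutive actions in this way produces longer, more directed random walks than i.i.d. action sampling, which helps push the frontier outward after the goal is reached.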

7. Significance, Limitations, and Future Directions

LGE methods generalize Go-Explore’s success to domains where pixel-based cell design is impractical or fails, by leveraging adaptive, task-relevant latent structure. This shift enables state-of-the-art exploration in continuous control, visuomotor robotics, and hard exploration games, independent of extensive domain engineering (Gallouédec et al., 2022, Bharadhwaj et al., 2020, Höftmann et al., 2023).

Major limitations include the scaling of archive operations, representation drift as policies improve, and the lack of formal sample complexity bounds. Proposed future directions are:

  • Efficient archive lookup via hashing or tree structures.
  • Integrating contrastive or self-supervised losses for stronger generalization.
  • Joint, end-to-end training of representations and exploration policies.
  • Extending latent-based Go-Explore to multimodal and non-visual domains.

The LGE paradigm provides a principled and empirically validated foundation for scalable, cell-free deep exploration in high-dimensional RL, demonstrating resilience where cell-based techniques are brittle or intractable.
