Latent Go-Explore (LGE) in Reinforcement Learning
- Latent Go-Explore (LGE) is a reinforcement learning approach that employs learned latent representations to overcome the limitations of hand-designed state discretizations.
- It integrates density estimation, geometric goal sampling, and subgoal trajectory trimming to efficiently explore high-dimensional and sparse-reward environments.
- Variants like Cell-Free LGE, LEAF, and Time-Myopic Go-Explore demonstrate improved sample efficiency, temporal separation, and scalability across complex domains.
Latent Go-Explore (LGE) refers to a class of reinforcement learning (RL) approaches that instantiate the Go-Explore paradigm within a learned latent space, addressing the cell partitioning bottleneck and enabling robust exploration in environments with sparse or deceptive rewards. LGE methods dispense with hand-defined state aggregation, instead leveraging learned state representations to characterize, return to, and expand the frontier of explored behavior, thereby generalizing Go-Explore to complex, high-dimensional domains.
1. Foundations and Motivation
The original Go-Explore algorithm [Ecoffet et al., 2019/2021] achieved state-of-the-art exploration—most notably on Montezuma’s Revenge—by (a) repeatedly returning to promising "cells" (discretizations of state space) before (b) robustly exploring outward from those frontier cells. However, hand-crafted cells are fundamentally limited: they depend on domain knowledge, risk conflating distinct states if too coarse, and can cause exploration failure if key factors of variation are omitted. These shortcomings motivate Latent Go-Explore, which eliminates explicit cell partitions in favor of learned latent representations, thereby making Go-Explore generalizable and robust across domains, including those with image observations and complex, high-dimensional state spaces (Gallouédec et al., 2022).
2. Latent State Representations and Encoders
LGE replaces explicit cell construction with an encoder $\phi$ that maps raw observations into a learned latent space $\mathcal{Z} \subseteq \mathbb{R}^d$, where exploration and trajectory management are conducted. Several encoder architectures are deployed depending on the task and desired inductive bias:
- Inverse-Dynamics Encoders: Train $\phi$ jointly with an inverse model $g$ that predicts the action $a_t$ from the latent pair $(\phi(s_t), \phi(s_{t+1}))$, with loss $\mathcal{L}_{\text{inv}} = \lVert g(\phi(s_t), \phi(s_{t+1})) - a_t \rVert^2$ (cross-entropy for discrete actions).
- Forward-Dynamics Encoders: Pair $\phi$ with a dynamics predictor $f$, optimized via $\mathcal{L}_{\text{fwd}} = \lVert f(\phi(s_t), a_t) - \phi(s_{t+1}) \rVert^2$.
- VQ-VAE Encoders: Employ a vector-quantized variational autoencoder, with the discrete code indices serving as the latent representation.
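As an illustration of the inverse- and forward-dynamics objectives (not the authors' code), the losses can be sketched with a linear encoder in NumPy; the encoder `phi`, the heads, and all dimensions are assumptions of this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, act_dim, latent_dim = 12, 3, 8  # illustrative sizes

# Hypothetical linear encoder phi and prediction heads (assumptions for this sketch).
W_phi = rng.normal(size=(obs_dim, latent_dim))               # encoder: s -> z
W_inv = rng.normal(size=(2 * latent_dim, act_dim))           # inverse head: (z_t, z_{t+1}) -> a_t
W_fwd = rng.normal(size=(latent_dim + act_dim, latent_dim))  # forward head: (z_t, a_t) -> z_{t+1}

def phi(s):
    return s @ W_phi

def inverse_dynamics_loss(s_t, s_next, a_t):
    """MSE between the predicted and the taken action."""
    a_hat = np.concatenate([phi(s_t), phi(s_next)], axis=-1) @ W_inv
    return float(np.mean((a_hat - a_t) ** 2))

def forward_dynamics_loss(s_t, a_t, s_next):
    """MSE between the predicted and the actual next latent state."""
    z_hat = np.concatenate([phi(s_t), a_t], axis=-1) @ W_fwd
    return float(np.mean((z_hat - phi(s_next)) ** 2))

# Evaluate both losses on a random transition batch.
s_t = rng.normal(size=(32, obs_dim))
s_next = rng.normal(size=(32, obs_dim))
a_t = rng.normal(size=(32, act_dim))
print(inverse_dynamics_loss(s_t, s_next, a_t), forward_dynamics_loss(s_t, a_t, s_next))
```

In practice the encoder and heads would be neural networks trained by gradient descent; the sketch only shows the structure of the two objectives.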
This flexibility allows the latent space to continuously adapt its topology to the controllable and novel aspects of the environment, a critical property for maintaining a meaningful exploration frontier (Gallouédec et al., 2022).
3. Exploration Workflow and Goal Selection
The central LGE workflow conducts exploration as follows:
- Density Estimation in Latent Space: Maintain a buffer of all visited states. Compute a $k$-NN estimate of the latent density, $\hat{\rho}(z) \propto \dfrac{1}{d_k(z)^{d}}$, where $d_k(z)$ is the distance from $z$ to its $k$-th nearest neighbor in the buffer and $d$ is the latent dimension.
- Geometric Goal Sampling: Rank each stored latent state $z_i$ by increasing density to obtain its rarity rank $r(z_i)$ (rank 1 = rarest). Sample a final goal $g = z_i$ with probability $P(g = z_i) \propto p\,(1-p)^{r(z_i)-1}$, where $p$ is the geometric parameter favoring rare/novel states.
- Subgoal-Trajectory Trimming: For long trajectories, extract subgoals wherever the latent distance between consecutive kept states exceeds a threshold, so that each hop remains feasible.
- Goal-Conditioned Rollouts and Exploration: Sequentially reach each subgoal using a sparse goal-conditioned reward, $r(s_t, g) = \mathbb{1}\{\lVert \phi(s_t) - g \rVert \le \epsilon\}$ for a small tolerance $\epsilon$.
After reaching the final subgoal, random or heuristic exploratory actions are executed to further expand the frontier.
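The density-estimation and goal-sampling steps above can be sketched in NumPy; the function names, the choice $k=5$, and the toy latent buffer are assumptions of this sketch, not the reference implementation:

```python
import numpy as np

def knn_density(Z, k=5):
    """k-NN latent density estimate: rho(z) proportional to 1 / d_k(z)^d."""
    d = Z.shape[1]
    dists = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    d_k = np.sort(dists, axis=1)[:, k]  # k-th nearest neighbor (index 0 is self)
    return 1.0 / np.maximum(d_k, 1e-8) ** d

def geometric_goal_probs(Z, p=0.05, k=5):
    """Rank states from rarest (rank 1) to densest, then weight by a geometric law."""
    rho = knn_density(Z, k)
    ranks = np.empty(len(Z), dtype=int)
    ranks[np.argsort(rho)] = np.arange(1, len(Z) + 1)  # rank 1 = lowest density
    w = p * (1.0 - p) ** (ranks - 1)
    return w / w.sum()

rng = np.random.default_rng(0)
# A dense cluster near the origin plus one isolated (rare) latent state.
Z = np.vstack([rng.normal(0.0, 0.1, size=(50, 2)), [[5.0, 5.0]]])
probs = geometric_goal_probs(Z)
print(int(np.argmax(probs)))  # the isolated state (index 50) is the most likely goal
```

The isolated state receives the lowest density estimate, hence rank 1 and the largest geometric weight, which is exactly the bias toward rare/novel states the sampling scheme is designed to produce.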
This procedure enables LGE to focus exploration at the boundary of current competence, extending the search efficiently and avoiding the inefficiencies of uniform or random goal selection (Gallouédec et al., 2022).
4. Variations and Extensions
Multiple operationalizations of Latent Go-Explore have been developed:
- Cell-Free Latent Go-Explore (Gallouédec et al., 2022): Focuses on density-based sampling in latent space without reliance on cell abstraction.
- LEAF (Latent Exploration Along the Frontier) (Bharadhwaj et al., 2020): Augments LGE with a learned, dynamics-aware reachability manifold and a binary reachability classifier, supporting precise frontier detection and a two-phase commit–explore cycle.
- Time-Myopic Go-Explore (Höftmann et al., 2023): Utilizes a Siamese encoder and a time-prediction head to define novelty via predicted temporal distance. New candidate archiving is governed by a threshold on the predicted time distance to already-archived states, ensuring that discovered states are temporally distinct in the learned latent metric.
| Variant | Key Mechanism | Notable Features |
|---|---|---|
| Cell-Free LGE (Gallouédec et al., 2022) | k-NN density, geometric goal sampling, subgoal trimming | No hand-crafted cells, adaptable encoders |
| LEAF (Bharadhwaj et al., 2020) | Latent reachability, curriculum sampling, 2-phase planning | Dynamics-aware manifold, deterministic frontier commitment |
| Time-Myopic Go-Explore (Höftmann et al., 2023) | Temporal distance metric via learned time-predictor | Novelty via time, resolves detachment/conflict |
These variants preserve the Go-Explore intuition while eliminating its cell-design bottleneck and leveraging powerful latent abstractions.
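As a toy illustration of the time-myopic archiving rule described above, the insertion-only archive can be sketched as follows; `maybe_archive`, the threshold value, and the Euclidean stand-in for the learned time predictor are all illustrative assumptions, not the authors' API:

```python
import numpy as np

def maybe_archive(archive, z_new, time_distance, threshold=5.0):
    """Insert z_new only if its predicted temporal distance to every archived
    state exceeds the threshold (insertion-only archiving)."""
    if all(time_distance(z, z_new) > threshold for z in archive):
        archive.append(z_new)
    return archive

# Stand-in for the learned time-prediction head: Euclidean latent distance.
time_distance = lambda a, b: float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

archive = [[0.0, 0.0]]
maybe_archive(archive, [1.0, 0.0], time_distance)   # too close in "time": rejected
maybe_archive(archive, [10.0, 0.0], time_distance)  # temporally distinct: kept
print(len(archive))  # 2
```

Because states are only ever inserted, never overwritten, promising branches cannot be silently lost, which is how this design addresses the detachment failure mode.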
5. Theoretical Analysis and Empirical Results
No formal sample complexity theorems or exhaustive coverage guarantees are provided, but empirical evidence demonstrates robust and scalable exploration. Key experimental findings include:
- Coverage: LGE achieves near-complete exploration in challenging 2D mazes, matching or exceeding Go-Explore's cell-based implementation, and far surpassing random, intrinsic curiosity (ICM), and goal-based baselines on both robotic and Atari domains (Gallouédec et al., 2022).
- Sample Efficiency: LEAF achieves 90% success on visual block-pushing in 1.2M steps (cf. 2.5–3.1M for top baselines), 85% success in door-opening by a 7-DoF arm in 1.8M steps, and full Ant-Maze coverage in 500k steps, outperforming established methods (Bharadhwaj et al., 2020).
- Temporal Separation and Archive Management: Time-Myopic Go-Explore produces archive structures where states are uniformly temporally separated, preventing collision between semantically distinct states and resolving detachment (the loss of promising branches) via insertion-only archiving (Höftmann et al., 2023).
- Ablations: Removing frontier mechanisms, reachability models, or non-uniform goal sampling degrades exploration speed and coverage, confirming the necessity of each element (Gallouédec et al., 2022, Bharadhwaj et al., 2020).
6. Implementation Guidelines and Challenges
Implementation of LGE frameworks includes:
- Encoder Training: Periodic (e.g., every 5k or 500k steps) minibatch updates using chosen representation losses; e.g., inverse/forward dynamics, VQ-VAE, or time-prediction MSE.
- Exploration Inertia: During the final random-exploration phase, repeat the previous action with high probability (e.g., 90%) to produce temporally extended, directed exploration rather than undirected dithering.
- Off-Policy Backup: Policies are typically updated using SAC or QR-DQN, often with Hindsight Experience Replay for efficient goal relabeling (Gallouédec et al., 2022).
- Hyperparameters: Latent dimension $d$ (typically 8–16), the geometric-sampling parameter $p$ ($0.01$–$0.05$), and the subgoal-trimming threshold are tuned per domain.
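The subgoal-trimming step from the guidelines above can be sketched in a few lines; the function name, the threshold value, and the one-dimensional toy trajectory are assumptions of this sketch:

```python
import numpy as np

def trim_subgoals(latents, threshold=1.0):
    """Keep a subsequence of latent states such that consecutive kept states
    are at least `threshold` apart in latent distance."""
    subgoals = [latents[0]]
    for z in latents[1:]:
        if np.linalg.norm(z - subgoals[-1]) >= threshold:
            subgoals.append(z)
    return subgoals

# A trajectory that drifts slowly along one latent axis.
traj = np.array([[0.0], [0.3], [0.7], [1.2], [1.3], [2.5]])
print(len(trim_subgoals(traj)))  # 3 subgoals: 0.0, 1.2, 2.5
```

Trimming keeps each subgoal-to-subgoal hop short enough to be reachable under the sparse goal-conditioned reward, while discarding near-duplicate intermediate states.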
Noted implementation challenges include the linear scaling of novelty-query runtime with archive size and the need for robust, continuously updated encoders to avoid representation collapse. Potential remedies include approximate nearest neighbor search for archive lookup and joint training of policies and encoders (Höftmann et al., 2023).
7. Significance, Limitations, and Future Directions
LGE methods generalize Go-Explore’s success to domains where pixel-based cell design is impractical or fails, by leveraging adaptive, task-relevant latent structure. This shift enables state-of-the-art exploration in continuous control, visuomotor robotics, and hard exploration games, independent of extensive domain engineering (Gallouédec et al., 2022, Bharadhwaj et al., 2020, Höftmann et al., 2023).
Major limitations include the scaling of archive operations, representation drift as policies improve, and the lack of formal sample complexity bounds. Proposed future directions are:
- Efficient archive lookup via hashing or tree structures.
- Integrating contrastive or self-supervised losses for stronger generalization.
- Joint, end-to-end training of representations and exploration policies.
- Extending latent-based Go-Explore to multimodal and non-visual domains.
The LGE paradigm provides a principled and empirically validated foundation for scalable, cell-free deep exploration in high-dimensional RL, demonstrating resilience where cell-based techniques are brittle or intractable.