Mapless Global Navigation Approach

Updated 8 February 2026

Mapless global navigation is a set of algorithms that enables robots to navigate long distances using only local sensory inputs and minimal goal descriptors without a global map.
The approach leverages deep learning techniques, including policy gradients, CVAEs, and diffusion models, to generate safe and efficient global trajectories.
Key challenges include long-horizon planning in cluttered, dynamic settings and achieving robust sim-to-real transfer under unpredictable environmental conditions.

Mapless global navigation refers to the class of algorithms and architectures that enable robotic platforms to perform long-range navigation in unstructured or unknown environments without constructing or relying on an explicit global metric map (e.g., occupancy grid, pose-graph). Instead, these approaches learn or infer navigation behavior directly from raw, local sensory inputs (such as LiDAR, depth, or visual data) and minimal goal specifications (typically relative pose in the local frame), producing control outputs or global-scale trajectories that successfully drive the robot from its current state to a distant goal while avoiding obstacles and respecting terrain or traversability constraints. Mapless global navigation is especially relevant for domains where reliable mapping or localization is infeasible due to environmental unpredictability, resource limitations, or sensor ambiguity, such as planetary exploration, GPS-denied environments, and dynamic crowds.

1. Foundations and Problem Formulation

Mapless global navigation is formalized as a Markov Decision Process (MDP) or, in partially observable settings, as a POMDP:

State space (S): Encodes the robot’s local observations: exteroceptive data such as raw or preprocessed LiDAR/visual scans, proprioceptive history (previous actions, odometry), and estimates of distance and heading to the goal.
Action space (A): Typically continuous control inputs (e.g., linear and angular velocity for Ackermann or differential drive, or $\mathbb{R}^3$ commands for UAVs/HUAUVs), sometimes discretized for tractability or hardware constraints.
Transition model (P): Induced by the robot’s kinematics and, where applicable, environment dynamics (moving obstacles or changing geometry, often left unmodeled in mapless approaches).
Reward function (R): Designed to encourage progress toward the goal and penalize collisions, unsafe behaviors (backtracking, sharp oscillations), and unfavorable control characteristics.
Discount factor ($\gamma$): Usually $\approx 0.99$ for non-myopic planning.
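
The reward and discount terms above can be made concrete with a small sketch. The function names, weights, and thresholds below (`goal_tol`, `collision_range`, the 10.0 terminal values, the 0.05 oscillation weight) are illustrative assumptions, not values taken from any of the cited papers. With $\gamma = 0.99$, the effective planning horizon $1/(1-\gamma)$ is roughly 100 steps.

```python
def step_reward(prev_dist, dist, min_range, ang_vel, prev_ang_vel,
                goal_tol=0.3, collision_range=0.2):
    """Return (reward, done) for one transition. All constants are illustrative."""
    if dist < goal_tol:                 # goal reached: terminal bonus
        return 10.0, True
    if min_range < collision_range:     # nearest range return too close: collision
        return -10.0, True
    progress = prev_dist - dist                 # positive when moving toward the goal
    oscillation = abs(ang_vel - prev_ang_vel)   # penalize sharp steering changes
    return progress - 0.05 * oscillation, False


def discounted_return(rewards, gamma=0.99):
    """Sum of gamma**t * r_t, accumulated backwards over one episode."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

Note that the reward uses only quantities available from local sensing and the relative goal descriptor (distance to goal, nearest range return, recent commands), which is what keeps the formulation mapless.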

Critically, the observation space includes only local perceptual fields (e.g., heightmaps, point clouds, recent sensor windows) and does not include a persistent global topological or metric map of the environment. The agent must infer navigable structure globally from local, sequential data and goal descriptors (Mortensen et al., 2023, Tai et al., 2017, Liang et al., 2024, Grando et al., 2022).

In multi-robot contexts, decentralized local policies act on their own partial observations, with centralized training possibly incorporating global context (Marchesini et al., 2021).

2. Core Methodological Approaches

Deep Model-Free Policy Learning

Deep RL models learn direct mappings from local sensory and goal inputs to actions by maximizing expected return. Key approaches include:

On-policy policy gradients (PPO, A3C, CPO): Used for robots in complex, cluttered, or dynamic environments—sometimes extended with intrinsic curiosity-driven objectives to promote exploration without global maps (Zhelo et al., 2018).
Off-policy deterministic/stochastic actor-critic (DDPG, TD3, SAC): Used for continuous control with sparse range inputs, sometimes augmented by double critics and recurrent encoders (LSTM/GRU) that filter sensor noise and provide implicit memory, supporting robust 3D mapless navigation for UAVs/HUAUVs (Grando et al., 2021, Grando et al., 2022, Grando et al., 2021).
Teacher-student or “learning by cheating” distillation: Train a teacher in privileged, simulation settings with perfect sensors and then distill its behavior into a student that learns to act robustly from noisy, real-world data. The student network typically incorporates temporal filtering (GRU belief encoder) to compensate for sensor noise or partial observability (Mortensen et al., 2023).

End-to-End Generative Trajectory Planners

Instead of producing immediate control, some frameworks generate global-scale trajectories on-the-fly:

Conditional Variational Autoencoders (CVAE): MTG and Sem-NaVAE employ CVAEs to generate diverse, traversability-constrained global paths, sampling from learned priors over plausible trajectories given current local sensory context. The best path is selected via geometric or semantic scoring and then executed by a local controller (Liang et al., 2023, Olguin et al., 1 Feb 2026).
Conditional Diffusion Models: DTG replaces the CVAE decoder with a conditional RNN-based denoising diffusion process to efficiently generate optimal and safe global trajectories, with additional losses to ensure traversability (Liang et al., 2024).
Cost Learning and Semantic Guidance: Methods like CREStE blend Internet-scale vision foundation model (VFM) priors for perception, learn reward/cost maps with counterfactual IRL, and infer global navigation costs that generalize to open-set semantics and environmental structure (Zhang et al., 5 Mar 2025). Sem-NaVAE further integrates open-vocabulary VLMs (e.g., CLIPSeg) to score and select trajectories dynamically based on semantic segmentation (Olguin et al., 1 Feb 2026).
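
The sample-then-select pattern shared by these planners can be sketched as follows. This is a minimal illustration that assumes the candidate trajectories have already been produced by some generative model. The scoring weights, the `is_traversable` callback, and the function names are hypothetical and do not reproduce the scoring used by MTG, DTG, or Sem-NaVAE.

```python
import math


def score_trajectory(path, goal, is_traversable, w_goal=1.0, w_trav=5.0):
    """Lower is better: end-point distance to goal plus a penalty per blocked waypoint."""
    end_x, end_y = path[-1]
    goal_dist = math.hypot(goal[0] - end_x, goal[1] - end_y)
    blocked = sum(1 for p in path if not is_traversable(p))
    return w_goal * goal_dist + w_trav * blocked


def select_best(candidates, goal, is_traversable):
    """Pick the candidate trajectory with the lowest combined score."""
    return min(candidates, key=lambda path: score_trajectory(path, goal, is_traversable))
```

A semantic variant would replace `is_traversable` with a per-waypoint cost derived from segmentation output. The selected path is then handed to a local controller for execution.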

Hierarchical and Curriculum Methods

Hierarchical RL (HRL): Decomposes mapless global navigation into high-level sub-goal proposal (over discretized local sectors, congestion-aware, often with Double DQN and HER) and low-level motion planning (safe RL—e.g., CPO—on immediate controls), allowing efficient resolution of local minima or congested areas (Gao et al., 15 Mar 2025). Congestion estimation from lidar is used to adapt sub-goal update frequency.
Adaptive Curriculum RL: Curriculum-based scheduling of start-goal pairs in training (e.g., NavACL-Q) accelerates learning for sparse-reward or difficult domains (e.g., warehouses), focusing exploration on the current agent’s “frontier of competence” (Xue et al., 2022).
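
The "frontier of competence" idea can be illustrated with a simplified sampler that prefers start-goal tasks of intermediate estimated difficulty. This is a stand-in built on running success counts, not the NavACL-Q algorithm itself. The class name, the 0.3 to 0.8 band, and the Laplace smoothing are assumptions made for the sketch.

```python
import random


class FrontierCurriculum:
    """Tracks per-task success rates and samples tasks of intermediate difficulty."""

    def __init__(self, tasks, low=0.3, high=0.8, seed=0):
        self.stats = {t: [0, 0] for t in tasks}   # task -> [successes, attempts]
        self.low, self.high = low, high
        self.rng = random.Random(seed)

    def success_rate(self, task):
        succ, n = self.stats[task]
        return (succ + 1) / (n + 2)   # Laplace-smoothed; 0.5 for unseen tasks

    def record(self, task, success):
        self.stats[task][0] += int(success)
        self.stats[task][1] += 1

    def frontier(self):
        """Tasks that are neither mastered nor currently hopeless."""
        return [t for t in self.stats
                if self.low <= self.success_rate(t) <= self.high]

    def sample(self):
        pool = self.frontier() or list(self.stats)   # fall back to all tasks
        return self.rng.choice(pool)
```

In training, each task would be a hashable start-goal pair. After every episode the outcome is passed to `record`, and the next episode's task is drawn with `sample`.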

Biological and Minimalist Response-based Strategies

Reactive visual-driven navigation: Demonstrates that minimal convolutional networks can evolve robust “global” navigation strategies (e.g., indirect sequential, biased diffusive, direct pathing) purely from local ray-cast vision, without maps, memory, or explicit path integration. These approaches highlight the sufficiency and diversity of mapless strategies achievable with lightweight architectures and have direct biological analogues (Govoni et al., 2024).

3. Sim-to-Real Transfer and Robustness Enhancements

Closing the sim-to-real gap is a persistent challenge:

Domain Randomization and Noise Injection: Teacher-student distillation with heavy exteroceptive noise (Gaussian, mask-out/dropouts) and privileged information in simulation reduces overfitting to simulation artifacts and builds robustness to real sensor perturbations (Mortensen et al., 2023).
Minimal and Fixed Sensor Abstractions: Approaches using a fixed, low-dimensional laser input (e.g., 10 beams) or temporally stacked scans generalize better to real hardware and permit direct transfer (Tai et al., 2017, Jin et al., 2019).
Self-supervised or contrastive representation learning: VFMs and distillation losses facilitate perceptual generalization (e.g., semantic and entity categories not seen during training), crucial for open-set outdoor navigation (Zhang et al., 5 Mar 2025).
Memory and Temporal Filtering: GRU/LSTM encoders within policies or critics filter noisy local observations and permit latent state estimation, enhancing performance under partial observability and non-deterministic environments (Mortensen et al., 2023, Grando et al., 2022, Grando et al., 2021).
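
Two of the ideas above, reducing a dense scan to a fixed sparse input and injecting noise during training, can be sketched in a few lines. The min-per-sector pooling, the noise level, the dropout probability, and the function names are assumptions chosen for illustration. The cited works may sample beams and model noise differently.

```python
import random


def downsample_beams(scan, n_beams=10):
    """Reduce a dense scan to n_beams values (minimum range per angular sector).

    Assumes len(scan) >= n_beams.
    """
    size = len(scan) / n_beams
    return [min(scan[int(i * size):int((i + 1) * size)]) for i in range(n_beams)]


def corrupt_scan(scan, sigma=0.05, drop_prob=0.1, max_range=10.0, rng=None):
    """Add Gaussian range noise and randomly replace beams with max_range (dropout)."""
    rng = rng or random.Random()
    noisy = []
    for r in scan:
        if rng.random() < drop_prob:
            noisy.append(max_range)   # simulated missing return
        else:
            noisy.append(min(max_range, max(0.0, r + rng.gauss(0.0, sigma))))
    return noisy
```

In a teacher-student setup, the teacher would typically receive the clean scan while the student is trained on the output of `corrupt_scan`, so that the student learns to act from degraded observations.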

4. Safety, Traversability, and Social Constraints

Traversability Constraints: Multi-objective optimization (distance, collision penalty, traversability coverage, diversity) is employed to select or generate only those global paths that remain safe across diverse terrain. Examples include GP-based local traversability analysis combined with RRT* planning (Leininger et al., 2024), CVAE-based per-waypoint traversability filtering (Liang et al., 2023, Olguin et al., 1 Feb 2026), and an adaptive traversability loss in diffusion models (Liang et al., 2024).
Social Safety: Explicit rewards for social compliance (avoiding intrusion into humans' predicted paths, “social zones”) are integrated with progress and ego-safety metrics to produce cooperative collision-avoidance in dynamic crowd navigation (Jin et al., 2019), with curriculum on non-stationary, human-driven agents (Fan et al., 2018).
Safe RL Methods: Constrained Policy Optimization and explicit risk critics enforce constraints on expected collision or violation cost, producing policies that balance progress with safety under probabilistic uncertainty (Gao et al., 15 Mar 2025).
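
The notion of bounding an expected cost can be illustrated with a Lagrangian relaxation, a simpler technique than CPO's trust-region update and shown here only as a stand-in. The function names, learning rate, and cost limit are assumptions for the sketch.

```python
def penalized_objective(avg_return, avg_cost, lam):
    """Scalar objective a policy update would maximize for a fixed multiplier."""
    return avg_return - lam * avg_cost


def update_multiplier(lam, avg_cost, cost_limit, lr=0.1):
    """Dual ascent step: raise lam when the cost limit is violated, lower it otherwise."""
    return max(0.0, lam + lr * (avg_cost - cost_limit))
```

Alternating policy updates on `penalized_objective` with calls to `update_multiplier` pushes the average collision cost toward the limit. The constraint holds in expectation only, not on every individual rollout.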

5. Representative System Architectures and Empirical Evaluation

The table below summarizes salient architecture and deployment attributes of prominent mapless global navigation pipelines:

Approach	Sensory Modalities	Global Guidance	Core Architecture	Real-World Deployment	Safety/Constraints
Teacher-student RL (Mortensen et al., 2023)	Heightmap (lidar/depth), proprioceptive	Range, heading to goal	Two-stage PPO+GRU distillation	Yes (rover)	Sim-to-real noise distillation
MTG (Liang et al., 2023)	Lidar, odometry	Goal position	CVAE w/ traversability & coverage loss	Yes (UGV, Boston Dynamics Spot)	Coverage/diversity/traversability
Sem-NaVAE (Olguin et al., 1 Feb 2026)	Lidar, RGB, GPS-RTK/IMU	Global goal, open-vocab cost	CVAE+CLIPSeg VLM selection	Yes (outdoor UGV)	Semantic constraints
DTG (Liang et al., 2024)	Lidar, odometry	Goal position	Diffusion-CRNN generator	Yes (Husky)	Traversability loss
HRL (CPO) (Gao et al., 15 Mar 2025)	Lidar	Sub-goal proposal	DQN high-level + CPO RL low-level	Yes (TurtleBot3)	Explicit CPO safety
CREStE (Zhang et al., 5 Mar 2025)	RGB, lidar, depth	Subgoal (carrot)	VFM distillation + counterfactual IRL	Yes (urban UGV)	Open-set reward, IRL

Empirical results consistently demonstrate that modern mapless global navigation systems:

Achieve reported success rates exceeding 90% across diverse real and simulated terrains.
Exhibit improved robustness (reduced oscillation, more stable control) in noisy, real-world deployment compared to vanilla simulation-trained policies.
Outperform classic map-based navigation pipelines in environments where localization or mapping is unreliable or impossible (Mortensen et al., 2023, Xue et al., 2022).

6. Open Challenges and Future Directions

Despite rapid methodological advances, several challenges persist in the field:

Long-horizon, cluttered navigation: Maintaining globally near-optimal solutions when only local context is available remains challenging in complex, large-scale layouts. Bi-level hierarchical methods and more expressive trajectory generators offer partial mitigation (Gao et al., 15 Mar 2025, Liang et al., 2024, Liang et al., 2023).
Dynamic and open-set perception: Fully generalizing navigation behaviors to unseen semantic classes, rare events, or dynamic hazards without retraining requires strong open-set representation learning and scalable cost inference (Zhang et al., 5 Mar 2025, Olguin et al., 1 Feb 2026).
Integration with semantic intent: Efficiently leveraging high-level, human-friendly specifications (e.g., “avoid grass unless necessary,” “prefer shaded paths,” open-vocabulary segmentation) to condition global path selection remains a frontier topic (Olguin et al., 1 Feb 2026).
Energy and computational efficiency: Hardware-realistic deployment (e.g., on neuromorphic hardware (Tang et al., 2020)) and adaptive compute scaling for field robotics are emerging themes.

Further research directions include joint learning of global waypoint discovery, on-the-fly cost acquisition via counterfactual or preference learning, and real-time multimodal trajectory generation—including diffusion-based models and efficient CVAE variants—integrated with robust, constraint-aware control for reliable long-horizon deployment.