FocusNav: Enhanced Humanoid Robot Navigation

Updated 27 January 2026

FocusNav is a spatial selective attention framework that integrates waypoint-guided cross-attention and Stability-Aware Selective Gating to optimize collision avoidance and gait stability.
The system employs a real-time stability metric computed from proprioceptive feedback to dynamically compress distal spatial cues when needed.
Empirical results on diverse terrains demonstrate that FocusNav significantly improves navigation success and gait stability compared to traditional attention methods.

FocusNav is a spatial selective attention framework for humanoid robot local navigation, engineered to dynamically prioritize perceptual focus based on navigation intent and real-time stability cues. The system integrates waypoint-guided cross-attention with Stability-Aware Selective Gating (SASG), equipping humanoid robots to successfully traverse complex, dynamic environments by modulating the field of view and compressing perceptual processing contingent on actual locomotion stability. The framework is designed to optimize collision avoidance and dynamic gait stability, outperforming traditional attention and memory paradigms on empirical benchmarks (Zhang et al., 19 Jan 2026).

1. Core Architecture and Selective Attention Pipeline

FocusNav’s architecture combines dual layers of attention. The first layer, Waypoint-Guided Spatial Cross-Attention (WGSCA), forms trajectory-aligned map embeddings at discrete waypoints along a predicted collision-free path. The second layer, SASG, adaptively compresses or truncates distal spatial information in the embedding sequence based on a real-time stability metric computed from proprioceptive feedback.

Let $N$ be the number of predicted waypoints and $m_k \in \mathbb{R}^d$ the embedding at waypoint $k$. Given the robot's proprioceptive state $S_p$, roll–pitch angles $\phi_{xy}$, and angular velocities $\omega_{xy}$, SASG evaluates current stability and determines whether to pass only the first, proximal embedding $m_1$ or to admit the sum of all waypoint embeddings along the planned path.

This selective truncation mechanism compels the navigation policy to prioritize foothold safety in low-stability scenarios by focusing exclusively on immediate terrain features.

2. Mathematical Formulation and SASG Mechanism

SASG operates through a sequence of mathematically defined steps:

Stability Metric:

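$\mathcal{S}_m = \frac{1}{1 + k_1 \|\phi_{xy}\|^2 + k_2 \|\omega_{xy}\|^2} \in (0,1]$

with $k_1, k_2 > 0$; higher $\mathcal{S}_m$ indicates greater stability. Because the denominator is at least 1, $\mathcal{S}_m = 1$ only when both $\phi_{xy}$ and $\omega_{xy}$ vanish, and it decays toward 0 as tilt or angular velocity grows.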
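The stability metric can be evaluated directly from the two proprioceptive vectors. The following is a minimal sketch, not the authors' implementation: the function name `stability_metric` is hypothetical, the inputs are assumed to be in radians and radians per second (the source does not state units), and the default gains are the values $k_1=10.0$, $k_2=0.1$ reported for FocusNav.

```python
def stability_metric(phi_xy, omega_xy, k1=10.0, k2=0.1):
    """Return S_m = 1 / (1 + k1*||phi_xy||^2 + k2*||omega_xy||^2)."""
    phi_sq = sum(a * a for a in phi_xy)      # squared norm of roll-pitch angles
    omega_sq = sum(w * w for w in omega_xy)  # squared norm of angular velocities
    return 1.0 / (1.0 + k1 * phi_sq + k2 * omega_sq)


# Upright and at rest: the metric reaches its maximum of 1.
upright = stability_metric((0.0, 0.0), (0.0, 0.0))

# Illustrative (made-up) tilted, rotating state: the metric drops below 1.
tilted = stability_metric((0.2, 0.1), (1.0, 0.5))
```

With these gains, a unit of squared tilt lowers the metric far more than a unit of squared angular velocity.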

Gating Decision:

The binary gate $g$ is sampled via Gumbel-Softmax from logits $V^g = \mathrm{MLP}(S_p) \in \mathbb{R}^2$, with temperature parameter $\tau$. The gate opens ($g=1$) when stability is high (allowing all $m_k$), and closes ($g=0$) when stability is low (allowing only $m_1$).

Hybrid Map Embedding:

$m^h = m_1 + g \sum_{k=2}^N m_k$
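The gating decision and the hybrid embedding can be illustrated with a forward-pass sketch in plain Python. Several details here are assumptions rather than facts from the source: the logit ordering (index 0 for closed, index 1 for open), the hard decision taken by comparing the two relaxed probabilities, and the helper names `gumbel_softmax` and `hybrid_embedding`. The logits are hard-coded instead of coming from the MLP, and no gradient handling (for example a straight-through estimator) is shown.

```python
import math
import random


def gumbel_softmax(logits, tau, rng):
    """Relaxed categorical sample: softmax((logits + Gumbel noise) / tau)."""
    noisy = []
    for logit in logits:
        # Clamp the uniform draw so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((logit - math.log(-math.log(u))) / tau)
    top = max(noisy)  # subtract the max for numerical stability
    weights = [math.exp(v - top) for v in noisy]
    total = sum(weights)
    return [w / total for w in weights]


def hybrid_embedding(embeddings, g):
    """Return m^h = m_1 + g * sum_{k>=2} m_k, computed element-wise."""
    proximal = embeddings[0]
    distal = [sum(column) for column in zip(*embeddings[1:])]
    if not distal:  # only one waypoint: nothing to add
        distal = [0.0] * len(proximal)
    return [a + g * b for a, b in zip(proximal, distal)]


rng = random.Random(0)
p_closed, p_open = gumbel_softmax([0.0, 2.0], tau=1.0, rng=rng)
g = 1 if p_open > p_closed else 0  # assumed hard decision rule

# Three toy waypoint embeddings with d = 2.
embeddings = [[1.0, 0.0], [0.5, 0.5], [0.25, 1.0]]
closed_view = hybrid_embedding(embeddings, 0)  # only m_1 survives
open_view = hybrid_embedding(embeddings, 1)    # all waypoints are summed
```

Closing the gate reduces $m^h$ to $m_1$ exactly, so the policy input keeps its dimensionality whether or not distal cues are admitted.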

Gate Loss:

Supervision is imposed through a cross-entropy loss encouraging $P(g=1)$ to match the stability metric:

$\mathcal{L}_g = -\mathbb{E}\left[ \mathcal{S}_m \log p_1 + (1-\mathcal{S}_m) \log p_0 \right]$

where $p_1$ and $p_0$ are the Gumbel-Softmax probabilities.
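The gate loss is a cross-entropy with a soft target, which a short sketch makes concrete. This version assumes that the expectation is approximated by a batch mean and that $p_0 = 1 - p_1$ (which holds for a two-class softmax). The function name `gate_loss` and the clipping constant `eps` are hypothetical.

```python
import math


def gate_loss(stability, p_open, eps=1e-8):
    """Batch mean of -(S_m * log p_1 + (1 - S_m) * log p_0), with p_0 = 1 - p_1."""
    terms = []
    for s_m, p1 in zip(stability, p_open):
        p1 = min(max(p1, eps), 1.0 - eps)  # avoid log(0)
        terms.append(-(s_m * math.log(p1) + (1.0 - s_m) * math.log(1.0 - p1)))
    return sum(terms) / len(terms)


# Gate probabilities that track the stability metric give a lower loss
# than probabilities that contradict it.
matched = gate_loss([0.9, 0.2], [0.9, 0.2])
mismatched = gate_loss([0.9, 0.2], [0.2, 0.9])
```

For a fixed $\mathcal{S}_m$, each term is minimized at $p_1 = \mathcal{S}_m$, which is the sense in which the loss encourages $P(g=1)$ to match the stability metric.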

This gating strategy is explicitly supervised to align sensory field compression with actual robot state instability, providing principled integration of physical feedback into visual abstraction.

3. Integration with Policy Network and Control Loop

The FocusNav pipeline proceeds as follows:

WGSCA generates per-waypoint context embeddings by cross-attention between candidate path waypoints and egocentric BEV map patches.
SASG processes $\{m_k\}$ along with $S_p$, returning the compressed embedding $m^h$ and the gate loss $\mathcal{L}_g$.
The hybrid embedding $m^h$ is concatenated with $S_p$ and fed to a GRU-based policy, which maintains historical context. Even when non-proximal cues are gated out, high-level temporal dependencies are preserved by the GRU hidden state.
The policy is trained by behavior cloning with respect to an oracle trajectory, using a joint loss comprising traversability, path-following, and gating terms.
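The first step of the pipeline can be sketched as single-head scaled dot-product cross-attention, with waypoint features as queries and BEV patch features as keys and values. This is an illustrative assumption: the source says only that WGSCA uses cross-attention between waypoints and egocentric BEV map patches, so the head count, the absence of learned projections, and the name `cross_attention` are not taken from the paper.

```python
import math


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention, one output row per query."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        dim = len(values[0])
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
        )
    return outputs


# Two toy waypoint queries attending over three toy BEV patches.
waypoint_queries = [[1.0, 0.0], [0.0, 1.0]]
patch_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
patch_values = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]
waypoint_embeddings = cross_attention(waypoint_queries, patch_keys, patch_values)
```

Each output row plays the role of one $m_k$: a convex combination of patch values weighted by how well the patch matches the waypoint query.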

Key hyper-parameters include:

Gumbel-Softmax temperature $\tau$ (initialized at 1.0, annealed down),
Stability metric scaling factors ($k_1=10.0$, $k_2=0.1$),
MLP with two 64-unit ReLU layers for $V^g$,
Gate loss weight $\lambda_3=0.5$.
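These settings can be gathered into a small configuration object together with a joint loss. The sketch below takes only `tau_init`, `k1`, `k2`, the hidden-layer sizes, and the gate weight from the values listed above. The exponential annealing schedule, the temperature floor, and the weights on the traversability and path-following terms are placeholder assumptions, since the source gives none of them. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SASGConfig:
    tau_init: float = 1.0           # reported initial temperature
    tau_min: float = 0.1            # ASSUMED floor for annealing
    tau_decay: float = 0.999        # ASSUMED per-step exponential decay
    k1: float = 10.0                # reported tilt gain
    k2: float = 0.1                 # reported angular-velocity gain
    gate_hidden: tuple = (64, 64)   # reported MLP hidden sizes (ReLU)
    lambda_gate: float = 0.5        # reported gate loss weight (lambda_3)
    lambda_trav: float = 1.0        # ASSUMED traversability weight
    lambda_path: float = 1.0        # ASSUMED path-following weight


def annealed_tau(cfg, step):
    """Exponentially decay the temperature, never going below tau_min."""
    return max(cfg.tau_min, cfg.tau_init * cfg.tau_decay ** step)


def joint_loss(cfg, l_trav, l_path, l_gate):
    """Weighted sum of traversability, path-following, and gating terms."""
    return (cfg.lambda_trav * l_trav
            + cfg.lambda_path * l_path
            + cfg.lambda_gate * l_gate)


cfg = SASGConfig()
tau_start = annealed_tau(cfg, 0)
tau_late = annealed_tau(cfg, 10_000)
total = joint_loss(cfg, l_trav=1.0, l_path=2.0, l_gate=4.0)
```

Lower temperatures make the relaxed gate samples closer to one-hot, which is the usual motivation for annealing $\tau$ during training.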

4. Empirical Performance and Ablation Studies

FocusNav achieves substantial gains in both navigation success and real-time stability versus baselines lacking adaptive gating. Key findings on the Unitree G1 platform include:

Scenario	WGSCA-Only Success (%)	FocusNav (SASG) Success (%)	$\mathrm{E}_{\textrm{stability}}$ (WGSCA-Only → SASG)
Static / Unstructured	$82.23 \pm 2.01$	$91.15 \pm 0.88$ (+8.9 pp)	$0.73 \pm 0.05 \rightarrow 0.81 \pm 0.03$
Dynamic / Unstructured	$74.15 \pm 3.15$	$87.02 \pm 1.15$ (+12.9 pp)	$0.68 \pm 0.05 \rightarrow 0.76 \pm 0.04$

Regions with a higher frequency of gate closure (proximal-only focus) show marked local gains in gait stability, typically a 10–12% improvement in high-difficulty areas (e.g., stairs). These results support the claim that selective truncation of distal cues by SASG is tightly linked to enhanced locomotor safety (Zhang et al., 19 Jan 2026).
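The gains in the table can be recomputed from the reported means as a quick arithmetic check. The snippet uses only the mean values shown above and ignores the reported standard deviations. The dictionary layout and the helper name `gains` are illustrative.

```python
# Mean values copied from the table: (WGSCA-only, FocusNav with SASG).
results = {
    "Static / Unstructured": {"success": (82.23, 91.15), "stability": (0.73, 0.81)},
    "Dynamic / Unstructured": {"success": (74.15, 87.02), "stability": (0.68, 0.76)},
}


def gains(before, after):
    """Return (absolute difference, relative change in percent)."""
    diff = after - before
    return diff, 100.0 * diff / before


summary = {}
for scenario, metrics in results.items():
    success_pp, _ = gains(*metrics["success"])
    _, stability_rel = gains(*metrics["stability"])
    summary[scenario] = (round(success_pp, 1), round(stability_rel, 1))

for scenario, (pp, rel) in summary.items():
    print(f"{scenario}: +{pp} pp success, +{rel}% E_stability")
```

The success-rate differences reproduce the +8.9 and +12.9 percentage-point figures in the table. The relative change in mean $\mathrm{E}_{\textrm{stability}}$ comes out at roughly 11 to 12% in both scenarios.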

5. Theoretical Context: SASG and Stability in Selective SSMs

FocusNav’s SASG module draws on formal principles from the broader literature on stability-aware gating in selective state-space models (SSMs). The discrete gating in FocusNav’s implementation is related to the more general setting where gating signals may be discontinuous and input-dependent. Under suitable contraction or dissipativity assumptions (e.g., uniform quadratic Lyapunov storage, regularity of system matrices, satisfaction of parametric LMIs), theoretical analysis guarantees exponential forgetting of state and robust input-to-state stability (ISS). These results formalize conditions under which the system remains stable, resists catastrophic forgetting, and ensures that the gating does not introduce instabilities, even in the presence of hard gating or mode switches (Bhat, 2024, Zubić et al., 16 May 2025).

Unlike gates in LSTMs or GRUs—which are primarily designed to mitigate vanishing gradients and do not explicitly optimize the trade-off between information retention and computational overhead—FocusNav’s SASG module is supervised by a physical stability metric and justified using information-theoretic and control-theoretic arguments. This enables direct compression of context according to predicted navigational safety.

A key distinction is that while LSTM/GRU gates update all units each step (even if partially suppressed), SASG dynamically deactivates unnecessary spatial features based on actual mobility risk detected in real-time. This reduces per-step computation and memory overhead.
The explicit association of the gating policy with a physical metric (robot stability) grounds the selective mechanism in the quantitative performance of the robot, rather than relying solely on end-to-end reward signals.

7. Significance and Prospects

FocusNav establishes a principled approach to local robot navigation in diverse and difficult terrains, providing a template for attention mechanisms in embodied AI that are sensitive to the interaction between task demands and system stability. Its success in empirical settings, theoretically sound underpinnings in selective SSMs with discontinuous gating, and capacity to integrate multi-modal sensory feedback illustrate the potential of SASG-style strategies for advanced autonomous control (Zhang et al., 19 Jan 2026, Zubić et al., 16 May 2025, Bhat, 2024).