
Relative Energy Learning in 3D LiDAR OOD

Updated 17 November 2025
  • REL is a framework for 3D LiDAR OOD detection that uses energy-based modeling with a shift-invariant scoring mechanism to differentiate between inliers and anomalous points.
  • It integrates closed-set segmentation with a binary logistic loss on the relative energy gap, enhancing robustness in safety-critical autonomous driving systems.
  • The approach incorporates a geometry-aware synthetic OOD generation method (Point Raise) to augment training data and improve detection metrics on benchmarks.

Relative Energy Learning (REL) is a framework for out-of-distribution (OOD) detection in 3D LiDAR point clouds, designed for safety-critical autonomous driving environments where reliable identification of rare or anomalous objects is essential. Unlike prior approaches that transfer 2D image OOD methods to 3D data with limited success, REL introduces a shift-invariant energy scoring mechanism and a tailored synthetic OOD data strategy, yielding robust discrimination between inlier and anomalous points.

1. Energy-Based Modeling for LiDAR OOD Detection

REL builds on energy-based models, which assess sample “confidence” by computing energy scores from neural logits. Formally, given a segmentation network with logits $f(x) = [f_1(x), \ldots, f_C(x)]^T$ for $C$ in-distribution classes:

E(x) = -T \cdot \log \left[ \sum_{i=1}^{C} e^{f_i(x)/T} \right]

where $T$ is a temperature hyperparameter. OOD samples typically yield higher energies than inliers. Prior methods employ hinge losses with preset margins and temperature scaling, but calibration becomes tenuous under the severe class imbalance and varying logit scales common in LiDAR tasks.
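As a concrete illustration, the free-energy score above is just a temperature-scaled negative log-sum-exp of the logits. A minimal NumPy sketch (array shapes are assumptions for illustration):

```python
import numpy as np

def energy_score(logits: np.ndarray, T: float = 1.0) -> np.ndarray:
    """Free energy E(x) = -T * log(sum_i exp(f_i(x) / T)) per point.

    logits: (N, C) array of per-point class logits.
    Returns an (N,) array; higher energy suggests OOD.
    """
    z = logits / T
    # log-sum-exp with max subtraction for numerical stability
    m = z.max(axis=1, keepdims=True)
    lse = m.squeeze(1) + np.log(np.exp(z - m).sum(axis=1))
    return -T * lse

# A point with one confidently large logit gets lower (more negative)
# energy than a point with flat, uninformative logits.
confident = np.array([[10.0, 0.0, 0.0]])
flat = np.array([[1.0, 1.0, 1.0]])
assert energy_score(confident)[0] < energy_score(flat)[0]
```

The stability trick (subtracting the row maximum before exponentiating) matters in practice, since LiDAR logits can be large enough to overflow a naive `exp`.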

2. Relative Energy: Motivation, Definition, and Invariance

Relative Energy ameliorates the shift- and scale-sensitivity of raw energy scores by comparing the summed exponentials of positive (“inlier”) logits and negative (learned “OOD”) logits. The network is reparameterized to output $2K$ logits per point, with the first $K$ for in-distribution classes and the second $K$ for negative/OOD:

  • Positive free energy: $E_{\mathrm{pos}}(x) = -\log \sum_{i \in y^+} e^{f_i(x)}$
  • Negative free energy: $E_{\mathrm{neg}}(x) = -\log \sum_{i \in y^-} e^{f_i(x)}$
  • Relative energy gap:

\Delta E(x) = E_{\mathrm{pos}}(x) - E_{\mathrm{neg}}(x) = \log \left[ \frac{\sum_{i \in y^-} e^{f_i(x)}}{\sum_{i \in y^+} e^{f_i(x)}} \right]

This ratio is shift-invariant (adding a constant to all logits of a point leaves $\Delta E$ unchanged), reducing the need for per-scene or per-backbone calibration, and it reliably separates the energy distributions of inliers and OOD points even when classes are imbalanced.
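The shift-invariance claim can be checked directly: any additive constant applied to all $2K$ logits cancels in the ratio. A short NumPy sketch (the logit layout is as described above; the random values are only for demonstration):

```python
import numpy as np

def logsumexp(z: np.ndarray) -> np.ndarray:
    """Numerically stable log(sum(exp(z))) along axis 1."""
    m = z.max(axis=1, keepdims=True)
    return m.squeeze(1) + np.log(np.exp(z - m).sum(axis=1))

def relative_energy_gap(logits: np.ndarray, K: int) -> np.ndarray:
    """Delta E(x) = log(sum_neg e^{f_i} / sum_pos e^{f_i}) per point.

    logits: (N, 2K); columns [:K] are positive (inlier) logits,
    columns [K:] are negative (learned OOD) logits.
    """
    return logsumexp(logits[:, K:]) - logsumexp(logits[:, :K])

# Shift invariance: adding any constant to all 2K logits of a point
# scales numerator and denominator identically, so Delta E is unchanged.
rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 8))  # K = 4
assert np.allclose(relative_energy_gap(logits, 4),
                   relative_energy_gap(logits + 3.7, 4))
```

By contrast, the raw energy $E(x)$ shifts by the same constant, which is exactly why per-backbone calibration becomes necessary for absolute-energy methods.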

3. Integrated Training Objective

REL’s training combines closed-set segmentation, via the Mask4Former objective, and a binary logistic loss on the relative energy gap. The main terms are:

  • Mask/class loss ($\mathcal{L}_{\mathrm{mask}}$): cross-entropy and Hungarian-matched mask overlap as in Mask4Former.
  • REL OOD loss ($\mathcal{L}_{\mathrm{REL}}$):

\mathcal{L}_{\mathrm{REL}} = -\frac{1}{|\mathcal{P}_{\mathrm{in}}|} \sum_{x \in \mathcal{P}_{\mathrm{in}}} \log \sigma(-\Delta E(x)) \; - \; \frac{\lambda}{|\mathcal{P}_{\mathrm{out}}|} \sum_{x \in \mathcal{P}_{\mathrm{out}}} \log \sigma(\Delta E(x))

where $\mathcal{P}_{\mathrm{in}}$ is the set of inliers, $\mathcal{P}_{\mathrm{out}}$ is the set of synthetic OOD points from Point Raise, $\lambda$ is an imbalance weight (typically 100), and $\sigma$ is the sigmoid function. The combined objective is:

\mathcal{L} = \mathcal{L}_{\mathrm{mask}} + \lambda_{\mathrm{REL}} \, \mathcal{L}_{\mathrm{REL}}

where $\lambda_{\mathrm{REL}}$ weights the OOD term. This encourages the relative energy gap to separate the inlier and OOD populations.
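The binary logistic loss on the gap can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the paper's reference implementation; the function and argument names are assumptions:

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def rel_loss(gap_in: np.ndarray, gap_out: np.ndarray,
             lam: float = 100.0) -> float:
    """Binary logistic loss on the relative energy gap (a sketch).

    gap_in:  Delta E for inlier points, pushed toward negative values.
    gap_out: Delta E for synthetic OOD points, pushed toward positive values.
    lam:     imbalance weight on the (much rarer) OOD term.
    """
    loss_in = -np.log(sigmoid(-gap_in)).mean()
    loss_out = -np.log(sigmoid(gap_out)).mean()
    return float(loss_in + lam * loss_out)

# Well-separated gaps yield a much smaller loss than inverted ones.
good = rel_loss(np.array([-5.0]), np.array([5.0]))
bad = rel_loss(np.array([5.0]), np.array([-5.0]))
assert good < bad
```

In training, this term would be added to the Mask4Former segmentation loss; a real implementation would also clip or stabilize the sigmoid inputs to avoid `log(0)` on extreme gaps.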

4. Point Raise: Geometry-Aware Synthetic OOD Generation

The absence of annotated OOD points in training data is addressed by Point Raise, a lightweight geometry-aware synthesis algorithm. Its steps are:

  • Select a random road point from the point cloud as a cluster seed, using the semantic labels to identify road points.
  • Gather the cluster of points within a fixed radius of the seed (a KD-tree query over the cloud), recording each member’s spatial distance to the seed.
  • Apply an inward pull: compute a distance-based adaptive decay and contract each cluster point’s horizontal $(x, y)$ coordinates toward the cluster center accordingly.
  • Assign random heights to cluster points by adding a positive offset to each $z$ coordinate.
  • Relabel the affected points as “RAISED_CLASS” (an auxiliary OOD class).

The recommended cluster radius and height offsets are metre-scale hyperparameters. This produces compact, object-like OOD clusters that do not overlap inlier semantics.

5. Network Architecture and REL Integration

The backbone is Mask4Former-3D: a Minkowski Sparse-UNet encoder with a transformer decoder over FPS-sampled queries, producing panoptic masks for the $K$ inlier classes. REL’s OOD scoring is realized as:

  • An auxiliary projector branch appended to every point’s encoder features
  • Three MLP (Linear → ReLU) layers yielding $2K$ logits per point:
    • First $K$: positive logits for the in-distribution classes
    • Next $K$: negative logits for OOD
  • The relative energy gap $\Delta E(x)$ computed per point and used for the OOD decision

During inference, segmentation proceeds through Mask4Former unchanged; REL assigns each point an OOD score given by its relative energy gap $\Delta E(x)$.
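A minimal sketch of the projector head, assuming per-point encoder features of some dimension (all sizes and the bias-free weight initialization here are illustrative, not the paper's exact architecture):

```python
import numpy as np

class RELHead:
    """Auxiliary projector sketch: three Linear+ReLU stages -> 2K logits."""

    def __init__(self, d_in: int, d_hidden: int, K: int, rng=None):
        rng = rng or np.random.default_rng(0)
        dims = [d_in, d_hidden, d_hidden, 2 * K]
        # Bias-free linear layers for brevity; a real head would have biases.
        self.W = [rng.normal(0.0, 0.1, (a, b))
                  for a, b in zip(dims[:-1], dims[1:])]
        self.K = K

    def __call__(self, feats: np.ndarray) -> np.ndarray:
        """feats: (N, d_in) per-point encoder features -> (N, 2K) logits."""
        h = feats
        for W in self.W[:-1]:
            h = np.maximum(h @ W, 0.0)  # Linear -> ReLU
        return h @ self.W[-1]           # final Linear, no activation
```

The first $K$ output columns play the role of positive logits and the last $K$ the negative ones, from which $\Delta E(x)$ is computed per point.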

6. Empirical Performance and Ablation Results

REL has been evaluated on the STU (Spotting the Unexpected) and SemanticKITTI benchmarks using both point- and object-level OOD metrics. Baselines include Deep Ensemble, MC Dropout, Max Logit, MSP (maximum softmax probability), Void Classifier, RbA, and UEM. Results include:

Dataset / Metric       AUROC (↑)   FPR@95% (↓)   AP (↑)
STU val (REL)          97.85       9.60          10.68
STU val (UEM)          95.80       26.37         6.78
STU test (REL)         96.26       21.69         10.17
STU test (Void)        85.99       78.60         3.92
KITTI outlier (REL)    96.76       18.66         67.32
KITTI outlier (UEM)    93.15       37.07         61.73

Ablations demonstrate that REL’s relative energy yields the highest AUROC and lowest FPR@95 compared to hinge, VOS, and dual energy losses. Finetuning the backbone improves AUROC further over a frozen encoder, and Point Raise is essential: training without synthetic OOD generation substantially degrades both AUROC and FPR@95.

7. Thresholding and Deployment in Safety-Critical Systems

For operational use, REL applies a simple decision rule: classify point $x$ as OOD if $\Delta E(x) > \tau$. A zero-centered threshold ($\tau = 0$) naturally splits inliers from OOD. For controlled false-positive rates (e.g., FPR@95%), $\tau$ can be selected empirically on validation data; this calibration generalizes across scenes and backbones without temperature or margin adjustment. In deployment, maintaining a running histogram of inlier $\Delta E$ values enables dynamic threshold adaptation to keep the FPR within safety limits.
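Empirical threshold selection is a one-liner over validation inlier gaps. A sketch, assuming the "OOD if $\Delta E(x) > \tau$" rule and a 5% tolerated inlier false-positive rate (the synthetic Gaussian data is only for demonstration):

```python
import numpy as np

def fpr_threshold(inlier_gaps: np.ndarray, fpr: float = 0.05) -> float:
    """Pick tau so roughly `fpr` of validation inliers are flagged as OOD.

    With the rule 'OOD if Delta E(x) > tau', tau is the (1 - fpr)
    quantile of the inlier gap distribution.
    """
    return float(np.quantile(inlier_gaps, 1.0 - fpr))

# Simulated validation inlier gaps, clustering below zero as REL intends.
rng = np.random.default_rng(0)
inlier = rng.normal(-4.0, 1.0, size=20_000)
tau = fpr_threshold(inlier, fpr=0.05)
assert abs((inlier > tau).mean() - 0.05) < 0.01
```

A running deployment could maintain a streaming quantile estimate (or histogram) of recent inlier gaps and recompute $\tau$ periodically, realizing the dynamic adaptation described above.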

The shift invariance and universal thresholding of REL streamline the integration of OOD detection into safety-critical autonomous driving stacks, mitigating overconfident errors and enabling robust behavior under open-world uncertainty.
