Generalization of memory designs across robotic manipulation tasks

Determine which memory designs for robotic manipulation policies generalize across tasks, that is, which approaches remain robust across diverse long-horizon, history-dependent manipulation scenarios.

Background

Prior memory-based manipulation methods employ varied policy backbones and heterogeneous evaluation protocols, making comparisons difficult and obscuring general trends. Existing benchmarks have also lacked sufficient diversity and difficulty to stress different kinds of memory, further complicating systematic assessment.

RoboMME addresses this gap with a unified benchmark spanning temporal, spatial, object, and procedural memory. The open question is which specific memory designs (e.g., representations and integration strategies) actually generalize across tasks when evaluated in a standardized setting.

References

While demonstrating the importance of memory, these methods rely on different policy backbones and inconsistent evaluation protocols, making it unclear which memory designs generalize across tasks.

RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies (2603.04639 - Dai et al., 4 Mar 2026) in Introduction