Modeling Dynamic Environments with Scene Graph Memory

Published 27 May 2023 in cs.LG and cs.AI | (2305.17537v4)

Abstract: Embodied AI agents that search for objects in large environments such as households often need to make efficient decisions by predicting object locations based on partial information. We pose this as a new type of link prediction problem: link prediction on partially observable dynamic graphs. Our graph is a representation of a scene in which rooms and objects are nodes, and their relationships are encoded in the edges; only parts of the changing graph are known to the agent at each timestep. This partial observability poses a challenge to existing link prediction approaches, which we address. We propose a novel state representation -- Scene Graph Memory (SGM) -- which captures the agent's accumulated set of observations, as well as a neural net architecture called a Node Edge Predictor (NEP) that extracts information from the SGM to search efficiently. We evaluate our method in the Dynamic House Simulator, a new benchmark that creates diverse dynamic graphs following the semantic patterns typically seen at homes, and show that NEP can be trained to predict the locations of objects in a variety of environments with diverse object movement dynamics, outperforming baselines both in terms of new scene adaptability and overall accuracy. The codebase and more can be found at https://www.scenegraphmemory.com.

Summary

The paper introduces Scene Graph Memory (SGM) and Node Edge Predictor (NEP) to handle partial observability in dynamic object search tasks.
The methodology employs transformer-based edge classification with GCN and HEAT operators for effective temporal link prediction.
Results demonstrate that the NEP HEAT variant significantly outperforms baseline models in accuracy and adaptability.

Introduction

The paper presents a novel approach to address the challenge of object search by embodied AI agents in large dynamic environments such as households. The authors conceptualize this problem as temporal link prediction on dynamic, partially observable graphs, introducing the Scene Graph Memory (SGM) and Node Edge Predictor (NEP) models to facilitate efficient decision-making under partial observability. The paper evaluates the proposed methods using the Dynamic House Simulator, showing improved adaptability and predictive accuracy over existing methods.

Figure 1: Our problem setup and proposed method: an agent searches for objects in a dynamic household environment, using a Scene Graph Memory aggregated from partial observations (SGM) and a Node Edge Predictor model (NEP) to predict object locations.

Problem Formulation

Temporal link prediction traditionally involves estimating future edges in a dynamic graph based on complete past observations. This research introduces temporal link prediction under partial observability, a novel variant where the observed data is incomplete and evolving. The task involves predicting relationships between object nodes in dynamically changing scene graphs, a situation that is prevalent in embodied AI applications like object search and navigation.
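To make the setting concrete, the following is a minimal sketch of partial observability on a changing scene, not the paper's simulator. The `observe` helper, the dictionary encoding of the world, and all object and location names are assumptions introduced here for illustration.

```python
def observe(world, visited_location):
    """Return only the (object, location) edges visible at the visited location.

    `world` maps each object to its current location (hypothetical encoding).
    """
    return {(obj, loc) for obj, loc in world.items() if loc == visited_location}


# Ground truth at two timesteps; the mug and the keys move between them.
world_t0 = {"mug": "counter", "book": "desk", "keys": "shelf"}
world_t1 = {"mug": "desk", "book": "desk", "keys": "counter"}

# The agent visits one location per step and sees only the edges there.
seen_t0 = observe(world_t0, "counter")
seen_t1 = observe(world_t1, "desk")

# Objects the agent has never observed at any step.
never_seen = set(world_t1) - {obj for obj, _ in seen_t0 | seen_t1}
```

Here the keys are never observed and the mug's remembered location from the first step is already stale by the second, which is the kind of gap the prediction task has to fill.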

Methodology

Scene Graph Memory (SGM)

SGM is a dynamic data structure that aggregates an agent's observations over time into a single scene graph. Its nodes represent rooms and objects, and its edges encode the relationships between them, covering both observed and hypothetical connections. The SGM thereby captures the temporal dynamics and semantic context needed for accurate link prediction.
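The aggregation idea can be sketched as a small data structure. This is an illustrative toy under stated assumptions, not the authors' implementation: the class name, its fields, and the choice to store an observation count and last-seen step per edge are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraphMemory:
    """Toy memory that accumulates (object, location) edges over time."""

    nodes: set = field(default_factory=set)
    # (object, location) -> (times_observed, last_seen_step); illustrative fields
    edges: dict = field(default_factory=dict)

    def update(self, step, observed_edges):
        """Merge one partial observation into the memory."""
        for obj, loc in observed_edges:
            self.nodes.update((obj, loc))
            count, _ = self.edges.get((obj, loc), (0, None))
            self.edges[(obj, loc)] = (count + 1, step)

    def add_hypothetical(self, obj, loc):
        """Register a candidate edge that has not been observed."""
        self.nodes.update((obj, loc))
        self.edges.setdefault((obj, loc), (0, None))

    def candidates(self, obj):
        """All remembered or hypothetical locations for an object, sorted."""
        return sorted(loc for (o, loc) in self.edges if o == obj)


sgm = SceneGraphMemory()
sgm.update(0, [("mug", "kitchen_counter")])
sgm.update(1, [("mug", "desk"), ("book", "desk")])
sgm.add_hypothetical("mug", "sink")
```

Keeping old edges after an object has moved is deliberate in this sketch: a previously seen location remains a plausible candidate, and a downstream predictor can weigh it by how often and how recently it was observed.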

Node Edge Predictor (NEP)

NEP is a neural architecture designed for inference on dynamic, partially observable scene graphs. It consists of modules for node and edge embeddings, a feature fusion process, and a transformer-based edge classification mechanism. The architecture comes in two variants, built on GCN and HEAT operators respectively, and outputs the likelihood of each candidate connection, including connections that are currently unobserved.
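The fuse-then-classify step can be illustrated with a deliberately simplified stand-in. The sketch below is not a GCN, HEAT, or transformer model: a fixed linear scorer replaces the learned network, and the function names, embedding sizes, and edge features (times observed, steps since last seen) are assumptions made for this example.

```python
import math


def fuse(obj_emb, loc_emb, edge_feat):
    """Concatenate the two node embeddings with the edge features."""
    return list(obj_emb) + list(loc_emb) + list(edge_feat)


def score_edges(obj_emb, candidates, weights):
    """Return a probability for each candidate (location embedding, edge features).

    A linear score per fused vector is normalized with a softmax over all
    candidate edges of the same object.
    """
    logits = []
    for loc_emb, edge_feat in candidates:
        fused = fuse(obj_emb, loc_emb, edge_feat)
        logits.append(sum(w * x for w, x in zip(weights, fused)))
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


mug = [1.0, 0.0]
candidates = [
    ([0.0, 1.0], [3.0, 0.1]),  # seen often and recently
    ([1.0, 0.0], [0.0, 0.9]),  # never seen
]
# Hand-picked weights: reward observation count, penalize staleness.
weights = [0.0, 0.0, 0.0, 0.0, 1.0, -1.0]
probs = score_edges(mug, candidates, weights)
```

In a trained model the weights would be learned and the embeddings would first pass through graph layers; the sketch only shows how per-edge scores become a distribution over candidate locations.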

Figure 2: Node Edge Predictor (NEP) model architecture illustrating GCN and HEAT variants.

Experimental Setup

The study employs the Dynamic House Simulator to benchmark the methods' performance across various tasks, simulating diverse household environments with dynamic object locations. Key tasks evaluated include Predict Object Location, Predict Relative Location Likelihood, and Find Object, each designed to assess the model's ability to adapt and predict in evolving environments.
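One plausible reading of the Find Object task is that the agent checks candidate locations in order of predicted likelihood and is scored by how many attempts it needs. The sketch below assumes that protocol; the function name, the `max_attempts` parameter, and the example scores are hypothetical and may differ from the benchmark's actual rules.

```python
def find_object(true_location, scored_candidates, max_attempts=None):
    """Return the number of attempts needed to find the object, or None.

    `scored_candidates` maps each candidate location to a predicted likelihood.
    Locations are checked from most to least likely.
    """
    ranked = sorted(scored_candidates, key=scored_candidates.get, reverse=True)
    if max_attempts is not None:
        ranked = ranked[:max_attempts]
    for attempts, location in enumerate(ranked, start=1):
        if location == true_location:
            return attempts
    return None


scores = {"desk": 0.2, "sink": 0.5, "shelf": 0.3}
attempts_needed = find_object("shelf", scores)
gave_up = find_object("desk", scores, max_attempts=2)
```

Under this protocol a better predictor places the true location earlier in the ranking, so fewer attempts indicate better predictions.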

Figure 3: The household object placement probability priors highlighting room-object-furniture relationships with varying likelihoods.

Results

The NEP, particularly the HEAT variant, significantly outperforms baseline models, including Random, Prior-based, and Bayesian methods. The introduction of SGM facilitates enhanced adaptability and learning over time, demonstrating superior prediction accuracy and reduced decision latency. The model's ability to leverage semantic and temporal features results in improved performance across all tasks.
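For intuition about what a count-based baseline can look like, the sketch below combines a placement prior with observed counts in a Dirichlet-style update. It is an illustration only and is not claimed to match the paper's Prior-based or Bayesian baselines; the function name and the `prior_strength` parameter are assumptions.

```python
def posterior(prior, counts, prior_strength=1.0):
    """Blend prior probabilities with observation counts per location.

    Each location's weight is prior_strength * prior + times_observed,
    normalized to sum to one. With no observations this returns the prior.
    """
    weights = {
        loc: prior_strength * p + counts.get(loc, 0) for loc, p in prior.items()
    }
    total = sum(weights.values())
    return {loc: w / total for loc, w in weights.items()}


prior = {"sink": 0.5, "desk": 0.5}
belief = posterior(prior, {"desk": 3}, prior_strength=2.0)
```

With these inputs the weights are 1.0 for the sink and 4.0 for the desk, so the belief shifts to 0.2 and 0.8. A baseline of this kind adapts to a specific house but uses no graph structure or recency information.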

Figure 4: The average accuracy and variance for the Predict Object Location task, demonstrating the NEP's learning efficiency over time.

Conclusion

The study addresses object search in dynamic environments by pairing a state representation (SGM) with a predictive model (NEP) suited to partial observability. The approach shows potential for real-world object search, with better adaptability to new scenes and higher predictive accuracy than the baselines. Future work could explore integration with reinforcement learning frameworks and application in more complex, realistic environments, improving the versatility and robustness of AI agents in dynamic settings.