Determine whether PLMs perform true spatial abstraction versus pattern matching

Determine whether pretrained language models (PLMs) internally form the abstractions required for spatial reasoning over text, or whether their decisions are primarily driven by surface-level patterns learned from training data.

Background

Spatial question answering over text requires inferring implicit spatial relations from explicitly stated ones. Prior work reports reasonable performance from PLMs on these tasks, yet their black-box nature makes it difficult to assess whether they truly perform multi-step spatial reasoning or rely on superficial cues.

This paper explores disentangling extraction and reasoning by combining neural extraction of spatial relations with symbolic reasoning over them, and shows benefits in both controlled and realistic settings. Nonetheless, a central open question remains: do PLMs themselves construct the necessary spatial abstractions, or do they mainly exploit patterns seen during training?
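To make the extraction/reasoning split concrete, the sketch below shows one simple way a symbolic component could derive implicit spatial relations from explicit ones. This is an illustrative toy, not the paper's implementation: the triple format, the single `left_of` relation, and the transitivity rule are assumptions chosen for brevity.

```python
# Illustrative sketch (NOT the paper's method): a minimal symbolic reasoner
# that closes extracted (head, relation, tail) triples under transitivity,
# deriving implicit spatial relations from explicitly stated ones.

def transitive_closure(facts):
    """Repeatedly apply left_of(a,b) & left_of(b,c) => left_of(a,c)."""
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        for (a, r1, b) in list(closure):
            for (c, r2, d) in list(closure):
                if r1 == r2 == "left_of" and b == c and (a, r1, d) not in closure:
                    closure.add((a, r1, d))
                    changed = True
    return closure

# Explicit relations a neural extractor might produce from text such as
# "The circle is left of the square. The square is left of the triangle."
explicit = {("circle", "left_of", "square"),
            ("square", "left_of", "triangle")}
inferred = transitive_closure(explicit)
# The implicit relation left_of(circle, triangle) is now derivable.
```

In such a pipeline, the PLM is responsible only for producing the explicit triples; the multi-hop inference step is handled transparently by the symbolic rules, which is what makes the reasoning auditable in a way end-to-end PLM answers are not.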

References

"Furthermore, the black-box nature of PLMs makes it unclear whether these models are making the abstractions necessary for spatial reasoning or their decisions are based solely on patterns observed in the data."

Disentangling Extraction and Reasoning in Multi-hop Spatial Reasoning (2310.16731 - Mirzaee et al., 2023), Section 1, Introduction