Single-agent long-horizon problem solving

Develop a single large vision-language model agent with stronger long-horizon problem-solving capabilities, so that it can sustain multi-step reasoning and decision-making over extended interactions without relying on auxiliary mechanisms such as parallel test-time scaling.

Background

The paper introduces Thinking with Map, a map-augmented agentic framework for image geolocalization that iteratively proposes and verifies location hypotheses using map tools. To compensate for limitations in sustained reasoning within a single agent, the authors adopt parallel test-time scaling with a verifier to aggregate multiple reasoning trajectories.
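To make the aggregation step concrete, the following is a minimal sketch of verifier-based selection over several independently sampled trajectories. It is an illustration under stated assumptions, not the authors' implementation: the names `Trajectory`, `sample_trajectories`, `select_with_verifier`, `toy_agent`, and `toy_verifier` are hypothetical, the agent is a hard-coded stub rather than a vision-language model, and the verifier is a simple keyword heuristic standing in for whatever verifier the paper uses.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Trajectory:
    """One complete reasoning rollout ending in a location guess (hypothetical type)."""
    steps: List[str]                 # textual trace of thoughts and map-tool calls
    prediction: Tuple[float, float]  # (latitude, longitude) in degrees


def sample_trajectories(agent: Callable[[int], Trajectory], n: int) -> List[Trajectory]:
    """Run the agent n times independently; sequential here for simplicity."""
    return [agent(seed) for seed in range(n)]


def select_with_verifier(
    trajectories: List[Trajectory],
    verifier: Callable[[Trajectory], float],
) -> Trajectory:
    """Return the trajectory that the verifier scores highest."""
    if not trajectories:
        raise ValueError("need at least one trajectory")
    return max(trajectories, key=verifier)


def toy_agent(seed: int) -> Trajectory:
    """Stub agent (assumption): returns one of three canned rollouts."""
    canned = [
        Trajectory(["guess: Paris", "map search: no match"], (48.8566, 2.3522)),
        Trajectory(
            ["guess: Eiffel Tower", "map search: verified landmark", "street view: verified"],
            (48.8584, 2.2945),
        ),
        Trajectory(["guess: Lyon"], (45.7640, 4.8357)),
    ]
    return canned[seed % len(canned)]


def toy_verifier(trajectory: Trajectory) -> float:
    """Stub verifier (assumption): counts steps whose map check succeeded."""
    return float(sum("verified" in step for step in trajectory.steps))


if __name__ == "__main__":
    rollouts = sample_trajectories(toy_agent, 3)
    best = select_with_verifier(rollouts, toy_verifier)
    print(best.prediction)
```

The sketch shows why this mechanism is auxiliary to the agent itself: the quality of the final answer depends on sampling several rollouts and on the verifier, not on any single rollout sustaining a long chain of reasoning.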

Despite improvements from reinforcement learning and parallel sampling, the authors note persistent gaps between the model’s map-use capabilities and human performance, and identify the need for a single agent capable of robust long-horizon reasoning. They explicitly state that building such a single agent remains an open problem.

References

How to build a single agent with stronger long-horizon problem-solving capabilities remains an open problem.

Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization (2601.05432 - Ji et al., 8 Jan 2026), in the Limitation section