Papers
Topics
Authors
Recent
Search
2000 character limit reached

Can Mental Imagery Improve the Thinking Capabilities of AI Systems?

Published 16 Jul 2025 in cs.LG and cs.AI | (2507.12555v2)

Abstract: Although existing models can interact with humans and provide satisfactory responses, they lack the ability to act autonomously or engage in independent reasoning. Furthermore, input data in these models is typically provided as explicit queries, even when some sensory data is already acquired. In addition, AI agents, which are computational entities designed to perform tasks and make decisions autonomously based on their programming, data inputs, and learned knowledge, have shown significant progress. However, they struggle with integrating knowledge across multiple domains, unlike humans. Mental imagery plays a fundamental role in the brain's thinking process, which involves performing tasks based on internal multisensory data, planned actions, needs, and reasoning capabilities. In this paper, we investigate how to integrate mental imagery into a machine thinking framework and how this could be beneficial in initiating the thinking process. Our proposed machine thinking framework integrates a Cognitive thinking unit supported by three auxiliary units: the Input Data Unit, the Needs Unit, and the Mental Imagery Unit. Within this framework, data is represented as natural language sentences or drawn sketches, serving both informative and decision-making purposes. We conducted validation tests for this framework, and the results are presented and discussed.

Authors (1)

Summary

  • The paper introduces a machine thinking framework that integrates mental imagery to simulate human cognitive processes for improved autonomous reasoning.
  • It employs a modular design with Needs, Input Data, and Mental Imagery Units to transform sensory data into actionable, context-driven scenarios.
  • Experimental evaluations, including image captioning and object detection, validate its potential to bridge AI functions with human-like simulation.

Can Mental Imagery Improve the Thinking Capabilities of AI Systems?

This paper investigates the integration of mental imagery into AI systems to enhance autonomous reasoning capabilities. It proposes a "machine thinking" framework that mimics human cognitive processes by simulating mental images, thus enabling AI systems to perform tasks based on internal data, planned actions, needs, and reasoning processes.

Framework Overview

The proposed framework consists of a Cognitive Thinking Unit (CTU) supported by three auxiliary units: the Needs Unit, the Input Data Unit, and the Mental Imagery Unit. Each unit plays a distinct role:

  • Needs Unit: Maintains internal motivations and scheduled actions based on reasoning outcomes.
  • Input Data Unit: Processes sensory data into natural language descriptions using advanced CV and NLP models.
  • Mental Imagery Unit: Generates and refines imagined images for reasoning simulations.
  • Cognitive Thinking Unit (CTU): Synthesizes information from all units to simulate reasoning, plan actions, and generate contextual decisions. Figure 1

Figure 1

Figure 1: (Left)Diagram of Machine Thinking algorithms. (Right) Output of Human Thinking.

Role of Mental Imagery

Mental imagery, a critical aspect of human cognition, plays a pivotal role in the framework by enabling the simulation and prediction of scenarios within the CTU. This capacity for generating mental images allows the AI to engage in proactive reasoning and decision-making, moving beyond reactive responses typical of existing AI systems.

Human cognitive studies have shown that imagination and perception share overlapping neural substrates. Inspired by these insights, the framework leverages mental imagery to perform tasks such as generating hypotheses, simulating future events, and updating knowledge representations. Figure 2

Figure 2: Reconstruction of imagined images from brain activity.

Experimental Evaluation

The paper presents experiments across multiple domains to validate the framework:

Image Captioning and Object Detection

Using a combination of Faster R-CNN for object detection and BLIP for image captioning, the system translates visual data into descriptive sentences. This conversion facilitates the needs-context match process within the CTU.

Needs-Context Alignment

The CTU matches internal needs with contextual sentences using semantic similarity models like SentenceTransformer. The framework generates appropriate actions by interpreting the context and pursuing goal-oriented tasks, demonstrating reasoning capabilities through natural language summaries.

Mental Image Generation

Mental imagery enables dynamic scenario simulations. For example, when presented with an event description, the mental imagery unit generates, visualizes, and examines a sequence of mental images, assisting the CTU in refining conclusions or formulating new actions. Figure 3

Figure 3: Results from strong imagination. The subject is given an instruction on the fMRI screen, such as "Imagine a portrait representing optimism". Then, the associated brain data is fed to the fMRI-to-Image model to produce a reconstruction.

Conclusion

The integration of mental imagery within machine thinking frameworks enriches autonomous reasoning by extending beyond current AI limitations. This synergistic approach leverages internal simulations for contextual decision-making, thereby offering a pathway to emulate human-like cognitive processes artificially. Future directions include optimizing the integration of mental imagery for more nuanced tasks and further bridging AI cognition with human intelligence.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Collections

Sign up for free to add this paper to one or more collections.