Enhancing the Geometric Problem-Solving Ability of Multimodal LLMs via Symbolic-Neural Integration

Published 17 Apr 2025 in cs.CL and cs.AI | (2504.12773v1)

Abstract: Recent advances in Multimodal LLMs (MLLMs) have achieved remarkable progress in general domains and demonstrated promise in multimodal mathematical reasoning. However, applying MLLMs to geometry problem solving (GPS) remains challenging due to the lack of accurate step-by-step solution data and severe hallucinations during reasoning. In this paper, we propose GeoGen, a pipeline that can automatically generate step-wise reasoning paths for geometry diagrams. By leveraging precise symbolic reasoning, GeoGen produces large-scale, high-quality question-answer pairs. To further enhance the logical reasoning ability of MLLMs, we train GeoLogic, an LLM, using synthetic data generated by GeoGen. Serving as a bridge between natural language and symbolic systems, GeoLogic enables symbolic tools to help verify MLLM outputs, making the reasoning process more rigorous and alleviating hallucinations. Experimental results show that our approach consistently improves the performance of MLLMs, achieving remarkable results on benchmarks for geometric reasoning tasks. This improvement stems from our integration of the strengths of LLMs and symbolic systems, which enables a more reliable and interpretable approach for the GPS task. Code is available at https://github.com/ycpNotFound/GeoGen.

Summary

The paper introduces GeoGen, a pipeline that integrates symbolic reasoning with LLMs to generate accurate, step-wise geometric solutions.
It details the GeoLogic module that translates natural language outputs into formal logic, ensuring rigorous verification of reasoning steps.
Experimental results show enhanced accuracy on geometry benchmarks, with improved clarity and reduced hallucinations in complex problem-solving.

The paper "Enhancing the Geometric Problem-Solving Ability of Multimodal LLMs via Symbolic-Neural Integration" presents GeoGen, a pipeline designed to address the challenges encountered by Multimodal LLMs (MLLMs) in Geometry Problem Solving (GPS). By integrating symbolic reasoning systems and natural language processing, the authors aim to enhance the logical reasoning capabilities of these models, allowing them to solve complex geometric problems with improved accuracy and reliability.

Introduction to GeoGen Pipeline

The GeoGen pipeline automatically generates step-wise reasoning paths for geometry diagrams: a symbolic system performs the precise deductions, and LLMs translate the resulting deduction sequences into human-readable explanations.

Figure 1: Framework of our proposed GeoGen pipeline.

GeoGen primarily addresses the lack of high-quality natural language reasoning data in existing geometry datasets. It translates structured symbolic reasoning paths into coherent natural language explanations using external LLMs. The resultant data include both synthesized diagrams and expanded annotations for existing datasets like Geometry3K and PGPS9K, culminating in two new datasets: GeoExpand and GeoSynth.
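The two-stage idea described above (symbolic deduction first, natural language second) can be illustrated with a toy sketch. This is not GeoGen's implementation: the fact names (`angle_A`, `ext_angle_C`), the two rules, the `Step` record, and the template-based `verbalize` function are all hypothetical stand-ins, and the template replaces the external LLM that the paper uses for translation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Facts = Dict[str, float]  # hypothetical fact store: name -> measure in degrees


@dataclass(frozen=True)
class Step:
    """One symbolic deduction: a theorem applied to premises yields a new fact."""
    theorem: str
    premises: Tuple[str, ...]
    target: str
    value: float


def angle_sum(facts: Facts) -> Optional[Step]:
    """Toy rule: two known angles of triangle ABC determine the third."""
    angles = ["angle_A", "angle_B", "angle_C"]
    known = [a for a in angles if a in facts]
    if len(known) != 2:
        return None
    missing = next(a for a in angles if a not in facts)
    return Step("triangle_angle_sum", tuple(known), missing,
                180 - sum(facts[a] for a in known))


def linear_pair(facts: Facts) -> Optional[Step]:
    """Toy rule: the exterior angle at C is supplementary to angle C."""
    if "angle_C" in facts and "ext_angle_C" not in facts:
        return Step("linear_pair", ("angle_C",), "ext_angle_C",
                    180 - facts["angle_C"])
    return None


def forward_chain(facts: Facts,
                  rules: List[Callable[[Facts], Optional[Step]]]
                  ) -> Tuple[Facts, List[Step]]:
    """Apply rules until no new fact appears, recording the reasoning path."""
    facts = dict(facts)  # do not mutate the caller's dictionary
    path: List[Step] = []
    changed = True
    while changed:
        changed = False
        for rule in rules:
            step = rule(facts)
            if step is not None and step.target not in facts:
                facts[step.target] = step.value
                path.append(step)
                changed = True
    return facts, path


def verbalize(step: Step, facts: Facts) -> str:
    """Template stand-in for the LLM that turns a step into a sentence."""
    given = ", ".join(f"{p} = {facts[p]:g}" for p in step.premises)
    return f"By {step.theorem} with {given}, we get {step.target} = {step.value:g}."


facts, path = forward_chain({"angle_A": 50, "angle_B": 60},
                            [angle_sum, linear_pair])
explanation = [verbalize(s, facts) for s in path]
```

Here `explanation` holds one sentence per recorded deduction, in the order the symbolic engine derived them. Because every sentence is generated from a recorded `Step`, the natural language path cannot contain a deduction the symbolic engine did not make.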

Integrating Symbolic Systems with GeoLogic

GeoLogic functions as a bridge in this framework, translating complex geometric reasoning into formats compatible with symbolic systems, thus enabling rigorous verification of MLLM outputs.

Figure 2: We conduct an ablation study to examine the impact of different training data compositions. This figure shows the performance on GeoTest across epochs for each configuration. T0-T3 settings are defined in Table.

GeoLogic is trained on pairs of formal and natural language representations, derived from reasoning paths that are systematically broken down into units. During inference, GeoLogic translates the natural language outputs of MLLMs into formal logic, allowing the symbolic system to verify each step's validity. This verification ensures alignment with geometric principles, reducing hallucinations and errors in reasoning.
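A minimal sketch of this translate-then-verify loop follows. It is an illustration under stated assumptions, not the paper's system: the regex-based `translate` function is a hypothetical stand-in for the trained GeoLogic model, the `FormalStep` format and the sentence pattern are invented for the example, and the single checker covers only the triangle angle sum.

```python
import re
from typing import Callable, Dict, List, NamedTuple, Optional

Facts = Dict[str, float]  # hypothetical fact store: name -> measure in degrees


class FormalStep(NamedTuple):
    """Hypothetical formal form of one reasoning step."""
    theorem: str
    target: str
    value: float


# Assumed sentence shape: "By <theorem>, <target> = <number>"
PATTERN = re.compile(r"By (\w+), (\w+) = (-?\d+(?:\.\d+)?)")

TRIANGLE = ("angle_A", "angle_B", "angle_C")


def translate(sentence: str) -> Optional[FormalStep]:
    """Regex stand-in for the natural-language-to-formal translator."""
    m = PATTERN.search(sentence)
    if m is None:
        return None
    return FormalStep(m.group(1), m.group(2), float(m.group(3)))


def check_angle_sum(step: FormalStep, facts: Facts) -> bool:
    """Accept the step only if the other two angles are known and sum correctly."""
    if step.target not in TRIANGLE:
        return False
    others = [facts.get(a) for a in TRIANGLE if a != step.target]
    if any(v is None for v in others):
        return False
    return abs(180 - sum(others) - step.value) < 1e-6


CHECKERS: Dict[str, Callable[[FormalStep, Facts], bool]] = {
    "triangle_angle_sum": check_angle_sum,
}


def verify(sentences: List[str], facts: Facts) -> List[bool]:
    """Check each step in order; only verified steps extend the known facts."""
    facts = dict(facts)  # do not mutate the caller's dictionary
    report: List[bool] = []
    for sentence in sentences:
        step = translate(sentence)
        checker = CHECKERS.get(step.theorem) if step is not None else None
        ok = bool(checker is not None and checker(step, facts))
        if ok:
            facts[step.target] = step.value
        report.append(ok)
    return report


report = verify(
    ["By triangle_angle_sum, angle_C = 70",   # consistent with the givens
     "By triangle_angle_sum, angle_C = 80",   # contradicts the givens
     "The answer is obviously 90 degrees"],   # cannot be translated
    {"angle_A": 50, "angle_B": 60},
)
```

In this sketch a step that cannot be translated, names an unknown theorem, or yields a value the checker cannot reproduce is marked as unverified, which is one simple way to flag a hallucinated step rather than let it feed later deductions.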

Experimental Results and Insights

The experimental evaluation reveals significant improvements in the performance of MLLMs on geometry benchmarks. The integration of symbolic systems especially reinforces step-by-step reasoning accuracy, albeit with some constraints on the breadth of logical correctness.

Figure 3: Accuracy trends as we vary the search width during symbolic reasoning in the inference stage. We adopt the same evaluation metrics as before.

Performance improvements are particularly pronounced when comparing models trained solely on traditional datasets against those incorporating GeoGen-synthesized data. Additionally, symbolic verification during inference enhances alignment with diagram information, offering more reliable predictions.

Case Study: Enhanced Reasoning with Symbolic Integration

A detailed case study demonstrates the progressive improvements in model predictions when integrating the GeoGen pipeline and GeoLogic. The model's ability to produce coherent reasoning steps and correct answers in complex geometry problems significantly improves, showcasing the effective mitigation of hallucination and logical inconsistencies.

Figure 4: A typical case with model predictions improving as our methods are progressively integrated.

Conclusion

This study provides crucial advancements in MLLMs' handling of geometric reasoning tasks. With the GeoGen pipeline and GeoLogic model, not only is reasoning accuracy improved, but the clarity, consistency, and interpretability of this reasoning are substantially enhanced. Future work is intended to further refine the integration techniques and explore broader applications in AI reasoning scenarios.