DianJin-R1: Evaluating and Enhancing Financial Reasoning in Large Language Models

Published 22 Apr 2025 in cs.AI (arXiv:2504.15716v1)

Abstract: Effective reasoning remains a core challenge for LLMs in the financial domain, where tasks often require domain-specific knowledge, precise numerical calculations, and strict adherence to compliance rules. We propose DianJin-R1, a reasoning-enhanced framework designed to address these challenges through reasoning-augmented supervision and reinforcement learning. Central to our approach is DianJin-R1-Data, a high-quality dataset constructed from CFLUE, FinQA, and a proprietary compliance corpus (Chinese Compliance Check, CCC), combining diverse financial reasoning scenarios with verified annotations. Our models, DianJin-R1-7B and DianJin-R1-32B, are fine-tuned from Qwen2.5-7B-Instruct and Qwen2.5-32B-Instruct using a structured format that generates both reasoning steps and final answers. To further refine reasoning quality, we apply Group Relative Policy Optimization (GRPO), a reinforcement learning method that incorporates dual reward signals: one encouraging structured outputs and another rewarding answer correctness. We evaluate our models on five benchmarks: three financial datasets (CFLUE, FinQA, and CCC) and two general reasoning benchmarks (MATH-500 and GPQA-Diamond). Experimental results show that DianJin-R1 models consistently outperform their non-reasoning counterparts, especially on complex financial tasks. Moreover, on the real-world CCC dataset, our single-call reasoning models match or even surpass the performance of multi-agent systems that require significantly more computational cost. These findings demonstrate the effectiveness of DianJin-R1 in enhancing financial reasoning through structured supervision and reward-aligned learning, offering a scalable and practical solution for real-world applications.

Summary

  • The paper demonstrates how structured supervision combined with reinforcement learning enhances financial reasoning in large language models.
  • The paper introduces DianJin-R1-Data, a high-quality dataset from CFLUE, FinQA, and CCC, which significantly improves model performance on benchmark tasks.
  • The paper details the use of a dual reward signaling method via Group Relative Policy Optimization to generate structured, accurate outputs in financial tasks.

DianJin-R1: Evaluating and Enhancing Financial Reasoning in LLMs

Introduction

The paper "DianJin-R1: Evaluating and Enhancing Financial Reasoning in LLMs" proposes a framework for improving financial reasoning in LLMs. Effective reasoning remains a core challenge in the financial domain, where tasks demand domain-specific knowledge, precise numerical calculation, and strict adherence to compliance rules; DianJin-R1 addresses these requirements by combining reasoning-augmented supervision with reinforcement learning.

Dataset Construction and Model Training

The research introduces DianJin-R1-Data, a high-quality dataset crafted from CFLUE, FinQA, and a proprietary compliance corpus named CCC. This dataset is instrumental in training the DianJin-R1 models, which are designed to generate structured reasoning steps and final answers. The models, DianJin-R1-7B and DianJin-R1-32B, are fine-tuned using a structured format to enhance financial reasoning capabilities.
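The structured training format can be sketched as follows. This is a minimal illustration only: the paper specifies that each training target contains reasoning steps followed by a final answer, but the concrete `<think>`/`<answer>` tag names and the chat-message layout below are assumptions, not the paper's verified template.

```python
# Sketch of one supervised fine-tuning record in a structured format where the
# model learns to emit reasoning steps first, then the final answer.
# The <think>/<answer> tags and message schema are illustrative assumptions.

def build_sft_example(question: str, reasoning: str, answer: str) -> dict:
    """Pack a financial QA item into a chat-style training record."""
    target = f"<think>\n{reasoning}\n</think>\n<answer>\n{answer}\n</answer>"
    return {
        "messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "content": target},
        ]
    }

example = build_sft_example(
    question="A bond pays a 5% annual coupon on a 1,000 face value. "
             "What is the yearly coupon payment?",
    reasoning="Coupon payment = coupon rate x face value = 0.05 x 1000 = 50.",
    answer="50",
)
```

Keeping the reasoning and the answer in separate, machine-checkable segments is what later lets a reward function score format and correctness independently.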

To further refine these capabilities, the study applies Group Relative Policy Optimization (GRPO), a reinforcement learning method that combines dual reward signals: one encouraging structured outputs and another rewarding answer correctness. Supervised fine-tuning and reinforcement learning are thus applied in sequence to optimize model performance. The two-step training architecture is outlined in Figure 1.

Figure 1: Illustration of two-step training for DianJin-R1.
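The dual-reward scheme can be sketched in a few lines. This is a hedged illustration, not the paper's implementation: the tag names, the equal weighting of the two rewards, and the exact-match answer check are assumptions; what is grounded in the paper is the pair of reward signals (format and correctness) and GRPO's core idea of normalizing each sampled completion's reward against its group.

```python
import re
import statistics

def format_reward(completion: str) -> float:
    """1.0 if the output follows the structured template
    <think>...</think><answer>...</answer> (tag names assumed), else 0.0."""
    pattern = r"^<think>.*?</think>\s*<answer>.*?</answer>$"
    return 1.0 if re.match(pattern, completion.strip(), re.DOTALL) else 0.0

def accuracy_reward(completion: str, gold: str) -> float:
    """1.0 if the extracted final answer matches the reference, else 0.0."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return 1.0 if m and m.group(1).strip() == gold.strip() else 0.0

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO normalizes each sampled completion's reward within its group:
    advantage_i = (r_i - mean(r)) / (std(r) + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + 1e-6) for r in rewards]

# A group of sampled completions for one prompt, scored with both rewards.
gold = "50"
group = [
    "<think>0.05 * 1000 = 50</think>\n<answer>50</answer>",  # formatted, correct
    "<think>guessing</think>\n<answer>55</answer>",          # formatted, wrong
    "The answer is 50.",                                     # unformatted
]
rewards = [format_reward(c) + accuracy_reward(c, gold) for c in group]
advantages = group_relative_advantages(rewards)
```

Because advantages are computed relative to the group mean rather than a learned value function, a correctly formatted and correct completion is pushed up while an unformatted one is pushed down, even without a critic model.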

Evaluation and Results

DianJin-R1 models are evaluated across five benchmarks: CFLUE, FinQA, CCC, MATH-500, and GPQA-Diamond. The results indicate that DianJin-R1 models consistently outperform their non-reasoning counterparts, with the largest gains on complex financial tasks. Notably, on the real-world CCC dataset, the single-call reasoning models matched or surpassed multi-agent systems that incur substantially higher computational cost.

The paper provides an insightful analysis of the reinforcement learning benefits within the task domain, specifically the alignment of reward signals with desired outputs.

Practical Implications

One practical application demonstrated is a multi-agent compliance system: an LLM-based pipeline that performs condition-based compliance checks, structuring the workflow for complex compliance evaluations and synthesizing reasoning data (Figure 2).

Figure 2: An example of reasoning data synthesized by a multi-agent system.
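A condition-based check of the kind sketched in Figure 2 can be outlined as a small pipeline. The agent roles below (condition extractor, per-condition checker, aggregator) and their interfaces are hypothetical; in the paper each role would be an LLM call, replaced here by rule-based stubs so the control flow is runnable.

```python
# Minimal sketch of a condition-based, multi-agent compliance check.
# Roles and interfaces are illustrative assumptions, not the paper's system.

def extract_conditions(rule: str) -> list[str]:
    """Extractor agent: split a compliance rule into atomic conditions."""
    return [c.strip() for c in rule.split(" and ")]

def check_condition(condition: str, transcript: str) -> bool:
    """Checker agent: decide whether one condition holds in the transcript.
    Stubbed as naive keyword matching; a real checker would prompt an LLM."""
    keyword = condition.split()[-1]  # naive heuristic: last word of condition
    return keyword in transcript

def compliance_verdict(rule: str, transcript: str) -> dict:
    """Aggregator agent: combine per-condition verdicts into a final call."""
    conditions = extract_conditions(rule)
    results = {c: check_condition(c, transcript) for c in conditions}
    return {"compliant": all(results.values()), "conditions": results}

verdict = compliance_verdict(
    rule="agent states the interest rate and agent mentions the repayment schedule",
    transcript="Your loan has a 4.9% interest rate; a monthly repayment "
               "schedule applies.",
)
```

Decomposing a rule into independently verifiable conditions is what makes the per-condition verdicts auditable; it is also why a single reasoning model that internalizes this decomposition can replace several coordinated agent calls.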

DianJin-R1's ability to integrate structured reasoning paths with compliance checks provides practical solutions in real-world financial contexts, suggesting scalable deployment for financial reasoning tasks.

Conclusion

The research introduces DianJin-R1, a comprehensive framework for enhancing financial reasoning within LLMs. By leveraging a combination of structured supervision, advanced dataset construction, and reinforcement learning, the framework aligns reasoning processes with financial domain requirements. Future directions may explore further refinement through alternative reinforcement learning strategies and the integration of external tools to bolster model robustness.

The outcomes underscore the significance of reasoning-augmented models, offering improvements in both accuracy and interpretability, and highlight their potential impact on practical applications in the financial sector.
