Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning

Published 28 Apr 2025 in cs.AI | (2505.01441v1)

Abstract: LLMs have achieved remarkable progress in complex reasoning tasks, yet they remain fundamentally limited by their reliance on static internal knowledge and text-only reasoning. Real-world problem solving often demands dynamic, multi-step reasoning, adaptive decision making, and the ability to interact with external tools and environments. In this work, we introduce ARTIST (Agentic Reasoning and Tool Integration in Self-improving Transformers), a unified framework that tightly couples agentic reasoning, reinforcement learning, and tool integration for LLMs. ARTIST enables models to autonomously decide when, how, and which tools to invoke within multi-turn reasoning chains, leveraging outcome-based RL to learn robust strategies for tool use and environment interaction without requiring step-level supervision. Extensive experiments on mathematical reasoning and multi-turn function calling benchmarks show that ARTIST consistently outperforms state-of-the-art baselines, with up to 22% absolute improvement over base models and strong gains on the most challenging tasks. Detailed studies and metric analyses reveal that agentic RL training leads to deeper reasoning, more effective tool use, and higher-quality solutions. Our results establish agentic RL with tool integration as a powerful new frontier for robust, interpretable, and generalizable problem-solving in LLMs.


Summary

The paper introduces ARTIST, a framework that integrates agentic reasoning, reinforcement learning, and external tool use to enable dynamic decision-making in LLMs.
It trains with the GRPO algorithm and outcome-based rewards, reporting up to a 22% absolute improvement over base models on AMC math tasks.
The framework enhances multi-turn function calling with adaptive tool selection and iterative self-correction, improving handling of long-context tasks.

Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning

ARTIST (Agentic Reasoning and Tool Integration in Self-improving Transformers) is a framework that extends the capabilities of LLMs by combining agentic reasoning with reinforcement learning (RL) and external tool use. It targets a core limitation of LLMs: reliance on static internal knowledge and text-only reasoning, which falls short in complex real-world problem solving.

Methodology

Framework and Architecture

ARTIST unifies agentic reasoning, RL, and tool integration, allowing LLMs to autonomously decide when, how, and which tools to invoke within multi-turn reasoning chains. The framework uses outcome-based rewards to learn strategies for tool use and environment interaction without step-level supervision.

Figure 1: The architecture. Agentic reasoning is achieved by interleaving text-based thinking, tool queries, and tool outputs, enabling dynamic coordination of reasoning, tool use, and environment interaction within a unified framework.
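
The outcome-based reward described above can be illustrated with a small sketch. It scores a whole rollout from its final answer only, with no credit assigned to intermediate steps. The `<answer>` tag format, the exact-match comparison, and the function names are assumptions made for illustration; the source does not specify the reward function.

```python
import re
from typing import Optional

# Hypothetical answer markup; the source does not specify the tag format.
ANSWER = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def extract_answer(rollout_text: str) -> Optional[str]:
    """Return the last tagged answer in the rollout, or None if absent."""
    matches = ANSWER.findall(rollout_text)
    return matches[-1].strip() if matches else None


def outcome_reward(rollout_text: str, reference: str) -> float:
    """Score the rollout by its final answer alone (assumed exact match)."""
    answer = extract_answer(rollout_text)
    if answer is not None and answer == reference.strip():
        return 1.0
    return 0.0


correct = outcome_reward("12*7 is needed... <answer> 84 </answer>", "84")
untagged = outcome_reward("The result is 84", "84")
wrong = outcome_reward("<answer>83</answer>", "84")
```

Because the score depends only on the final answer, the model is free to discover its own sequence of reasoning steps and tool calls.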

Agentic Reasoning Process

The process involves the interleaving of internal text reasoning, tool invocation, and tool-output integration in reasoning rollouts. This structure allows the model to coordinate effectively between its internal thought process and external tool use (Figure 2).

Figure 2: Overview of the methodology. The framework illustrates how reasoning rollouts alternate between internal thinking, tool use, and environment interaction, with outcome-based rewards guiding learning. This enables the model to iteratively refine its reasoning and tool-use strategies through reinforcement learning.
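
A minimal sketch of the interleaved rollout described above follows. The model emits text, any tool query in that text is executed, and the tool output is appended to the context before the model continues. The `<tool>` and `<output>` tags, the `run_rollout` function, and the scripted stand-in model are all hypothetical; they show the control flow only, not the paper's implementation.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical tool-call markup; the source does not specify the format.
TOOL_CALL = re.compile(r'<tool name="(\w+)">(.*?)</tool>', re.DOTALL)

Segment = Tuple[str, str]  # (origin, text), origin is "model" or "tool"


def run_rollout(generate: Callable[[str], str],
                tools: Dict[str, Callable[[str], str]],
                prompt: str,
                max_turns: int = 4) -> List[Segment]:
    """Alternate model text and tool outputs until no tool is requested."""
    segments: List[Segment] = []
    context = prompt
    for _ in range(max_turns):
        text = generate(context)
        segments.append(("model", text))
        context += text
        match = TOOL_CALL.search(text)
        if match is None:
            break  # no tool query, so treat this turn as the final answer
        name, query = match.group(1), match.group(2)
        tool = tools.get(name)
        output = tool(query) if tool else f"error: unknown tool {name}"
        wrapped = f"<output>{output}</output>"
        segments.append(("tool", wrapped))
        context += wrapped
    return segments


def scripted_model(context: str) -> str:
    """Stand-in for an LLM: calls the calculator once, then answers."""
    if "<output>" not in context:
        return 'I need 12*7. <tool name="calc">12*7</tool>'
    return "The answer is 84."


def calc(expr: str) -> str:
    """Toy tool that multiplies two integers written as 'a*b'."""
    a, b = expr.split("*")
    return str(int(a) * int(b))


trace = run_rollout(scripted_model, {"calc": calc}, "What is 12*7? ")
```

Recording the origin of each segment makes it straightforward to later separate model-generated tokens from tool outputs during training.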

Reinforcement Learning Algorithm

To train agentic LLMs with tool integration, ARTIST uses Group Relative Policy Optimization (GRPO). GRPO relies on outcome-based rewards rather than step-level feedback, so the model learns tool-use strategies from final results alone. A loss-masking strategy excludes deterministic tool outputs from the loss, focusing learning on model-generated reasoning.
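
The two ideas in this paragraph can be sketched in a few lines. The first function normalizes each rollout's outcome reward against the other rollouts sampled for the same prompt, which is the group-relative step of GRPO. The second averages a policy-gradient term over model-generated tokens only, using a mask to skip tool-output tokens. This is a simplified surrogate under stated assumptions: it omits the clipped probability ratio and KL penalty of full GRPO, uses the population standard deviation, and all names and values are illustrative.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float],
                              eps: float = 1e-6) -> List[float]:
    """Normalize each rollout's reward by its group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std within the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def masked_policy_loss(logprobs: List[float],
                       mask: List[int],
                       advantage: float) -> float:
    """Average -advantage * logprob over tokens where mask == 1.

    mask[i] is 1 for model-generated tokens and 0 for tool-output tokens,
    so tool outputs contribute nothing to the loss.
    """
    kept = [lp for lp, m in zip(logprobs, mask) if m == 1]
    if not kept:
        return 0.0
    return -advantage * sum(kept) / len(kept)


# Four rollouts for one prompt, two of which reached the correct answer.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# Three tokens; the middle one came from a tool and is masked out.
loss = masked_policy_loss([-0.5, -2.0, -0.1], [1, 0, 1], advantage=1.0)
```

Since advantages are computed relative to the group, no separate value model is needed, and a rollout is rewarded only to the extent that it beats its siblings.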

Experimental Evaluation

Complex Mathematical Reasoning

ARTIST achieves substantial improvements on complex mathematical reasoning benchmarks, including AMC, AIME, and MATH-500.

Performance Gains: Qwen2.5-14B-Instruct + ARTIST outperformed baselines, with up to a 22% absolute improvement over base models on AMC tasks.

Figure 3: Qwen2.5-7B-Instruct: Performance on Math datasets.

Figure 4: Average reward score, tool calls, and response length across all math datasets.

Multi-Turn Function Calling

On multi-turn function calling tasks, such as those in the tau-bench and BFCL v3 datasets, ARTIST showed significant accuracy improvements, highlighting its ability to orchestrate tool use in dynamic, multi-turn environments.

Adaptive Strategies: Training produced emergent behaviors such as adaptive tool selection and iterative self-correction, reflected in improved handling of long-context tasks.

Figure 5: Qwen2.5-7B-Instruct: Performance on tau-bench and BFCL v3 datasets for multi-turn function calling.

Figure 6: Average reward score at different training steps for BFCL v3.

Implications and Future Work

The integration of RL with agentic reasoning marks a shift in LLM capabilities, enabling robust problem solving in diverse and complex environments. ARTIST improves performance metrics as well as model interpretability and adaptability. Future work could explore a wider range of tool integrations and broader applicability across domains, extending ARTIST to more complex tasks and dynamic real-world challenges.

Figure 7: Metrics Analysis for Multi-Turn Function Calling on tau-bench.

Conclusion

Agentic Reasoning and Tool Integration via Reinforcement Learning offers a promising advancement in LLMs, bridging the gap between static inference and dynamic, tool-augmented reasoning. This framework’s ability to strategically integrate external tools enhances problem-solving effectiveness, adaptability, and interpretability, setting a new standard for real-world AI applications.
