FLAG-Trader: Fusion LLM-Agent with Gradient-based Reinforcement Learning for Financial Trading

Published 17 Feb 2025 in cs.AI, cs.CE, and q-fin.TR | (2502.11433v3)

Abstract: LLMs fine-tuned on multimodal financial data have demonstrated impressive reasoning capabilities in various financial tasks. However, they often struggle with multi-step, goal-oriented scenarios in interactive financial markets, such as trading, where complex agentic approaches are required to improve decision-making. To address this, we propose FLAG-Trader, a unified architecture integrating linguistic processing (via LLMs) with gradient-driven reinforcement learning (RL) policy optimization, in which a partially fine-tuned LLM acts as the policy network, leveraging pre-trained knowledge while adapting to the financial domain through parameter-efficient fine-tuning. Through policy gradient optimization driven by trading rewards, our framework not only enhances LLM performance in trading but also improves results on other financial-domain tasks. We present extensive empirical evidence to validate these enhancements.

Summary

The paper introduces a fusion approach combining LLMs with gradient-based reinforcement learning to enhance financial trading, showing superior performance on stocks and cryptocurrencies.
The method employs a dual network design, where a partially fine-tuned policy network directs trading actions and a value network evaluates returns.
The experiments demonstrate efficient adaptation and stability, with improved cumulative returns and Sharpe ratios compared to conventional trading strategies.

Introduction

"FLAG-Trader" (2502.11433) introduces a fusion approach that integrates LLMs with reinforcement learning (RL) for advancing financial trading systems. Traditional RL models encounter challenges such as inefficiencies in handling multimodal data, non-stationary market conditions, and reliance on manual feature engineering. FLAG-Trader addresses these issues by leveraging the LLMs' robust language processing capabilities alongside RL optimization to enhance decision-making in trading.

Figure 1: A high-level overview of the LLM-based reinforcement learning setup for financial trading, illustrating the state-action-reward architecture.

FLAG-Trader Architecture

The architecture employs a partially fine-tuned LLM as the policy network, which allows the model to utilize pre-trained knowledge effectively while adapting to financial contexts through gradient-based policy optimization. Only select parameters of the LLM are updated to retain general reasoning capabilities while ensuring computational efficiency.

Prompt Input Design: The market state is rendered as text and placed in a prompt together with the task description and the set of permitted actions, so that the LLM can read the state and output a trading action.
Model Design: FLAG-Trader uses a dual-network structure:
- Policy Network: Outputs trading actions from the state input processed through the LLM layers.
- Value Network: Estimates the expected return, which guides policy optimization.
Figure 2: The FLAG-Trader pipeline, an LLM-based actor-critic architecture with frozen and trainable layers.
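
The dual-network idea can be illustrated with a toy actor-critic in plain Python. This is only a sketch under stated assumptions: a tiny tanh network stands in for the LLM, the split into `frozen` and `trainable` parameter groups mimics partial fine-tuning, and the shared trunk, the class name `ToyActorCritic`, and the `ACTIONS` list are all hypothetical choices rather than details taken from the paper.

```python
import math
import random

ACTIONS = ["buy", "hold", "sell"]  # assumed action set; not specified in this summary


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class ToyActorCritic:
    """Shared trunk with a frozen part, a trainable part, and two heads."""

    def __init__(self, in_dim, hidden_dim, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        # Stand-in for the lower LLM layers: never updated during RL training.
        self.frozen = {"W": init(hidden_dim, in_dim), "b": [0.0] * hidden_dim}
        # Stand-in for the top LLM layers plus both heads: updated by the optimizer.
        self.trainable = {
            "top_W": init(hidden_dim, hidden_dim), "top_b": [0.0] * hidden_dim,
            "policy_W": init(len(ACTIONS), hidden_dim), "policy_b": [0.0] * len(ACTIONS),
            "value_W": init(1, hidden_dim), "value_b": [0.0],
        }

    def forward(self, x):
        p = self.trainable
        h = [math.tanh(v) for v in linear(self.frozen["W"], self.frozen["b"], x)]
        h = [math.tanh(v) for v in linear(p["top_W"], p["top_b"], h)]
        probs = softmax(linear(p["policy_W"], p["policy_b"], h))  # policy head
        value = linear(p["value_W"], p["value_b"], h)[0]          # value head
        return probs, value
```

An optimizer written for this sketch would touch only the entries of `trainable`, which is the sense in which the pre-trained part of the model is preserved.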

Reinforcement Learning with LLMs

FLAG-Trader applies policy gradient methods to align LLM-based decision-making with trading performance by feeding reward signals from the market back into training. The system uses Proximal Policy Optimization (PPO), which limits how far each update can move the policy away from the previous one and thereby stabilizes learning.
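
To make the PPO constraint concrete, here is a minimal sketch of the standard clipped surrogate loss in plain Python. The function name `ppo_clip_loss` and the default `clip_eps=0.2` are illustrative assumptions; the summary does not state the hyperparameters or exact loss form used in the paper.

```python
import math


def ppo_clip_loss(new_logps, old_logps, advantages, clip_eps=0.2):
    """Negative clipped surrogate objective, averaged over samples.

    new_logps / old_logps: log-probabilities of the taken actions under
    the current policy and the policy that collected the data.
    """
    terms = []
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        ratio = math.exp(new_lp - old_lp)  # probability ratio pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped objectives.
        terms.append(min(ratio * adv, clipped * adv))
    return -sum(terms) / len(terms)
```

When the ratio leaves the interval [1 - clip_eps, 1 + clip_eps] in the direction that would increase the objective, the clipped term takes over and the gradient with respect to that sample vanishes, which is what keeps the new policy close to the old one.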

Online Policy Gradient Learning

Advantage Estimation: Measures how much better each action turned out than the value network's estimate, using methods such as Generalized Advantage Estimation (GAE), to determine the direction and size of policy adjustments.
Loss Function and Parameter Updates: Defines separate loss functions for the policy and value networks and updates the trainable parameters with stochastic gradient descent.
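
The advantage step can be sketched with the standard GAE recursion. The function name `gae` and the defaults `gamma=0.99` and `lam=0.95` are common illustrative choices, not values reported in this summary.

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory.

    rewards: list of T per-step rewards.
    values:  list of T + 1 value estimates; the last entry is the
             bootstrap value of the state after the final step.
    Returns (advantages, returns), each of length T.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
    # Advantage plus baseline gives the regression target for the value network.
    returns = [a + v for a, v in zip(advantages, values[:-1])]
    return advantages, returns
```

In this sketch the advantages would feed the policy loss and the returns would serve as targets for a squared-error value loss.
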
Figure 3: The input prompt format, comprising a task description, action space, current state, and output action.
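
A prompt with the four parts named in the Figure 3 caption might be assembled as follows. This is a hypothetical sketch: the helper names `build_prompt` and `parse_action`, the section labels, the state fields, and the action names are all assumptions made for illustration, since the summary does not reproduce the actual template.

```python
import re


def build_prompt(task, actions, state):
    """Render task description, action space, and current state as text."""
    state_lines = "\n".join(f"- {key}: {value}" for key, value in state.items())
    return (
        f"Task: {task}\n"
        f"Action space: {', '.join(actions)}\n"
        f"Current state:\n{state_lines}\n"
        f"Output action:"
    )


def parse_action(reply, actions, default="hold"):
    """Return the first permitted action named in the reply, else a default."""
    for word in re.findall(r"[a-z]+", reply.lower()):
        if word in actions:
            return word
    return default


# Example usage with made-up state values.
actions = ["buy", "hold", "sell"]
prompt = build_prompt(
    "Trade one asset to maximize return.",
    actions,
    {"price": 101.5, "cash": 1000},
)
```

Falling back to a default action when the reply names no permitted action is a design choice of this sketch, intended to keep a trading loop running on malformed output.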

Experiments and Results

FLAG-Trader is evaluated on multiple trading tasks, including stocks and cryptocurrencies, where it consistently surpasses conventional trading strategies and LLM-agentic baselines. Performance is reported with metrics such as cumulative return and Sharpe ratio, and on these metrics even a small-scale LLM outperforms larger proprietary models.
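
For readers unfamiliar with the two metrics, the sketch below shows one common way to compute them from a series of per-period returns. The conventions here are assumptions: simple returns are compounded, the Sharpe ratio is annualized with 252 periods per year, and the risk-free rate defaults to zero. The paper may define either metric differently.

```python
import math
import statistics


def cumulative_return(returns):
    """Compound a sequence of simple per-period returns."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized mean excess return divided by its sample standard deviation."""
    if len(returns) < 2:
        raise ValueError("need at least two returns to estimate volatility")
    excess = [r - risk_free for r in returns]
    volatility = statistics.stdev(excess)
    if volatility == 0:
        return math.nan  # ratio is undefined for a constant return series
    return statistics.mean(excess) / volatility * math.sqrt(periods_per_year)
```

Cumulative return rewards raw profit, while the Sharpe ratio penalizes volatility, so a strategy can rank differently on the two.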

Key advantages demonstrated include:

Superior Performance: Consistently surpasses traditional strategies on metrics such as cumulative return and Sharpe ratio.
Efficient Adaptation: RL fine-tuning lets smaller LLMs outperform larger counterparts, improving decision quality without the resource cost of a larger model.
Stable Policies: The architecture promotes convergence to stable trading strategies that rely less on biases in the initial prompt.

Conclusion

FLAG-Trader offers significant potential for enhancing financial trading systems by integrating LLMs with RL. The structured reinforcement learning approach effectively exploits LLMs for nuanced, adaptive decision-making in volatile markets. Future research directions include optimizing computational efficiency, addressing market non-stationarity, and integrating risk-sensitive constraints to further improve real-world trading applications.

This fusion of LLMs and RL offers a scalable route to robust trading systems that adapt to dynamic market conditions.
