AutoML-Enabled Agentic Trading

Updated 24 February 2026

AutoML-enabled agentic trading systems are AI-driven platforms that autonomously perform model discovery, code refinement, and dynamic execution in financial markets.
They decompose complex trading workflows into modular stages, including model pre-selection, iterative code refinement, and fine-tuning of risk and trend metrics.
Empirical benchmarks report lower RMSE and higher Sharpe ratios than static AutoML and non-agentic baselines, together with improved interpretability, traceability, and auditability.

AutoML-enabled agentic trading systems are AI-driven, fully or semi-autonomous financial market agents that integrate Automated Machine Learning (AutoML) pipelines with agentic orchestration—often through LLMs with dynamic tool usage, structured workflow planning, and iterative feedback. These systems automate the end-to-end process of model discovery, implementation, evaluation, and execution to deliver high-performing, interpretable, and auditable trading workflows, often outperforming conventional static AutoML or non-agentic baselines (Ang et al., 19 Aug 2025, Emmanoulopoulos et al., 11 Jul 2025, Chen et al., 17 Jan 2026).

1. Architectural Foundations and Workflow Decomposition

AutoML-enabled agentic trading systems are constructed as modular, multi-stage pipelines. A representative implementation is TS-Agent, which formalizes financial time-series modeling as a closed-loop pipeline orchestrated by a planner LLM agent (Ang et al., 19 Aug 2025). The architecture decomposes into three principal stages:

Model Pre-selection: Given task $\mathcal{T}$, dataset $\mathcal{D}$, and curated libraries (Cases $\mathcal{C}$, Model Bank, Evaluation Bank), the agent retrieves the top-$k$ model candidates using case-based reasoning (CBR), scoring similarity with LLM-generated embeddings:

$\mathrm{Sim}(\mathcal{T}, T_i) = \cos(\phi(\mathcal{T}), \phi(T_i))$

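where $\phi(\cdot)$ is the LLM-generated embedding map and $T_i$ is a past task in the case library. Model candidates are then ranked by

$\mathrm{Score}(M_i|\mathcal{T}) = \alpha\,\mathrm{Sim}(\mathcal{T},T_i) + \beta\,\hat{P}(M_i\,|\,\mathcal{T})$

where $\alpha$ and $\beta$ weight case similarity against $\hat{P}(M_i\,|\,\mathcal{T})$, the estimated probability of model $M_i$ given the task.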

Code Refinement: The planner LLM iteratively edits the model implementation script. The action space $\mathcal{A}$ includes model class selection, refinement heuristics, hyperparameter proposal, script execution, and error correction. Feedback is closed-loop: in each iteration the code is patched, tuned, and executed, and the outcome is logged for the next iteration.
Fine-Tuning & Prompt Optimization: Once the top pipeline is identified, traditional gradient-based fine-tuning is applied to model parameters, while the agent’s own prompts may be RLHF-tuned using performance-derived rewards.
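The pre-selection scoring rule from the first stage can be sketched as follows. This is a minimal illustration, not the TS-Agent implementation: it assumes each stored case $T_i$ is paired with the model $M_i$ that was used for it, that embeddings are plain float vectors, and that $\hat{P}(M_i\,|\,\mathcal{T})$ is supplied as a lookup table. The function names, weights, and example models are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity Sim(T, T_i) between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def preselect(task_emb, cases, prior, alpha=0.7, beta=0.3, k=3):
    """Rank models by alpha * Sim(T, T_i) + beta * P_hat(M_i | T) and keep the top k.

    cases: list of (case_embedding, model_name) pairs from the case library.
    prior: dict mapping model_name -> estimated P_hat(M_i | T).
    If several cases point to the same model, its best score is kept.
    """
    best = {}
    for case_emb, model in cases:
        score = alpha * cosine(task_emb, case_emb) + beta * prior.get(model, 0.0)
        best[model] = max(best.get(model, float("-inf")), score)
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:k]

# Toy usage with 2-D embeddings (real LLM embeddings are far higher-dimensional).
cases = [([1.0, 0.0], "LSTM"), ([0.0, 1.0], "ARIMA"), ([0.6, 0.8], "XGBoost")]
prior = {"LSTM": 0.2, "ARIMA": 0.9, "XGBoost": 0.5}
top = preselect([1.0, 0.0], cases, prior, k=2)
```

Here ARIMA has the highest prior but is dropped because its case is orthogonal to the task; the arbitrary choice of $\alpha$ and $\beta$ sets that trade-off.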

The memory $\mathcal{M}_t = \{(\mathcal{S}_v, I_v)\}_{v \leq t}$, where $\mathcal{S}_v$ is the script at iteration $v$ and $I_v$ its execution logs, is fed back to the planner at every step to support contextual reasoning and action selection.
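The closed refinement loop and its memory $\mathcal{M}_t$ can be sketched in a few lines. This is a toy under stated assumptions: `propose` is a hypothetical rule-based stand-in for the planner LLM, `execute` is a hypothetical executor returning a synthetic RMSE rather than running real model code, and scripts are represented as small dictionaries instead of source files.

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    """M_t: every (script S_v, execution info I_v) pair for v <= t."""
    entries: list = field(default_factory=list)

    def add(self, script, info):
        self.entries.append((script, info))

    def best(self):
        ok = [e for e in self.entries if e[1]["status"] == "ok"]
        return min(ok, key=lambda e: e[1]["rmse"]) if ok else None

def propose(memory):
    """Hypothetical planner stand-in. It receives the whole memory,
    but this toy rule only inspects the latest entry."""
    if not memory.entries:
        return {"model": "MLP", "lr": 0.8}                 # initial script
    last_script, last_info = memory.entries[-1]
    if last_info["status"] == "error":
        return {**last_script, "lr": 0.1}                  # error-correction action
    return {**last_script, "lr": last_script["lr"] / 2}    # hyperparameter proposal

def execute(script):
    """Hypothetical executor: toy RMSE minimized at lr = 0.05; lr > 0.5 'diverges'."""
    if script["lr"] > 0.5:
        return {"status": "error", "msg": "loss diverged"}
    return {"status": "ok", "rmse": abs(script["lr"] - 0.05) + 0.2}

def refine(n_iters=6):
    memory = Memory()
    for _ in range(n_iters):
        script = propose(memory)   # plan the next edit from M_t
        info = execute(script)     # run the script and capture the outcome
        memory.add(script, info)   # append (S_t, I_t) to the memory
    return memory

memory = refine()
best_script, best_info = memory.best()
```

The first proposal fails, the error-correction branch recovers, and the best logged script is retrieved from memory rather than from the final iteration, which is why the full history is kept.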

2. AutoML-enabled Agentic Model Discovery and Adaptation

A distinct feature is the integration of domain-adaptive, iterative model discovery within the agentic loop. The “builder–critic” paradigm exemplified in (Emmanoulopoulos et al., 11 Jul 2025) enables the agent to propose, instantiate, and calibrate model classes—often stochastic differential equations (SDEs)—in an AutoML fashion:

Builder Agent: Given prior context and a candidate model $m_n$, the agent generates code for the drift $f$ and diffusion $g$ of the SDE $dS_t = f(S_t,t;\theta)\,dt + g(S_t,t;\theta)\circ dW_t$ (where $\circ$ denotes the Stratonovich integral), then compiles it and initializes its parameters.
Critic Agent: Calibrated Monte Carlo (MC) simulations are evaluated by the calibration loss $\mathcal{L}(\theta)$, LLM-judged novelty, and symbolic similarity (e.g., via Weisfeiler–Lehman kernels). The critic then proposes the next candidates, driving an evolutionary search.

Calibration relies on differentiable solvers (e.g., diffrax) and optimizers (e.g., Adam from optax) in the agent's toolchain, with the initial parameters $\theta^0$ proposed by the LLM. Open-ended SDE forms are supported, from CEV and CIR to nonlinear and jump-diffusion variants.
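The builder–critic calibration step can be sketched in a self-contained way, under several loudly stated assumptions. The builder's output is represented as a hypothetical dictionary of two candidate SDEs (a GBM and a CIR-type model whose long-run level is fixed at 1.0). The calibration loss is a hypothetical match to a target mean and standard deviation of $S_T$. Gradients come from central finite differences with common random numbers instead of a differentiable solver such as diffrax, and Adam is hand-written rather than taken from optax. The critic ranks candidates by calibrated loss only; LLM-judged novelty and Weisfeiler–Lehman similarity are omitted.

```python
import math
import random

# Hypothetical builder output: drift f and diffusion g, each called as f(s, t, theta).
CANDIDATES = {
    "GBM": (lambda s, t, th: th[0] * s,
            lambda s, t, th: th[1] * s),
    "CIR": (lambda s, t, th: th[0] * (1.0 - s),          # long-run level fixed at 1.0
            lambda s, t, th: th[1] * math.sqrt(max(s, 0.0))),
}

def terminal_values(f, g, theta, s0=1.0, T=1.0, n_steps=16, n_paths=64, seed=0):
    """Simulate S_T with the stochastic Heun scheme, which converges to the
    Stratonovich solution of dS = f dt + g o dW. Re-seeding on every call gives
    common random numbers, so the loss below is a deterministic function of theta."""
    rng = random.Random(seed)
    dt = T / n_steps
    out = []
    for _ in range(n_paths):
        s, t = s0, 0.0
        for _ in range(n_steps):
            dw = rng.gauss(0.0, math.sqrt(dt))
            f0, g0 = f(s, t, theta), g(s, t, theta)
            sp = s + f0 * dt + g0 * dw                        # Euler predictor
            f1, g1 = f(sp, t + dt, theta), g(sp, t + dt, theta)
            s += 0.5 * (f0 + f1) * dt + 0.5 * (g0 + g1) * dw  # trapezoidal corrector
            t += dt
        out.append(s)
    return out

def calib_loss(f, g, theta, target):
    """L(theta): squared error between simulated and target (mean, std) of S_T."""
    x = terminal_values(f, g, theta)
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    return (mean - target[0]) ** 2 + (std - target[1]) ** 2

def calibrate(f, g, theta0, target, lr=0.02, steps=150, h=1e-4):
    """Hand-written Adam on central finite-difference gradients."""
    theta = list(theta0)
    m, v = [0.0] * len(theta), [0.0] * len(theta)
    b1, b2, eps = 0.9, 0.999, 1e-8
    for k in range(1, steps + 1):
        grad = []
        for i in range(len(theta)):
            up, dn = theta.copy(), theta.copy()
            up[i] += h
            dn[i] -= h
            grad.append((calib_loss(f, g, up, target) - calib_loss(f, g, dn, target)) / (2 * h))
        for i, gi in enumerate(grad):
            m[i] = b1 * m[i] + (1 - b1) * gi
            v[i] = b2 * v[i] + (1 - b2) * gi * gi
            m_hat, v_hat = m[i] / (1 - b1 ** k), v[i] / (1 - b2 ** k)
            theta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, calib_loss(f, g, theta, target)

def critic_rank(candidates, theta0, target):
    """Critic stand-in: calibrate every candidate from its initial guess, rank by loss."""
    results = [(name, *calibrate(f, g, theta0[name], target))
               for name, (f, g) in candidates.items()]
    return sorted(results, key=lambda r: r[2])
```

For example, `critic_rank(CANDIDATES, {"GBM": [0.0, 0.1], "CIR": [1.0, 0.1]}, target)` calibrates both candidates from initial guesses that stand in for LLM-proposed $\theta^0$ and returns `(name, theta, loss)` tuples, best first. The Heun scheme is used because the SDE is written in Stratonovich form, to which plain Euler–Maruyama would not converge.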

3. Data Acquisition, Context Assembly, and Tool-Orchestrated Execution

Agentic trading systems autonomously construct their input context, often synthesizing unstructured, real-time data without human curation. The nowcasting system in (Chen et al., 17 Jan 2026) demonstrates end-to-end agentic information acquisition:

The agent dynamically issues web search queries, parses high-authority sources, and extracts structured features (e.g., via regex) from unstructured HTML/text.
Recency is enforced, and look-ahead bias is excluded, through timestamped caching and by executing prompts strictly at the decision time, so no information published after the decision can enter the context.
Signals from news, social media, filings, and market data are synthesized into quantitative features, supporting both traditional model training and prompt-based nowcasting.
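The timestamp guard and regex-based feature extraction can be sketched as follows. This is a minimal illustration, assuming a hypothetical cache of `(fetch time, raw text)` pairs and a hypothetical "revenue of $X billion" pattern; a production agent would use many patterns and real retrieval timestamps.

```python
import re
from datetime import datetime

# Hypothetical timestamped cache: (time the page was fetched, raw text).
CACHE = [
    (datetime(2025, 1, 2, 9, 0), "ACME reported quarterly revenue of $12.5 billion."),
    (datetime(2025, 1, 3, 8, 0), "Analysts now see revenue of $13 billion next quarter."),
    (datetime(2025, 1, 3, 10, 0), "Revised: revenue of $99 billion."),  # arrives after the decision
]

# Hypothetical extraction pattern for a single structured feature.
REVENUE = re.compile(r"revenue of \$(\d+(?:\.\d+)?)\s*billion", re.IGNORECASE)

def features_at(cache, decision_time):
    """Extract revenue figures using only documents fetched strictly before decision_time."""
    feats = []
    for fetched_at, text in cache:
        if fetched_at >= decision_time:   # look-ahead guard
            continue
        for match in REVENUE.finditer(text):
            feats.append((fetched_at, float(match.group(1))))
    return feats

feats = features_at(CACHE, datetime(2025, 1, 3, 9, 30))
```

The third document is silently excluded because it was fetched after the 09:30 decision, which is the property that keeps backtests free of forward-looking information.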

Structured prompt protocols (exemplified by standard 10-step interviews) yield multi-horizon, multi-attribute predictive signals for each asset of interest (Chen et al., 17 Jan 2026).
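On the output side, a structured protocol is only useful if replies are validated before they become signals. The sketch below assumes a hypothetical JSON reply schema with three horizons and two attributes per horizon; the actual interview steps and fields used by Chen et al. are not reproduced here.

```python
import json

HORIZONS = ("1d", "1w", "1m")             # hypothetical forecast horizons
ATTRIBUTES = ("direction", "confidence")  # hypothetical per-horizon attributes

def parse_signals(raw):
    """Validate an LLM reply of the (assumed) form
    {"ticker": ..., "signals": {"1d": {"direction": -1|0|1, "confidence": 0..1}, ...}}
    and return (ticker, {horizon: direction * confidence})."""
    reply = json.loads(raw)
    out = {}
    for h in HORIZONS:
        sig = reply["signals"][h]
        missing = [a for a in ATTRIBUTES if a not in sig]
        if missing:
            raise ValueError(f"{h}: missing {missing}")
        if sig["direction"] not in (-1, 0, 1) or not 0.0 <= sig["confidence"] <= 1.0:
            raise ValueError(f"{h}: out-of-range values {sig}")
        out[h] = sig["direction"] * sig["confidence"]
    return reply["ticker"], out

raw = ('{"ticker": "ACME", "signals": {'
       '"1d": {"direction": 1, "confidence": 0.6}, '
       '"1w": {"direction": 0, "confidence": 0.5}, '
       '"1m": {"direction": -1, "confidence": 0.25}}}')
ticker, signals = parse_signals(raw)
```

Rejecting malformed replies with an exception, rather than defaulting them, keeps a single bad generation from silently entering the ranking.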

4. Decision-Making, Trade Execution, and Portfolio Construction

Once forecasting and risk metrics are generated, the agentic system operationalizes trading via explicit decision protocols:

Model-based trade logic: Outputs include trend signals (e.g., RSI, drift polarity) and risk metrics (VaR, CVaR, maximum drawdown (MDD), and extreme-value-theory (EVT) tail parameters) (Emmanoulopoulos et al., 11 Jul 2025).
LLM-formed trade instructions: Prompts integrate current risk/trend metrics and synthesized news; the output is a discrete trade action ($T_t \in \{\text{buy},\text{sell},\text{hold}\}$).
Automated portfolio construction: In fully agentic nowcasting setups, the agent ranks assets (e.g., by predicted attractiveness $\hat{y}_{i,t}$), constructs value-weighted portfolios (e.g., from the 20 top-ranked Russell 1000 stocks), and executes trades at the opening auction (Chen et al., 17 Jan 2026).
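Several of the model-based metrics listed above have standard historical estimators, sketched below together with a trade rule. Assumptions: VaR and CVaR use the $k=\lceil \alpha n \rceil$ worst returns, RSI uses simple averages (Cutler's variant, not Wilder's smoothing), EVT tail fitting is omitted, and `decide` is a hypothetical deterministic rule with made-up thresholds standing in for the LLM decision step.

```python
import math

def var_cvar(returns, alpha=0.05):
    """Historical VaR and CVaR at level alpha, as positive loss fractions,
    estimated from the k = ceil(alpha * n) worst returns."""
    worst = sorted(returns)
    k = max(1, math.ceil(alpha * len(returns) - 1e-12))  # guard against float round-up
    tail = worst[:k]
    return -tail[-1], -sum(tail) / k

def max_drawdown(prices):
    """Largest peak-to-trough decline as a fraction of the running peak."""
    peak, mdd = prices[0], 0.0
    for p in prices:
        peak = max(peak, p)
        mdd = max(mdd, (peak - p) / peak)
    return mdd

def rsi(prices, n=14):
    """Simple-average RSI over the last n price changes; equals
    100 - 100 / (1 + RS) with RS = average gain / average loss."""
    changes = [b - a for a, b in zip(prices[:-1], prices[1:])][-n:]
    gains = sum(c for c in changes if c > 0)
    losses = -sum(c for c in changes if c < 0)
    if gains + losses == 0:
        return 50.0   # flat prices treated as neutral (a convention, assumed here)
    return 100.0 * gains / (gains + losses)

def decide(rsi_value, var, mdd, var_limit=0.05, mdd_limit=0.20):
    """Hypothetical rule standing in for the LLM decision step."""
    if var > var_limit or mdd > mdd_limit:
        return "sell"   # risk budget breached
    if rsi_value > 70:
        return "sell"   # overbought
    if rsi_value < 30:
        return "buy"    # oversold
    return "hold"

prices = [100, 102, 101, 105, 103, 108]
returns = [-0.05, -0.02, 0.01, 0.03, -0.01, 0.02, 0.0, -0.03, 0.04, 0.01]
var, cvar = var_cvar(returns, alpha=0.2)
action = decide(rsi(prices, n=5), var, max_drawdown(prices))
```

Here risk limits are not breached (VaR 3%, MDD about 1.9%), so the overbought RSI of about 78.6 drives the "sell".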

Position sizing, cash updates, and transaction cost modeling (e.g., proportional to bid-ask spreads or explicit $\kappa P_t S_t$ terms) are explicitly codified. Returns are measured open-to-open, preserving a strict out-of-sample evaluation.
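The ranking, value-weighting, cost, and open-to-open accounting steps can be sketched as below. This is a simplified illustration, assuming market capitalizations are available for weighting and that costs are a proportional rate $\kappa$ applied to portfolio turnover (one reading of the $\kappa P_t S_t$ term when $S_t$ is the traded quantity); the function names are hypothetical.

```python
def build_portfolio(scores, caps, top_n=20):
    """Take the top_n assets by predicted attractiveness and value-weight them."""
    chosen = sorted(scores, key=scores.get, reverse=True)[:top_n]
    total_cap = sum(caps[a] for a in chosen)
    return {a: caps[a] / total_cap for a in chosen}

def rebalance_cost(old_w, new_w, kappa):
    """Proportional cost: kappa times turnover, as a fraction of portfolio value."""
    names = set(old_w) | set(new_w)
    return kappa * sum(abs(new_w.get(a, 0.0) - old_w.get(a, 0.0)) for a in names)

def open_to_open_return(weights, open_t, open_next):
    """Portfolio return from today's opening auction to the next day's open."""
    return sum(w * (open_next[a] / open_t[a] - 1.0) for a, w in weights.items())

scores = {"A": 0.9, "B": 0.1, "C": 0.7, "D": 0.4}
caps = {"A": 300.0, "B": 100.0, "C": 100.0, "D": 50.0}
weights = build_portfolio(scores, caps, top_n=2)
gross = open_to_open_return(weights, {"A": 100.0, "C": 50.0}, {"A": 102.0, "C": 49.0})
net = gross - rebalance_cost({}, weights, kappa=0.001)
```

Starting from cash, turnover is 100%, so a 10 bps rate costs 0.1% and turns a 1% gross open-to-open return into 0.9% net.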

5. Feedback, Auditing, Adaptivity, and Transparency

A salient property is persistent introspection. Each feedback iteration is logged as a tuple $(\mathcal{C}_t, \mathcal{A}_t, I_t, r_t)$ of context, action, execution information, and reflection, enabling:

Robust debugging: Bugs can be localized because only code-edit actions modify the script; error correction uses restate-and-fix prompt patterns (Ang et al., 19 Aug 2025).
Traceability and provenance: Each code line is annotated with the iteration that introduced it. Auditing metrics include success rates and decision entropy, where $\pi(a\,|\,\mathcal{C})$ is the agent's action distribution in context $\mathcal{C}$:

$H(\mathcal{A} | \mathcal{C}) = -\sum_a \pi(a | \mathcal{C}) \log \pi(a | \mathcal{C})$

Reflective learning: Post-experiment reflections $r_t$ bias subsequent agent decision policies, facilitating adaptive learning.
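The logged tuples support both auditing metrics directly. The sketch below assumes contexts are reduced to hashable labels (a hypothetical simplification; real contexts are long prompts) and estimates $\pi(a\,|\,\mathcal{C})$ by empirical action frequencies per context before applying the entropy formula.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass

@dataclass
class Step:
    context: str      # C_t, reduced to a label (assumption)
    action: str       # A_t
    info: str         # I_t, here just "ok" or "error"
    reflection: str   # r_t

def decision_entropy(log):
    """H(A | C = c) per context, with pi(a | c) estimated from logged action counts."""
    by_ctx = defaultdict(Counter)
    for s in log:
        by_ctx[s.context][s.action] += 1
    out = {}
    for ctx, counts in by_ctx.items():
        n = sum(counts.values())
        out[ctx] = -sum((c / n) * math.log(c / n) for c in counts.values())
    return out

def success_rate(log):
    """Fraction of logged iterations whose execution succeeded."""
    return sum(s.info == "ok" for s in log) / len(log)

log = [
    Step("ctx1", "edit", "ok", "tighten lr"), Step("ctx1", "edit", "error", "fix import"),
    Step("ctx1", "run", "ok", "stable"), Step("ctx1", "run", "ok", "stable"),
    Step("ctx2", "run", "ok", "stable"), Step("ctx2", "run", "error", "retry"),
]
entropy = decision_entropy(log)
```

An evenly split context such as `ctx1` reaches $\log 2$ nats, flagging indecision, while a context where the agent always chooses the same action has zero entropy.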

Model fine-tuning is coordinated with RLHF-style prompt optimization whose rewards are derived from downstream performance, aligning the agent's incentives with forecast accuracy and stability.

6. Empirical Outcomes and Comparative Benchmarks

Empirical evaluations report consistent performance advantages for AutoML-enabled agentic systems. For TS-Agent on time-series forecasting benchmarks:

Dataset	Model	RMSE	Success rate
Crypto	TS-Agent	0.206¹	100%
Crypto	DS-Agent	0.297	60%
Exchange	TS-Agent	0.0068	100%
Stock	TS-Agent	8.017	100%

¹ Best result, obtained with a GPT-4o backbone (Ang et al., 19 Aug 2025).
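As a worked comparison using only the two Crypto rows of the table, TS-Agent's RMSE is roughly 31% below DS-Agent's:

$\dfrac{0.297 - 0.206}{0.297} = \dfrac{0.091}{0.297} \approx 0.306$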

In trading-utility evaluations and synthetic market tests, Sharpe ratio comparisons indicate that TS-Agent and similar frameworks outperform static AutoML and baseline agentic paradigms, a gain attributed to reduced error propagation, higher traceability, and improved decision stability (Ang et al., 19 Aug 2025, Emmanoulopoulos et al., 11 Jul 2025).

The nowcasting framework in (Chen et al., 17 Jan 2026) achieves a daily alpha of $18.4$ bps relative to the Fama–French factors plus momentum and an annualized Sharpe ratio of $2.43$, although performance is concentrated in the top-ranked assets. Transaction costs consume less than $10\%$ of this alpha, supporting practical deployability.
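For scale, a rough conversion of these figures, assuming 252 trading days, simple (non-compounded) aggregation, and the usual square-root-of-time scaling for the Sharpe ratio:

$18.4\,\text{bps/day} \times 252 \approx 4637\,\text{bps} \approx 46.4\%\ \text{per year}, \qquad 0.10 \times 18.4\,\text{bps} = 1.84\,\text{bps/day}$

$\mathrm{SR}_{\text{daily}} \approx 2.43 / \sqrt{252} \approx 0.153$

The cost statement therefore implies average trading costs below roughly 1.84 bps per day.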

7. Limitations, Open Issues, and Directions

Observed limitations include asymmetric predictability: agentic strategies effectively identify positive-alpha assets but provide little signal on underperformers (the short side), which is attributed to news obfuscation and social-media noise (Chen et al., 17 Jan 2026). Signal concentration is a consistent theme, with alpha diluting rapidly outside the top deciles.
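Signal concentration of this kind is commonly diagnosed with a decile sort. The sketch below assumes per-asset predicted scores and next-period realized returns are available; the function name and toy data are hypothetical. If alpha is concentrated, the top bin's mean return stands out while the lower bins look alike.

```python
def bin_mean_returns(scores, realized, n_bins=10):
    """Sort assets by predicted score (highest first), split them into n_bins
    groups of near-equal size, and return the mean realized return per group."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for rank, i in enumerate(order):
        b = rank * n_bins // len(order)   # bin 0 holds the top-ranked assets
        sums[b] += realized[i]
        counts[b] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]

# Toy data: only the two highest-scored assets earn a positive return.
means = bin_mean_returns(list(range(20, 0, -1)), [0.05, 0.05] + [0.0] * 18)
```

A flat profile below the top decile, as in this toy case, is the pattern described above; a symmetric strategy would instead show the bottom bins falling below zero.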

AutoML elements are mainly integrated into model selection, prompt tuning, ensembling, and portfolio parameterization, though the boundary between fixed-agent and AutoML-enabled approaches can be fluid. Sustaining outperformance, interpretability, and robustness, especially in adversarial or highly non-stationary environments, remains an open challenge and an active research frontier.

Empirical studies indicate that agentic orchestration combined with AutoML-style iterative search, explicit feedback, and structured domain knowledge is an effective paradigm for constructing auditable, adaptive, high-performing trading systems in real-world financial contexts (Ang et al., 19 Aug 2025, Emmanoulopoulos et al., 11 Jul 2025, Chen et al., 17 Jan 2026).