Agentic Time Series Forecasting (ATSF)

Updated 9 February 2026

Agentic Time Series Forecasting (ATSF) is a paradigm that treats forecasting as an iterative decision-making process executed by intelligent agents using perception, planning, action, and reflection.
It integrates dynamic tool use, self-adaptation, and memory feedback to enhance interpretability and auditability across domains like finance, energy, and operations.
ATSF employs structured agentic workflows and multi-agent orchestration to achieve iterative improvement, robust performance, and transparency in nonstationary environments.

Agentic Time Series Forecasting (ATSF) is a paradigm that reframes time series forecasting as an iterative decision-making process executed by intelligent agents, often powered by large language models (LLMs), rather than as a single-pass prediction by a fixed model. ATSF orchestrates forecasting workflows through perception, planning, action, reflection, and memory, incorporating dynamic tool use, self-evaluation, and adaptation to new evidence and objectives. This approach has driven rapid progress in automating complex, interpretable, and audit-friendly forecasting pipelines in domains including finance, energy, and operations, and has motivated a new generation of research and open-source frameworks.

1. Conceptual Foundations and Definitions

Traditional time series forecasting seeks a mapping

$f_\theta : \mathcal{X}_{1:T} \to \mathcal{Y}_{T+1:T+H}$

where a trained model $f_\theta$ predicts future values given historical context. This model-centric view is static and does not support iterative reasoning or adaptation after training. In contrast, ATSF formulates forecasting as an agentic workflow executed over decision cycles $t = 1, 2, \ldots$, in which an agent $\mathcal{A}$ interacts with an environment $\mathcal{E}$ and updates its state $s_t$ through:

Perception ($\Phi$): transforming raw input and memory into internal context.
Planning ($\Pi$): formulating strategies, sub-tasks, and tool selections.
Action ($\mathcal{T}$): executing plans by invoking forecasting models or external tools.
Reflection ($\mathcal{C}$): evaluating predictions and deciding whether to accept or revise them.
Memory ($\mathcal{M}$): accumulating and retrieving experience to improve later cycles (Cheng et al., 2 Feb 2026, Garza et al., 30 Aug 2025).

ATSF is thus characterized by explicit feedback loops, tool and knowledge integration, self-correction, and adaptive workflow control.
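The five components can be illustrated with a minimal Python sketch. It is not taken from any of the cited frameworks: the three forecasting "tools" (naive, mean, drift), the mean-absolute-error acceptance rule in `reflect`, and the list-based `Memory` are all illustrative assumptions. Each function is commented with the component symbol it plays.

```python
from dataclasses import dataclass, field
from statistics import mean

# Hypothetical forecasting "tools": each maps a history to an H-step forecast.
def naive_tool(history, horizon):
    return [history[-1]] * horizon

def mean_tool(history, horizon):
    return [mean(history)] * horizon

def drift_tool(history, horizon):
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * (h + 1) for h in range(horizon)]

TOOLS = {"naive": naive_tool, "mean": mean_tool, "drift": drift_tool}

@dataclass
class Memory:
    # Episodic memory of (tool name, validation error) pairs across runs.
    episodes: list = field(default_factory=list)

    def ranked_tools(self):
        # Best error seen per tool; untried tools rank last, in registry order.
        best = {}
        for name, err in self.episodes:
            best[name] = min(err, best.get(name, float("inf")))
        return sorted(TOOLS, key=lambda n: best.get(n, float("inf")))

def perceive(series, holdout):
    # Phi: split raw input into a fitting context and a validation window.
    return series[:-holdout], series[-holdout:]

def plan(memory, tried):
    # Pi: choose the most promising tool not yet tried in this run.
    return next((n for n in memory.ranked_tools() if n not in tried), None)

def act(tool_name, context, horizon):
    # T: invoke the selected forecasting tool.
    return TOOLS[tool_name](context, horizon)

def reflect(forecast, actual, tolerance):
    # C: accept when mean absolute error is within tolerance, otherwise revise.
    err = mean(abs(f - a) for f, a in zip(forecast, actual))
    return err, err <= tolerance

def run_agent(series, holdout=3, tolerance=0.5, memory=None):
    memory = memory if memory is not None else Memory()
    context, actual = perceive(series, holdout)
    tried, best = set(), (None, float("inf"))
    while (tool := plan(memory, tried)) is not None:
        tried.add(tool)
        err, accepted = reflect(act(tool, context, holdout), actual, tolerance)
        memory.episodes.append((tool, err))  # M: store the experience
        if err < best[1]:
            best = (tool, err)
        if accepted:
            break
    return best, memory
```

On the linear series 1, ..., 10 the first run tries the naive and mean tools before the drift tool passes reflection; a second run that reuses the same memory plans the drift tool first, which is the kind of memory-driven improvement the loop is meant to show.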

2. Taxonomy and System Types

ATSF is situated at the "branch-structured reasoning" end of modern reasoning taxonomies for time series, as opposed to "direct reasoning" (single forward LLM pass) or "linear chain" (fixed scripted workflows). Core attributes include:

Use of single or multiple collaborating agents, often LLM-powered.
Explicit tool calls—e.g., retrieval modules, simulators, model APIs.
Multiple candidate forecasts and critique/pruning of hypotheses.
Iterative propose-evaluate-refine cycles until stopping criteria are met.
Memory and evidence tracking for transparency and re-use (Chang et al., 15 Sep 2025).

Architectural Overview:

Taxonomy Tier	Reasoning Flow	Key Features
Direct	One-shot LLM call	No feedback, no tool use
Linear Chain	Fixed scripted steps	No branching, limited looping
Branch-Structured	Agentic, feedback-driven	Agent loops, tool calls, critique, memory
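The propose-evaluate-refine cycle of a branch-structured system can be sketched as a small beam search. In this hypothetical example each hypothesis is a moving-average window size, critique is mean absolute error on a held-out window, pruning keeps the `beam` best hypotheses, and the loop stops when a round brings no improvement. An LLM-driven agent would propose far richer hypotheses, but the control flow is analogous.

```python
from statistics import mean

def moving_average(history, window, horizon):
    # Flat forecast equal to the mean of the last `window` observations.
    return [mean(history[-window:])] * horizon

def critique(window, context, actual):
    # Score a hypothesis by mean absolute error on the validation window.
    forecast = moving_average(context, window, len(actual))
    return mean(abs(f - a) for f, a in zip(forecast, actual))

def refine(window, max_window):
    # Branch: propose neighbouring hypotheses around a surviving one.
    return [w for w in (window - 1, window + 1) if 1 <= w <= max_window]

def branch_search(series, holdout=2, beam=2, max_rounds=10):
    context, actual = series[:-holdout], series[-holdout:]
    max_window = len(context)
    initial = {1, max_window // 2, max_window}  # initial candidate forecasts
    scored = {w: critique(w, context, actual) for w in initial}
    best = min(scored.values())
    for _ in range(max_rounds):
        # Prune: only the `beam` best hypotheses are expanded further.
        survivors = sorted(scored, key=scored.get)[:beam]
        proposals = {n for w in survivors for n in refine(w, max_window)} - set(scored)
        if not proposals:
            break
        scored.update({w: critique(w, context, actual) for w in proposals})
        new_best = min(scored.values())
        if new_best >= best:  # stopping criterion: no improvement this round
            break
        best = new_best
    winner = min(scored, key=scored.get)
    return winner, scored[winner]
```

On the series `[0, 0, 0, 0, 0, 0, 4, 6, 5, 5]` none of the initial windows (1, 4, 8) is optimal; expanding the surviving hypotheses reaches window 2, which matches the held-out values exactly.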

Notable frameworks classified as ATSF systems include TS-Agent (Ang et al., 19 Aug 2025), TimeCopilot (Garza et al., 30 Aug 2025), and TimeSeriesScientist (Zhao et al., 2 Oct 2025).

3. Methodological Frameworks and Representative Architectures

a. Structured Agentic Workflows

ATSF agents are formalized as a Markov decision process (MDP) $(S, A, P, R)$, with:

State $s_t \in S$: encapsulating the current pipeline, dynamic memory, external resources, and task specification.
Actions $a_t \in A$: decomposed into model selection, code or pipeline refinement, hyperparameter tuning, and execution/evaluation.
Transitions $P$: deterministic, given the applied edits and the resulting execution logs.
Reward $R$: scalar improvement in the loss or evaluation metric, clipped when a step yields no improvement.
Planner policy $\pi$: learned or heuristic, chosen to maximize cumulative improvement (Ang et al., 19 Aug 2025).
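A toy version of this MDP, assuming a state that holds only a model name, a hyperparameter tuple, the current loss, and an edit log, illustrates the deterministic transition, the zero-clipped reward, and a greedy one-step-lookahead planner. The `evaluate` callable and any loss table passed to it are hypothetical stand-ins for actually running a pipeline; TS-Agent's own state and action spaces are richer.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class State:
    # Hypothetical minimal state: chosen model, its hyperparameters, loss, edit log.
    model: str
    params: tuple
    loss: float
    log: tuple = ()

def transition(state, action, evaluate):
    # Deterministic: apply the edit, re-evaluate, and append the result to the log.
    kind, value = action
    model = value if kind == "select_model" else state.model
    params = value if kind == "tune" else state.params
    loss = evaluate(model, params)
    return replace(state, model=model, params=params, loss=loss,
                   log=state.log + ((kind, value, loss),))

def reward(prev, nxt):
    # Scalar improvement in loss, clipped at zero for non-improving steps.
    return max(0.0, prev.loss - nxt.loss)

def greedy_policy(state, actions, evaluate):
    # Heuristic planner: one-step lookahead, pick the action with the largest reward.
    return max(actions, key=lambda a: reward(state, transition(state, a, evaluate)))

def run_episode(state, actions, evaluate, max_steps=5):
    total = 0.0
    for _ in range(max_steps):
        nxt = transition(state, greedy_policy(state, actions, evaluate), evaluate)
        r = reward(state, nxt)
        if r == 0.0:  # no improving action remains
            break
        state, total = nxt, total + r
    return state, total
```

Because the reward is clipped, a worsening edit earns exactly zero, so the episode ends as soon as no action improves the loss rather than wandering into worse pipelines.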

Example pipeline (TS-Agent):

Stage 1: Model pre-selection using case-based retrieval.
Stage 2: Alternating code refinement (“Apricot”) and hyperparameter tuning (“Peony”) in a round-robin feedback loop.
Knowledge Banks: Structured repositories for case similarity, refinement tips, and model code bases guide and constrain agent decisions.
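The two-stage pipeline can be approximated as follows. This is a hypothetical sketch, not TS-Agent's code: the case bank entries, the two-dimensional feature vectors, and the `refine_step` and `tune_step` edits (standing in for the code-refinement and hyperparameter-tuning stages) are all invented for illustration. An edit is kept only when it lowers the loss, so rejected edits are effectively rolled back.

```python
import math

# Hypothetical case bank: (feature vector, model that worked best on that case).
CASE_BANK = [
    ((0.9, 0.1), "linear_trend"),
    ((0.1, 0.9), "volatility_model"),
    ((0.2, 0.2), "naive"),
]

def preselect(features, k=1):
    # Stage 1: case-based retrieval by Euclidean distance to stored cases.
    ranked = sorted(CASE_BANK, key=lambda case: math.dist(features, case[0]))
    return [model for _, model in ranked[:k]]

def round_robin(config, steps, evaluate, rounds=2):
    # Stage 2: cycle through edit types; keep an edit only if the loss improves.
    best_loss, history = evaluate(config), []
    for r in range(rounds):
        for name, step in steps:
            candidate = step(config)
            loss = evaluate(candidate)
            accepted = loss < best_loss
            if accepted:
                config, best_loss = candidate, loss
            # A rejected candidate is discarded: the previous config is kept.
            history.append((r, name, loss, accepted))
    return config, best_loss, history

# Hypothetical edit steps standing in for code refinement and hyperparameter tuning.
def refine_step(cfg):
    return {**cfg, "lags": cfg["lags"] + 1}

def tune_step(cfg):
    return {**cfg, "lr": cfg["lr"] / 2}
```

With a toy loss such as `abs(lags - 3) + abs(lr - 0.25)`, two rounds starting from `{"lags": 1, "lr": 0.5}` accept three edits and reject the final learning-rate halving, leaving `{"lags": 3, "lr": 0.25}`.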

b. Multi-Agent and Retrieval-Augmented Architectures

Other ATSF paradigms employ hierarchical multi-agent orchestration:

Master Agent: Parses user query, routes to specialized sub-agents.
Sub-Agents: Small language models (SLMs) fine-tuned for specific tasks (e.g., forecasting, imputation, anomaly detection), each with access to a task-specific prompt pool.
Retrieval-Augmented Generation: At inference time, relevant historical patterns are retrieved and concatenated to the input context as prompt augmentation, improving the handling of distributional shift and spatio-temporal dependencies (Ravuru et al., 2024).
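A minimal sketch of the master-agent pattern, assuming keyword-based routing and Euclidean nearest-neighbour retrieval over stored windows, follows. The sub-agents are placeholder functions that only echo their prompt; in the cited architecture they would be fine-tuned SLMs, and its actual routing and retrieval rules may differ.

```python
import math

# Hypothetical sub-agents; in a real system each might wrap a fine-tuned SLM.
SUB_AGENTS = {
    "forecast": lambda prompt: "[forecasting agent]\n" + prompt,
    "impute": lambda prompt: "[imputation agent]\n" + prompt,
    "anomaly": lambda prompt: "[anomaly agent]\n" + prompt,
}
# Hypothetical keyword rules used by the master agent for routing.
ROUTES = {
    "impute": ("missing", "impute", "gap"),
    "anomaly": ("anomal", "outlier"),
    "forecast": ("forecast", "predict"),
}

def route(query):
    q = query.lower()
    for task, words in ROUTES.items():
        if any(w in q for w in words):
            return task
    return "forecast"  # default task

def retrieve(window, pool, k=2):
    # Retrieval step: the k stored windows closest to the current one (Euclidean).
    return sorted(pool, key=lambda past: math.dist(window, past))[:k]

def master_agent(query, window, pool, k=2):
    task = route(query)
    context = "\n".join(f"similar past pattern: {list(p)}" for p in retrieve(window, pool, k))
    prompt = f"{context}\ncurrent window: {list(window)}\ntask: {query}"
    return task, SUB_AGENTS[task](prompt)
```

Keeping routing, retrieval, and the sub-agents as separate pieces mirrors the modularity the architecture aims for: any one sub-agent can be swapped without touching the others.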

c. Modular Autonomous Pipelines

Frameworks such as TimeCopilot organize ATSF via:

Unified LLM-TSFM interface: The LLM agent plans and executes feature analysis, model selection, cross-validation, and ensembling via calls to diverse tool APIs, including time series foundation models (TSFMs).
Chain-of-thought reasoning and decision logs: The LLM's reasoning is recorded at every step, making the full workflow auditable.
Model escalation: The agent escalates to higher-capacity models when simpler ones underperform, optimizing both accuracy and computation cost (Garza et al., 30 Aug 2025).
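Model escalation can be sketched as a ladder of models ordered by cost, where the agent moves up only when rolling-origin cross-validated error exceeds a threshold, and every decision is written to a log. The two-rung ladder (naive and seasonal naive with an assumed period of 4) and the MAE threshold are illustrative assumptions; TimeCopilot's actual model hub and escalation rule are not reproduced here.

```python
from statistics import mean

def naive(history, horizon):
    return [history[-1]] * horizon

def seasonal_naive(history, horizon, period=4):
    # Repeat the last observed season (period assumed known, here 4).
    return [history[-period + (i % period)] for i in range(horizon)]

def cv_mae(model, series, horizon=4, folds=2):
    # Rolling-origin cross-validation: forecast each of the last `folds` blocks.
    errors = []
    for k in range(folds, 0, -1):
        cut = len(series) - k * horizon
        train, test = series[:cut], series[cut:cut + horizon]
        errors.append(mean(abs(f - a) for f, a in zip(model(train, horizon), test)))
    return mean(errors)

# Hypothetical escalation ladder, ordered from cheapest to most capable.
LADDER = [("naive", naive), ("seasonal_naive", seasonal_naive)]

def escalate(series, threshold, horizon=4):
    log = []
    for name, model in LADDER:
        err = cv_mae(model, series, horizon)
        log.append(f"{name}: CV MAE={err:.3f}")
        if err <= threshold:
            log.append(f"accept {name}")
            return name, log
        log.append(f"escalate from {name}")
    log.append("no model met the threshold; keeping the most capable one")
    return LADDER[-1][0], log
```

On a strictly periodic series such as `[1, 5, 3, 7] * 4`, the naive model misses by 3 on average, so the agent escalates and accepts the seasonal naive model, leaving a four-line log that records why.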

4. Benchmarking, Metrics, and Empirical Outcomes

Agentic systems are evaluated on classical and agent-specific criteria:

Dataset / Method	Core Metric	Key Result
Crypto (TS-Agent)	RMSE (↓)	0.206 (TS-Agent, GPT-4o), 100% success
PeMSD3 (Agentic RAG)	MAE (↓)	13.01 (-16.4% vs STG-NCDE Llama-8B)
ETTh1 (TSci)	MAE (↓)	2.02 (TSci) vs. 5.20 (Gemini)

Key agentic metrics include:

Iterative improvement gain (first vs. last cycle error reduction)
Planning efficiency (number of tool calls per forecast)
Memory utilization and reuse
Calibration of reflection/critique (proportion of correct revise vs. accept decisions)
Agentic overhead (compute cost, latency per cycle) (Cheng et al., 2 Feb 2026, Ang et al., 19 Aug 2025).
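The agent-specific metrics above reduce to simple ratios once the agent logs the right events. The function names and exact definitions below (for instance, treating a critique decision as correct when it revises exactly the forecasts that needed revision) are assumptions for illustration, not definitions taken from the cited papers.

```python
from statistics import mean

def improvement_gain(cycle_errors):
    # Relative error reduction between the first and the last reasoning cycle.
    return (cycle_errors[0] - cycle_errors[-1]) / cycle_errors[0]

def planning_efficiency(tool_calls, n_forecasts):
    # Average number of tool calls spent per produced forecast.
    return tool_calls / n_forecasts

def memory_reuse(retrieved_and_used, retrieved):
    # Share of retrieved memory items that the agent actually used.
    return retrieved_and_used / retrieved if retrieved else 0.0

def critique_calibration(decisions):
    # decisions: (decision, revision_was_needed) pairs, decision in {"revise", "accept"}.
    # A decision is correct when it revises exactly the forecasts that needed it.
    correct = sum((d == "revise") == needed for d, needed in decisions)
    return correct / len(decisions)

def agentic_overhead(cycle_latencies_s):
    # Mean wall-clock latency per reasoning cycle, in seconds.
    return mean(cycle_latencies_s)
```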

Statistical significance is commonly established via paired t-tests; TS-Agent's RMSE improvements over DS-Agent are reported as statistically significant under such a test (Ang et al., 19 Aug 2025).
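For reference, the paired t statistic is the mean of the per-pair differences divided by its standard error, with n - 1 degrees of freedom. The sketch computes it with the standard library only; the inputs are hypothetical paired scores (for example, per-split RMSEs of a baseline and an agent). If SciPy is available, `scipy.stats.ttest_rel(a, b)` returns the same statistic together with a p-value.

```python
import math
from statistics import mean, stdev

def paired_t(baseline, candidate):
    # Paired differences, e.g. baseline RMSE minus agent RMSE on the same splits.
    d = [b - c for b, c in zip(baseline, candidate)]
    t = mean(d) / (stdev(d) / math.sqrt(len(d)))
    return t, len(d) - 1  # t statistic and degrees of freedom
```

Pairing matters because both systems are scored on the same splits: differencing removes split-to-split difficulty, which an unpaired test would count as noise.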

5. Auditing, Interpretability, and Risk Control

ATSF architectures prioritize:

Modular, explainable code or pipeline edits—only dedicated sub-actions modify critical logic, facilitating error isolation.
Detailed decision logs: All code diffs, model and parameter choices, metrics, and rationales are logged for audit.
Chain-of-edit (or reasoning) traces: Step-by-step justifications support compliance review and the validation of risk controls.
Error rollback: Unsuccessful modifications are reverted, containing error propagation.
Bounded editing/decision space: Constraining edits to predefined templates, APIs, or patch libraries further controls the space of admissible changes (Ang et al., 19 Aug 2025, Garza et al., 30 Aug 2025).
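These controls can be combined in a small wrapper. In this hypothetical sketch the "pipeline" is a one-line config string, `evaluate` is a toy loss, and a regular expression plays the role of the bounded edit template. Every proposed edit is diffed with Python's `difflib`, logged with its rationale and loss, and applied only if it stays inside the template and lowers the loss, so failed edits never reach the running pipeline.

```python
import difflib
import re

EDIT_TEMPLATE = re.compile(r"window=\d+")  # hypothetical bounded edit space

def evaluate(code):
    # Toy stand-in for running the pipeline: loss is the distance of window from 5.
    return abs(int(code.split("=")[1]) - 5)

class AuditedPipeline:
    def __init__(self, code):
        self.code, self.loss, self.log = code, evaluate(code), []

    def propose(self, new_code, rationale):
        diff = "\n".join(difflib.unified_diff(
            self.code.splitlines(), new_code.splitlines(), "before", "after", lineterm=""))
        entry = {"diff": diff, "rationale": rationale, "loss": None, "accepted": False}
        if not EDIT_TEMPLATE.fullmatch(new_code):
            entry["rationale"] += " [rejected: outside edit template]"
        else:
            entry["loss"] = evaluate(new_code)
            entry["accepted"] = entry["loss"] < self.loss
        if entry["accepted"]:
            self.code, self.loss = new_code, entry["loss"]
        # Otherwise the current code is left untouched, i.e. the edit is rolled back.
        self.log.append(entry)
        return entry["accepted"]
```

The template check runs before evaluation, so an out-of-bounds edit is never executed at all, while the log still records that it was proposed and why it was refused.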

ATSF systems such as TimeSeriesScientist generate comprehensive reports that include rationales, performance tables, and workflow logs, increasing transparency and facilitating reproducibility (Zhao et al., 2 Oct 2025).

6. Representative Implementations and Applications

TS-Agent is tailored for financial time series, employing structured knowledge banks and iterative reflective feedback to achieve higher accuracy, robustness, and interpretability than AutoML and prior agentic baselines (Ang et al., 19 Aug 2025).

Agentic Retrieval-Augmented Generation (RAG) exploits a multi-agent hierarchy with prompt pools for spatio-temporal and distributionally-shifted data (e.g. traffic forecasting, anomaly detection), enabling modular agent upgrades and transfer across major time series tasks (Ravuru et al., 2024).

TimeCopilot provides an open-source, LLM-agnostic interface for univariate forecasting with natural language explainability, model hub integration, and reproducibility on large benchmarks (e.g. GIFT-Eval) (Garza et al., 30 Aug 2025).

TimeSeriesScientist introduces a four-agent collaboration (Curator, Planner, Forecaster, Reporter) that automates quality diagnostics, model selection, ensembling, and transparent reporting across eight standard benchmarks, outperforming LLM and statistical baselines by large margins (Zhao et al., 2 Oct 2025).
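The division of labour among the four agents can be illustrated with plain functions. This is a hypothetical sketch, not TimeSeriesScientist's implementation: the Curator only forward-fills missing values (and assumes the first value is present), the Planner ranks three simple candidate models by holdout MAE, the Forecaster averages the top two, and the Reporter emits a short text summary.

```python
from statistics import mean

CANDIDATES = {  # hypothetical model pool
    "naive": lambda h, n: [h[-1]] * n,
    "mean": lambda h, n: [mean(h)] * n,
    "drift": lambda h, n: [h[-1] + (h[-1] - h[0]) / (len(h) - 1) * (i + 1) for i in range(n)],
}

def curator(series):
    # Quality diagnostics: count missing values and forward-fill them.
    filled, last, missing = [], None, 0
    for x in series:
        if x is None:
            missing, x = missing + 1, last
        filled.append(x)
        last = x
    return filled, {"missing": missing}

def planner(series, holdout=2, top_k=2):
    # Model selection: rank candidates by holdout MAE and keep the best `top_k`.
    train, test = series[:-holdout], series[-holdout:]
    scores = {name: mean(abs(f - a) for f, a in zip(m(train, holdout), test))
              for name, m in CANDIDATES.items()}
    return sorted(scores, key=scores.get)[:top_k], scores

def forecaster(series, models, horizon):
    # Ensembling: average the selected models' forecasts step by step.
    preds = [CANDIDATES[m](series, horizon) for m in models]
    return [mean(step) for step in zip(*preds)]

def reporter(diagnostics, scores, models, forecast):
    lines = [f"missing values filled: {diagnostics['missing']}"]
    lines += [f"{name}: holdout MAE {s:.2f}" for name, s in sorted(scores.items())]
    lines.append(f"ensemble of {models}: {[round(v, 2) for v in forecast]}")
    return "\n".join(lines)

def run_pipeline(raw, horizon=2):
    series, diagnostics = curator(raw)
    models, scores = planner(series)
    forecast = forecaster(series, models, horizon)
    return forecast, reporter(diagnostics, scores, models, forecast)
```

For the input `[1, 2, None, 4, 5, 6]` the Curator fills one gap, the Planner keeps drift and naive, and the ensemble forecast is `[6.5, 7.0]`; the report lists every candidate's score, not only the winners.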

7. Key Challenges and Future Directions

Major challenges for ATSF include:

Memory design: Effective summarization, pruning, and generalization of episodic experience.
Toolkit standardization: Verifiable, composable APIs for tool integration.
Multi-agent coordination: Communication, credit assignment, and redundancy control as systems scale.
Distribution shift and streaming: Adaptation to regime changes and bounded context windows; stateful architectures and shift-aware benchmarks (e.g. TimeSeriesGym) are under active development (Chang et al., 15 Sep 2025).
Efficient, scalable, and cost-effective execution: Branch-structured agents risk combinatorial compute explosion; budgeting and early exit criteria are required.
Causal and intervention-aware forecasting: Integrating counterfactual reasoning and policy evaluation.
End-to-end policy learning: Reinforcement learning in agentic pipelines remains limited and often struggles with interpretability (Cheng et al., 2 Feb 2026).
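The budgeting and early-exit criteria mentioned under efficient execution can be sketched as a controller that the agent consults after each cycle. The caps on tool calls and wall-clock time, and the patience-based stopping rule, are assumptions chosen for illustration rather than criteria proposed in the cited work.

```python
import time

class Budget:
    # Hypothetical controller: caps tool calls and wall-clock time, and exits early
    # once `patience` consecutive cycles fail to improve the best error.
    def __init__(self, max_calls, max_seconds, patience, min_delta=0.0):
        self.max_calls, self.max_seconds = max_calls, max_seconds
        self.patience, self.min_delta = patience, min_delta
        self.calls, self.start = 0, time.monotonic()
        self.best, self.stale = float("inf"), 0

    def record(self, error):
        # Charge one tool call and update the improvement tracker.
        self.calls += 1
        if error < self.best - self.min_delta:
            self.best, self.stale = error, 0
        else:
            self.stale += 1

    def stop_reason(self):
        if self.calls >= self.max_calls:
            return "call budget exhausted"
        if time.monotonic() - self.start >= self.max_seconds:
            return "time budget exhausted"
        if self.stale >= self.patience:
            return "early exit: no recent improvement"
        return None

def run_with_budget(cycle_errors, budget):
    # Feed per-cycle errors (e.g. from an agent loop) until the budget says stop.
    for error in cycle_errors:
        budget.record(error)
        reason = budget.stop_reason()
        if reason:
            return reason, budget.calls, budget.best
    return "completed", budget.calls, budget.best
```

Separating the stopping policy from the agent loop keeps the cost controls auditable on their own: the returned reason states explicitly which limit ended the run.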

Open research questions include establishing closed-loop testbeds with explicit cost/latency regimes, robust benchmarks for shift-aware and streaming settings, principled critic modules, and scalable architectures for real-time, multimodal contexts (Chang et al., 15 Sep 2025, Cheng et al., 2 Feb 2026, Garza et al., 30 Aug 2025).

ATSF marks a shift from single-model, static prediction to continual, evidence-driven, and tool-augmented reasoning by modular agents. Leading systems report superior benchmark results in both accuracy and reliability, particularly under nonstationarity or when interpretability and auditability are paramount. Continuing advances in agentic integration, feedback, and scalable orchestration are central to the evolution of operational, trustworthy time series forecasting systems.