The Nonstationarity-Complexity Tradeoff in Return Prediction

Published 29 Dec 2025 in stat.ML, cs.LG, and q-fin.GN | (2512.23596v1)

Abstract: We investigate machine learning models for stock return prediction in non-stationary environments, revealing a fundamental nonstationarity-complexity tradeoff: complex models reduce misspecification error but require longer training windows that introduce stronger non-stationarity. We resolve this tension with a novel model selection method that jointly optimizes model class and training window size using a tournament procedure that adaptively evaluates candidates on non-stationary validation data. Our theoretical analysis demonstrates that this approach balances misspecification error, estimation variance, and non-stationarity, performing close to the best model in hindsight. Applying our method to 17 industry portfolio returns, we consistently outperform standard rolling-window benchmarks, improving out-of-sample $R^2$ by 14-23% on average. During NBER-designated recessions, improvements are substantial: our method achieves positive $R^2$ during the Gulf War recession while benchmarks are negative, and improves $R^2$ in absolute terms by at least 80bps during the 2001 recession as well as superior performance during the 2008 Financial Crisis. Economically, a trading strategy based on our selected model generates 31% higher cumulative returns averaged across the industries.


Summary

The paper introduces the nonstationarity-complexity tradeoff, showing that simpler, short-window models may outperform complex methods during regime shifts.
It proposes Adaptive Tournament Model Selection (ATOMS), which jointly selects the model class and the training window to balance misspecification, estimation variance, and nonstationarity.
Empirical analyses on 17 industry portfolios show that ATOMS improves out-of-sample R² by 14–23% on average, and a trading strategy based on the selected model earns 31% higher cumulative returns averaged across industries.

Introduction

This paper addresses the predictive modeling of asset returns in nonstationary financial environments, highlighting a core statistical and operational challenge: the intrinsic tension between model complexity and data nonstationarity. By formalizing this as a nonstationarity-complexity tradeoff, the authors advance both the theoretical understanding and empirical practice of financial return prediction, with implications for model selection, forecast performance, and economic value.

Nonstationarity-Complexity Tradeoff: Formalization and Empirical Evidence

Joint Dependence of Error Components

Classical learning theory in stationary settings decomposes prediction error into bias (misspecification) and variance (uncertainty), with complexity controlling their balance. However, in nonstationary environments where the data distribution $P_t$ drifts over time, there is an additional error component due to distribution shift.

The prediction error of a model $f$ from a class $\mathcal{F}$, trained on a window of length $k$, is bounded up to constants by three terms:

$\text{Prediction Error}(f) \lesssim \text{Misspecification}(\mathcal{F}) + \text{Uncertainty}(\mathcal{F}, n_k) + \text{Non-stationarity}(k).$

Here, $n_k$ is the sample size contained in the most recent $k$ periods. Increasing model complexity reduces misspecification but raises estimation variance, so complex models need longer training windows. Longer windows in turn increase nonstationarity error because they draw on older, less relevant regimes (Theorem 1).
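To make the three-term bound concrete, the toy calculation below uses stylized functional forms that are assumptions for illustration only, not the paper's expressions: misspecification $1/p$ for a model with $p$ parameters, uncertainty $p/n_k$ with one observation per period (so $n_k = k$), and nonstationarity growing linearly in $k$ at a hypothetical rate `drift`. The grids and names are likewise hypothetical.

```python
from itertools import product

# Hypothetical grids of model complexity (parameter count) and window length.
COMPLEXITIES = [1, 2, 4, 8, 16, 32]
WINDOWS = [1, 2, 4, 8, 16, 32, 64, 128, 256]


def stylized_bound(p, k, drift):
    """Toy version of the three-term bound (assumed functional forms)."""
    misspecification = 1.0 / p   # shrinks as the model gets richer
    uncertainty = p / k          # grows with complexity, shrinks with data
    nonstationarity = drift * k  # grows as the window reaches further back
    return misspecification + uncertainty + nonstationarity


def best_pair(drift):
    """Return the (complexity, window) pair minimizing the toy bound."""
    return min(
        product(COMPLEXITIES, WINDOWS),
        key=lambda pk: stylized_bound(pk[0], pk[1], drift),
    )


calm = best_pair(drift=1e-4)       # slowly drifting environment
turbulent = best_pair(drift=1e-2)  # rapidly drifting environment
print("low drift :", calm)
print("high drift:", turbulent)
```

On this grid the low-drift minimizer is (16, 256) and the high-drift minimizer is (4, 16): under the assumed forms, stronger drift favors both a shorter window and a simpler model, which is the qualitative pattern the tradeoff describes.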

Empirical Illustration

Using the 17 Fama-French industry portfolios (1987–2016), the authors systematically compare models of different complexity (ridge-regularized linear models and random forests) across a range of training windows. They show that:

In periods of major regime change (e.g., NBER recessions), simple models trained on short windows outperform highly expressive models trained on long or expanding windows, contradicting the classical ML intuition that "more complexity" and "more data" guarantee better performance.
Figure 1: The number of industries where each model achieves the highest annual out-of-sample $R^2$.

Figure 2: Annual out-of-sample $R^2$ for each model and each industry, highlighting periods where simpler, shorter-window models dominate.

These findings hold even when controlling for model and data selection protocols, confirming that nonstationarity cannot be ignored or treated as secondary to complexity.

Adaptive Model and Data Selection under Nonstationarity

Tournament-Based Adaptive Model Selection

Standard model selection strategies (holdout, cross-validation) are rendered suboptimal in nonstationary contexts because performance estimated on the past may not reflect performance on present or future data distributions. The proposed solution is the Adaptive Tournament Model Selection (ATOMS):

Candidate models (varying both complexity and window size) are compared via a sequential-elimination tournament, where comparison is performed using adaptive rolling windows.
The validation window is determined in a data-driven manner: for each pairwise comparison, the algorithm selects the validation window length that minimizes a bias-variance proxy which explicitly accounts for nonstationarity.
Figure 3: Schematic of model training and selection process under nonstationarity, with adaptive training and validation data splits.

This mechanism is shown to achieve excess risk that is (up to logarithmic factors) as small as that of the oracle model which knows in hindsight both the optimal class and window length for the prevailing regime (Theorems 2-4).
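The following is a simplified sketch in the spirit of the description above, not the authors' algorithm. It assumes each candidate comes with a sequence of per-period validation losses (most recent last) and a known noise scale `sigma`. The window rule, which adds a standard-error proxy to a drift proxy built from disagreements between longer and shorter windows, is an assumption standing in for the paper's bias-variance proxy. All function names are hypothetical.

```python
import math


def window_grid(n):
    """Dyadic validation window lengths 1, 2, 4, ... capped at n."""
    ws, w = [], 1
    while w < n:
        ws.append(w)
        w *= 2
    ws.append(n)
    return ws


def mean_diff(loss_a, loss_b, w):
    """Average loss difference (a minus b) over the most recent w periods."""
    return sum(x - y for x, y in zip(loss_a[-w:], loss_b[-w:])) / w


def adaptive_compare(loss_a, loss_b, sigma=1.0):
    """Compare two candidates on a data-driven validation window.

    Returns ("a" or "b", chosen window length).
    """
    n = min(len(loss_a), len(loss_b))
    ws = window_grid(n)
    best = None
    for i, w in enumerate(ws):
        d_w = mean_diff(loss_a, loss_b, w)
        se_w = sigma / math.sqrt(w)  # variance proxy: shrinks with w
        drift = 0.0                  # drift proxy: grows if old data disagree
        for v in ws[:i]:
            se_v = sigma / math.sqrt(v)
            gap = abs(d_w - mean_diff(loss_a, loss_b, v)) - (se_w + se_v)
            drift = max(drift, gap)
        score = drift + se_w
        if best is None or score < best[0]:
            best = (score, w, d_w)
    _, w, d_w = best
    return ("a" if d_w <= 0 else "b"), w


def tournament(losses, sigma=1.0):
    """Sequential elimination: the champion meets each challenger in turn."""
    names = list(losses)
    champion = names[0]
    for challenger in names[1:]:
        winner, _ = adaptive_compare(losses[champion], losses[challenger], sigma)
        if winner == "b":
            champion = challenger
    return champion


# Toy regime shift: "long" was better for 24 periods, "short" for the last 8.
losses = {
    "long": [1.0] * 24 + [3.0] * 8,
    "short": [3.0] * 24 + [1.0] * 8,
}
print(tournament(losses, sigma=0.5))
```

In this toy example the full 32-period average still favors "long", while the adaptive rule stops at the 8 most recent periods and picks "short". That is the behavior the adaptive validation window is meant to deliver, although the paper's actual proxy and guarantees may differ from this sketch.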

Theoretical Guarantees

Let $\mathcal{F}$ denote the collection of all candidate model classes and $k$ the training window length. For independent data, the excess risk $\mathcal{E}_t(\hat{f})$ of the ATOMS selection $\hat{f}$ satisfies

$\mathcal{E}_t(\hat{f}) \lesssim \min_{f\in\mathcal{F},\,k} \bigg\{ \text{misspecification}(f) + \text{uncertainty}(f,k) + \text{nonstationarity}(k) \bigg\} + \widetilde{O}\left(\frac{1}{n_{t,k}}\right)$

with high probability. In particular, window size selection becomes crucial in the presence of temporal distribution shifts.

Empirical Evaluation: Industry Portfolios

Experimental Design

Experiments are performed on the Fama-French 17-industry portfolios. The predictor set includes macroeconomic factors (e.g., those from "Deep Learning in Asset Pricing"), lagged returns, and 94 characteristic-sorted long-short factor returns. Candidate models include penalized linear (ridge, LASSO, elastic net) and random forest specifications, each considered at various estimation window sizes (from 1 to 256 months and expanding).
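As a rough illustration of how such a candidate set could be enumerated, the sketch below crosses model families with window lengths. The source gives only the range (1 to 256 months, plus expanding), so the dyadic spacing is an assumption, as are the labels and the helper `training_slice`.

```python
from itertools import product

# Model family labels taken from the text; hyperparameters are omitted.
MODEL_FAMILIES = ["ridge", "lasso", "elastic_net", "random_forest"]

# Assumed dyadic grid of window lengths in months; None marks the
# expanding window, which uses all data available up to time t.
WINDOWS = [2 ** j for j in range(9)] + [None]  # 1, 2, 4, ..., 256, expanding

CANDIDATES = list(product(MODEL_FAMILIES, WINDOWS))


def training_slice(t, window):
    """Half-open index range [start, t) of months used to train at time t."""
    start = 0 if window is None else max(0, t - window)
    return start, t


print(len(CANDIDATES), "candidates")
print(training_slice(100, 12))    # rolling 12-month window
print(training_slice(100, None))  # expanding window
```

Each (family, window) pair is then a single entrant in the model selection step, so the search covers complexity and training data jointly rather than one at a time.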

Key Results

Out-of-sample $R^2$: Across 1990–2016, ATOMS outperforms all fixed-window and cross-validation baselines, with average $R^2$ improvement of 14–23% (against best fixed validation windows).
Figure 4: Boxplot of OOS $R^2$ of ATOMS vs. fixed-window baselines across all industries and years.
Recession performance: During the Gulf War, dot-com, and Financial Crisis recessions, ATOMS achieves positive or strongly improved $R^2$ while fixed-window baselines are negative or near-zero. For example, during the 2001 recession, ATOMS attains $R^2=0.125$ vs. 0.117 for the best baseline, an 80 bps improvement.
Consistency across industries: The benefit is persistent (visible in nearly all industry categories and years).
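For readers who want to reproduce this kind of comparison, here is a minimal sketch of an out-of-sample $R^2$ calculation. The source does not state which benchmark enters the denominator; the version below assumes a zero-forecast benchmark, which is one common convention in return prediction. The function names are hypothetical. The last lines redo the arithmetic behind the 80 bps figure quoted for the 2001 recession.

```python
def oos_r2(realized, predicted):
    """Out-of-sample R^2 against a zero forecast (assumed benchmark)."""
    sse = sum((r - p) ** 2 for r, p in zip(realized, predicted))
    sst = sum(r ** 2 for r in realized)
    return 1.0 - sse / sst


def to_bps(difference):
    """Express an absolute difference in basis points (1 bp = 0.0001)."""
    return difference * 1e4


# Arithmetic from the text: 0.125 versus 0.117 for the best baseline.
gap_bps = to_bps(0.125 - 0.117)
print(round(gap_bps), "bps absolute improvement")
```

Under this convention a forecast that is always zero scores exactly 0, so a negative $R^2$ means the model did worse than predicting no excess return at all.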

Figure 5: Annual out-of-sample $R^2$ of ATOMS and fixed-window baselines for 17 industries.

Economic value: When used to drive a simple trading strategy (sign-based long/short positioning), ATOMS yields 31% higher cumulative returns over the OOS sample, compared to the best fixed-window method.
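A sign-based rule of this kind can be sketched as follows. The details are assumptions, since the source only describes the strategy as sign-based long/short positioning: the position is +1 or -1 according to the sign of the forecast (flat when the forecast is exactly zero), there are no transaction costs, and returns compound period by period. The numbers in the example are made up.

```python
import math


def sign(x):
    """Return +1, -1, or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def strategy_returns(forecasts, realized):
    """Per-period returns from going long or short on the forecast sign."""
    return [sign(f) * r for f, r in zip(forecasts, realized)]


def cumulative_wealth(returns, initial=1.0):
    """Terminal wealth from compounding simple returns."""
    return initial * math.prod(1.0 + r for r in returns)


# Made-up forecasts and realized returns, for illustration only.
forecasts = [0.01, -0.02, 0.0]
realized = [0.05, -0.10, 0.03]
rets = strategy_returns(forecasts, realized)
wealth = cumulative_wealth(rets)
print(rets, wealth)
```

Because only the sign of the forecast matters, this rule rewards directional accuracy rather than the magnitude captured by $R^2$, which is why the two metrics are reported separately.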

Figure 6: Cumulative wealth curves for ATOMS and benchmarks for each industry portfolio.

Covariate Dynamics

Supporting analyses of the covariate time series demonstrate considerable low-frequency nonstationarity and explosive volatility during market crises (see Figures 7–10 in the Appendix). This further motivates adaptive model/window selection and empirically validates the instability of standard asset pricing signals.

Implications and Forward Directions

From a theoretical standpoint, the results indicate that model complexity and training window must be tuned jointly in nonstationary environments. The "virtue of complexity" is regime-dependent: in the presence of large temporal drifts, simpler, less flexible models trained on short windows may outperform high-capacity estimators trained on long windows.

For finance practice, these findings question the efficacy of traditional expanding-window approaches to risk premium forecasting, especially during periods of macroeconomic stress, and provide a formal statistical basis for adaptive model selection. Economically, they suggest that dynamic stochastic discount factors (SDFs) and nonstationary factor exposures must be accounted for in asset pricing and portfolio construction.

Future technical work should address (1) extensions to temporally dependent data and (2) computationally efficient search across large model/horizon spaces (e.g., through warm starting or meta-learning).

Conclusion

The paper makes an important contribution by establishing and operationalizing the nonstationarity-complexity tradeoff in financial prediction, offering a robust, theoretically justified, and empirically validated adaptive model selection scheme. Both statistical and economic analyses confirm that adaptive, joint model and training window selection is essential for accurate and robust return prediction in environments characterized by recurrent structural breaks and regime shifts.
