Label Horizon Paradox: Forecasting Insight
- Label Horizon Paradox is defined as the phenomenon where an intermediate training label yields higher out-of-sample correlation with the forecast target than the canonical final label.
- A bi-level optimization framework jointly updates model weights and label weights, revealing that selecting an optimal intermediate horizon (h*) improves performance metrics like IC and Sharpe ratios.
- Empirical evidence on Chinese equity universes demonstrates that dynamically optimizing supervision signals mitigates noise accumulation and enhances predictive accuracy in financial forecasting.
The Label Horizon Paradox is a phenomenon in supervised learning for financial forecasting wherein the optimal supervision signal—the label provided to a machine learning model during training—does not coincide with the horizon of the prediction target for which generalization is ultimately required. Classic forecasting pipelines tacitly assume that the best training objective matches the inference goal (i.e., the label is constructed at horizon $H$ if predictions are evaluated at horizon $H$). Song et al. rigorously challenge this canon by demonstrating that, due to the interaction between time-varying signal realization and cumulative noise, superior predictive performance is consistently achieved by training with labels at an intermediate horizon $h^*$, where $0 < h^* < H$. This constitutes the Label Horizon Paradox: the training label that maximizes out-of-sample correlation with the final target does not align with the application horizon but rather with a dynamically optimized proxy (Song et al., 3 Feb 2026).
1. Formal Definition and Empirical Manifestation
The Label Horizon Paradox is defined as the empirical observation that "minimizing training error on the canonical target horizon $H$ does not guarantee optimal generalization on $H$." Instead, it is well documented that models trained on an intermediate horizon $h$, with $h < H$, yield higher out-of-sample predictive correlation (typically measured by the information coefficient, or IC) on the true inference target. This overturns the default label-matching doctrine pervasive in short-term financial modeling (Song et al., 3 Feb 2026).
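To fix ideas on the IC metric itself, the sketch below computes the correlation between a synthetic predictive signal and realized returns at two noise levels; all names and numbers are illustrative and not drawn from the paper.

```python
# Minimal illustration of the information coefficient (IC): the
# correlation between model predictions and realized forward returns.
# Synthetic data only; a noisier label lowers the measured IC.
import numpy as np

def information_coefficient(pred, realized):
    """Pearson correlation between predictions and realized returns."""
    pred = np.asarray(pred, dtype=float)
    realized = np.asarray(realized, dtype=float)
    return float(np.corrcoef(pred, realized)[0, 1])

rng = np.random.default_rng(0)
signal = rng.normal(size=500)
# Realized returns = signal plus idiosyncratic noise.
low_noise = signal + 0.5 * rng.normal(size=500)
high_noise = signal + 3.0 * rng.normal(size=500)

print(information_coefficient(signal, low_noise))
print(information_coefficient(signal, high_noise))
```

The same quantity, computed cross-sectionally per date and averaged, is the IC reported in the empirical sections below.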
2. Dynamic Signal–Noise Trade-Off: Theoretical Foundation
The origin of the paradox lies in a time-dependent trade-off between the rates at which market alpha (signal) and idiosyncratic risk (noise) accumulate as a function of the forecasting horizon. The dynamic is formalized using a continuous-time Arbitrage Pricing Theory (APT) model in which realized returns from $t$ to $t+h$ are given by

$$r_{t \to t+h} = S(h)\,\beta^\top x_t + \sigma W_h,$$

with $x_t$ denoting whitened factor exposures, $S(h)$ the cumulative signal realization function (monotonic, with $S(0)=0$ and $S(H)=1$), and $\sigma W_h$ modeling random-walk noise via a standard Brownian motion $W_h$.

The squared out-of-sample correlation (information coefficient) of the estimator trained at horizon $h$ and evaluated at horizon $H$ is

$$\mathrm{IC}^2(h) = c \cdot \frac{S(h)^2}{S(h)^2 + \sigma^2 h},$$

with $c$ a data-dependent constant.
Here, two marginal effects are distinguished:
- Signal Gain $S(h)$: with marginal gain $S'(h) = \mathrm{d}S/\mathrm{d}h$, capturing the incremental informativeness as $h$ increases;
- Noise Accumulation $N(h)$: with $N(h) = \sigma^2 h$, representing compounding uncertainty.
There exists a unique maximizing horizon $h^*$ that solves

$$h^* = \arg\max_{0 < h \le H} \; \frac{S(h)^2}{S(h)^2 + \sigma^2 h},$$

subject to the first-order condition

$$S'(h^*) = \frac{S(h^*)}{2h^*}.$$

When marginal signal gain falls below the pace of noise accumulation, moving to longer horizons hurts predictability, which explains why $h^*$ is typically less than $H$ (Song et al., 3 Feb 2026).
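The trade-off can be checked numerically. The sketch below assumes a concave signal curve $S(h) = 1 - e^{-kh}$ (an illustrative choice; the paper does not pin down the shape of $S$), accumulates noise variance linearly, and locates the hump of the squared IC, checking the first-order condition at the maximizer.

```python
# Numerical sketch of the signal-noise trade-off behind the paradox,
# under an assumed concave signal curve S(h) = 1 - exp(-k*h).
# IC^2(h) is proportional to S(h)^2 / (S(h)^2 + sigma^2 * h).
import numpy as np

k, sigma, H = 0.8, 0.5, 10.0
h = np.linspace(0.01, H, 2000)

S = 1.0 - np.exp(-k * h)      # cumulative signal realization
N = sigma**2 * h              # accumulated noise variance
ic2 = S**2 / (S**2 + N)       # squared out-of-sample IC (up to a constant)

h_star = h[np.argmax(ic2)]    # hump of the IC curve: interior maximizer
# First-order condition at the optimum: S'(h*) = S(h*) / (2 h*)
S_prime = k * np.exp(-k * h_star)
S_star = 1.0 - np.exp(-k * h_star)
print(h_star, S_prime, S_star / (2 * h_star))
```

With these illustrative parameters the maximizer lies well inside the window, i.e. $h^* < H$, matching the hump-shaped curve reported empirically.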
3. Bi-Level Optimization Framework for Adaptive Label Discovery
Because $S(h)$ and $N(h)$ are unknown a priori, the optimal proxy horizon must be learned from data rather than imposed by design. Song et al. introduce a bi-level optimization approach where the label horizon weights $w \in \Delta^{K-1}$ (the simplex over $K$ candidate horizons) are treated as parameters, optimized jointly with model weights $\theta$:

$$\min_{w \in \Delta^{K-1}} \; \mathcal{L}_H\big(\theta^*(w)\big) - \lambda\,\mathcal{H}(w), \qquad \theta^*(w) = \arg\min_{\theta} \sum_{k=1}^{K} w_k\,\mathcal{L}_{h_k}(\theta).$$

The upper-level (outer) objective evaluates generalization error on the canonical target $H$ using model parameters obtained via the lower-level (inner) objective, a label-weighted loss across the candidate horizons. To encourage exploration and prevent degenerate solutions, an entropy penalty $-\lambda\,\mathcal{H}(w)$ is applied on the simplex.
Optimization proceeds in two stages:
- A warm-up period with mean-field labels to stabilize features.
- Per-batch meta-optimization separating support and query samples, with $\theta$ updated on a $w$-weighted combination of proxy labels and $w$ updated to minimize out-of-sample error on the final target, regularized by entropy (Song et al., 3 Feb 2026).
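The two-stage procedure can be sketched for a linear model, where the inner fit has a closed form and the outer gradient through it is analytic. This is a first-order illustration on synthetic horizon labels, not the paper's implementation; all parameter names are assumptions.

```python
# Sketch of the bi-level label-weight update for a linear model. Label
# weights live on the simplex via a softmax over logits `alpha`; the inner
# step fits theta to a w-weighted mix of proxy-horizon labels on the support
# split, and the outer step nudges `alpha` to reduce query error on the
# final-horizon label, with an entropy penalty against degenerate weights.
import numpy as np

rng = np.random.default_rng(1)
n, d, K = 400, 5, 4                      # samples, features, candidate horizons
X = rng.normal(size=(n, d))
beta = rng.normal(size=d)
# Synthetic labels: signal grows with horizon, noise grows faster.
S = np.array([0.5, 0.8, 0.9, 1.0])       # cumulative signal per horizon
noise = np.array([0.3, 0.5, 1.5, 3.0])   # noise scale per horizon
Y = np.stack([S[k] * X @ beta + noise[k] * rng.normal(size=n)
              for k in range(K)], axis=1)

alpha = np.zeros(K)                      # logits over candidate horizons
lam_ent, lr_out, ridge = 0.01, 0.5, 1e-2
sup, qry = slice(0, 200), slice(200, 400)

for step in range(200):
    w = np.exp(alpha - alpha.max())
    w /= w.sum()
    # Inner step: closed-form ridge fit on the w-weighted proxy label.
    y_mix = Y[sup] @ w
    A = X[sup].T @ X[sup] + ridge * np.eye(d)
    theta = np.linalg.solve(A, X[sup].T @ y_mix)
    # Outer step: query loss against the final-horizon (k = K-1) label.
    resid = X[qry] @ theta - Y[qry, K - 1]
    G = np.linalg.solve(A, X[sup].T @ Y[sup])        # d theta / d w  (d x K)
    grad_w = 2.0 * (X[qry] @ G).T @ resid / len(resid)
    grad_w += lam_ent * (np.log(w + 1e-12) + 1.0)    # entropy-penalty gradient
    grad_alpha = w * (grad_w - w @ grad_w)           # softmax Jacobian
    alpha -= lr_out * grad_alpha

w = np.exp(alpha - alpha.max())
w /= w.sum()
print(np.round(w, 3))  # learned weights over the candidate horizons
```

The warm-up stage from the bullet list corresponds to freezing `w` at the uniform (mean-field) point before this loop begins.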
4. Empirical Evidence on Large-Scale Financial Forecasting
Empirical results on representative Chinese equity universes (CSI 300, CSI 500, CSI 1000) and multiple backbone architectures (LSTM, GRU, TCN, Transformer, SSM, and others) consistently demonstrate the paradox:
- In daily close-to-close prediction, the bi-level method discovers an intermediate optimum $h^* < H$ and achieves higher out-of-sample IC, ICIR, and Sharpe ratios than models trained on the final-horizon label. For instance, on CSI 300, IC increases from 0.637 to 0.720 (+13%) and ICIR from 0.443 to 0.562 (+27%).
- For intraday 90-minute prediction, the predicted hump-shaped IC curve is observed: intermediate horizons yield the most stable and informative predictive signals, reflected in higher ICIR and Sharpe ratios.
- Mean-field multi-task training or naive label averaging does not replicate these gains unless the aggregation is restricted to the most informative horizons as discovered by the adaptive process.
This effect is architecture-agnostic and robust across market regimes, but it is not universal: if marginal signal gain outpaces noise accumulation throughout the chosen window, $h^* = H$ can result (as verified for certain short intraday scenarios), and the bi-level method seamlessly reduces to the standard baseline (Song et al., 3 Feb 2026).
5. Operational Guidelines and Practical Considerations
Practical deployment of label horizon optimization involves:
- Discretizing the forecasting window into candidate horizons,
- Initial mean-aggregation pretraining for stability,
- Random partitioning of mini-batches for inner/outer loop optimization,
- Modest entropy regularization to prevent overfitting to noisy proxy labels,
- Sensitivity analysis for hyperparameters such as entropy weight and warm-up period.
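A minimal scaffold for the first three guidelines (horizon discretization, warm-up with mean-field weights, support/query partitioning) might look as follows; all function names are illustrative, not from the paper's code.

```python
# Operational helpers for label-horizon optimization: discretize the
# forecasting window, hold weights uniform during warm-up, and randomly
# split each mini-batch into support and query sets.
import numpy as np

def candidate_horizons(window, K):
    """Discretize the forecasting window [1, window] into K candidate horizons."""
    return np.unique(np.linspace(1, window, K).round().astype(int))

def label_weights(step, warmup, alpha):
    """Uniform (mean-field) weights during warm-up, softmax of logits after."""
    if step < warmup:
        return np.full(len(alpha), 1.0 / len(alpha))
    w = np.exp(alpha - alpha.max())
    return w / w.sum()

def support_query_split(batch_idx, frac=0.5, rng=None):
    """Randomly partition a mini-batch into support and query index sets."""
    rng = rng or np.random.default_rng()
    perm = rng.permutation(batch_idx)
    cut = int(len(perm) * frac)
    return perm[:cut], perm[cut:]

hs = candidate_horizons(window=10, K=5)
w0 = label_weights(step=0, warmup=100, alpha=np.zeros(len(hs)))
sup, qry = support_query_split(np.arange(64), rng=np.random.default_rng(0))
```

The entropy weight and warm-up length are exactly the hyperparameters the guidelines flag for sensitivity analysis.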
The scheme is computationally efficient, requiring only 5–20% additional training time compared to the final-label baseline, and does not require brute-force sweeps across training configurations (Song et al., 3 Feb 2026).
Key limitations include the reliance on linear APT-based models for intuition, which may inadequately capture extreme high-frequency or highly nonstationary market dynamics, and sensitivity to the precise shape of the realized signal-to-noise curve. For nonstationary domains or abrupt regime shifts, frequent re-optimization of proxy horizons may be necessary.
6. Implications for Financial Machine Learning
The Label Horizon Paradox refocuses attention from model architectures to the structure and timing of supervision signals in sequential prediction tasks. It demonstrates that, in information environments where price discovery and noise accumulation are dynamically imbalanced, judicious selection or online adaptation of training labels can yield persistent generalization benefits. The theoretical apparatus and practical meta-learning framework of Song et al. provide a systematic methodology for identifying and exploiting this structure, opening avenues for future label-centric research in domains characterized by low SNR, delayed information flow, or rapidly changing effective horizons (Song et al., 3 Feb 2026).