CRPS-Based Ensemble Training
- The paper introduces CRPS-based ensemble training that leverages mixability to optimally combine expert forecasts with proven regret bounds.
- It details the methodology of Vovk’s Aggregating Algorithm, exploiting CRPS's discretized structure to achieve efficient online learning.
- Empirical results on synthetic and real data demonstrate up to 20–30% lower forecasting loss and improved calibration through adaptive expert weighting.
The Continuous Ranked Probability Score (CRPS) provides a strictly proper scoring rule for evaluating and combining probabilistic forecasts. CRPS-based ensemble training methodologies have become foundational for online learning and aggregation of distributional predictors, enabling robust convex and online combinations, rigorous regret bounds, and empirically validated improvements in probabilistic and calibration metrics. This article systematically develops the theory, algorithms, and practical guidance for CRPS-based ensemble training, with particular emphasis on online learning in the prediction-with-expert-advice regime and the mixability properties exploited by Vovk’s Aggregating Algorithm.
1. Mathematical Definition of CRPS and Properness
CRPS quantifies the discrepancy between a predictive cumulative distribution function (CDF) $F$ and a realized outcome $y$ over a continuous interval by the squared integral
$$\mathrm{CRPS}(F, y) = \int_{-\infty}^{\infty} \bigl(F(u) - H(u - y)\bigr)^2 \, du,$$
where $H$ is the Heaviside step function. This score is strictly proper: it is uniquely minimized in expectation if and only if $F$ coincides with the true data-generating distribution. Typical implementations use discrete approximations or closed-form formulas for ensembles and parametric distributions.
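The defining integral can be evaluated numerically on a grid; a minimal Python sketch (function name, grid bounds, and resolution are illustrative, not from the paper):

```python
import math

def crps_grid(cdf, y, a, b, n=2000):
    """Approximate CRPS(F, y) = integral of (F(u) - H(u - y))^2 du
    by a midpoint Riemann sum over n cells of [a, b]."""
    du = (b - a) / n
    total = 0.0
    for k in range(n):
        u = a + (k + 0.5) * du
        heaviside = 1.0 if u >= y else 0.0
        total += (cdf(u) - heaviside) ** 2 * du
    return total

# Standard Gaussian predictive CDF; the known closed-form CRPS of
# N(0, 1) at y = 0 is 2*phi(0) - 1/sqrt(pi) ~ 0.2337.
gauss_cdf = lambda u: 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))
score = crps_grid(gauss_cdf, 0.0, -8.0, 8.0)
```

Truncating the integration range is harmless here because the integrand decays like the squared Gaussian tail beyond $\pm 8$.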
Key representations include:
- For a random variable $X \sim F$: $\mathrm{CRPS}(F, y) = \mathbb{E}|X - y| - \tfrac{1}{2}\,\mathbb{E}|X - X'|$, with $X, X'$ iid from $F$.
- For ensembles $x_1, \dots, x_M$: $\mathrm{CRPS} = \frac{1}{M}\sum_{i=1}^{M}|x_i - y| - \frac{1}{2M^2}\sum_{i=1}^{M}\sum_{j=1}^{M}|x_i - x_j|$, where $M$ is the ensemble size.
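The ensemble formula translates directly into code; a short sketch:

```python
def crps_ensemble(xs, y):
    """Energy form for an M-member ensemble:
    (1/M) * sum_i |x_i - y|  -  (1/(2 M^2)) * sum_{i,j} |x_i - x_j|."""
    m = len(xs)
    term1 = sum(abs(x - y) for x in xs) / m
    term2 = sum(abs(xi - xj) for xi in xs for xj in xs) / (2 * m * m)
    return term1 - term2

# Members {0, 1} with outcome 0.5: the empirical CDF sits at 0.5 on [0, 1),
# so the integral form gives (0.5)^2 * 0.5 + (0.5)^2 * 0.5 = 0.25.
val = crps_ensemble([0.0, 1.0], 0.5)
```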
2. Mixability and Vovk’s Aggregating Algorithm for CRPS
A central result is the proof that CRPS on a bounded outcome interval $[a, b]$ is $\eta$-mixable, with $\eta = 2/(b-a)$, within the prediction-with-expert-advice formalism. A loss function $\lambda$ is called $\eta$-mixable if, for any set of expert forecasts $\gamma_1, \dots, \gamma_N$ and probability weights $w_1, \dots, w_N$, there exists a combined forecast $\gamma$ such that
$$\lambda(\gamma, y) \le -\frac{1}{\eta} \ln \sum_{i=1}^{N} w_i \, e^{-\eta \, \lambda(\gamma_i, y)} \quad \text{for every outcome } y.$$
The CRPS’s discretized structure directly leverages the mixability of the scalar square loss $(p - h)^2$ with $p, h \in [0, 1]$, which is $2$-mixable. Via grid approximation and a limiting analysis, the overall mixability constant for CRPS is derived as $\eta = 2/(b-a)$, independent of discretization fineness.
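A compact version of the grid argument runs as follows. Discretize $[a, b]$ into $K$ cells of width $\Delta = (b-a)/K$:
$$\mathrm{CRPS}(F, y) \approx \Delta \sum_{k=1}^{K} \bigl(F(u_k) - H(u_k - y)\bigr)^2.$$
Each summand is a square loss on $[0, 1]$ and hence $2$-mixable; summing $K$ mixable losses divides the mixability constant by $K$ (via Hölder's inequality), and scaling a loss by $\Delta$ divides its constant by $\Delta$. Combining,
$$\eta = \frac{2/K}{\Delta} = \frac{2}{K} \cdot \frac{K}{b-a} = \frac{2}{b-a},$$
which does not depend on $K$, explaining the independence from discretization fineness.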
In Vovk’s Aggregating Algorithm:
- Expert weights are initialised uniformly ($w_{i,1} = 1/N$).
- After observing $y_t$, the weights are updated multiplicatively and renormalized: $w_{i,t+1} \propto w_{i,t} \, \exp(-\eta \, \ell_{i,t})$, where $\ell_{i,t} = \mathrm{CRPS}(F_{i,t}, y_t)$ is expert $i$'s loss at round $t$.
- The aggregated learner CDF at each point $u \in [a, b]$ is computed as
$$F_t(u) = \frac{1}{2} - \frac{1}{4} \ln \frac{\sum_{i} w^*_{i,t} \, e^{-2 F_{i,t}(u)^2}}{\sum_{i} w^*_{i,t} \, e^{-2 (1 - F_{i,t}(u))^2}},$$
where $w^*_{i,t}$ are the normalized weights.
This scheme optimally blends expert forecasts in closed form at each round, preserving the mixability guarantee.
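The pointwise substitution step can be sketched in Python (function name is illustrative; weights are assumed already normalized):

```python
import math

def substitute(ps, ws):
    """Square-loss AA substitution at one grid point u: blends expert
    CDF values p_i = F_i(u) under normalized weights w_i, using the
    eta = 2 mixability of the square loss on [0, 1]."""
    num = sum(w * math.exp(-2.0 * p * p) for p, w in zip(ps, ws))
    den = sum(w * math.exp(-2.0 * (1.0 - p) ** 2) for p, w in zip(ps, ws))
    return 0.5 - 0.25 * math.log(num / den)
```

A useful sanity property: when all experts agree on a value $p$, the substitution returns exactly $p$, so unanimous forecasts pass through unchanged.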
3. Online Ensemble Training Protocol and Regret Analysis
The complete online procedure includes:
- Gathering expert CDFs for each round.
- Pointwise aggregation via the mixable substitution function.
- Observing outcome and evaluating instantaneous CRPS loss.
- Multiplicative update of expert weights.
Pseudocode summary:
```
Input: experts 1..N, learning rate η = 2/(b−a)
Initialize: w[i] ← 1/N for i in 1..N
For t = 1 to T:
    Receive expert CDFs F[i](⋅) for i = 1..N
    Normalize weights: W ← Σ_i w[i];  w*[i] ← w[i]/W
    For each u in [a, b]:
        numerator(u)   ← Σ_i w*[i] · exp(−2 · F[i](u)²)
        denominator(u) ← Σ_i w*[i] · exp(−2 · (1 − F[i](u))²)
        F_agg(u) ← 0.5 − 0.25 · log(numerator(u) / denominator(u))
    Output aggregated CDF F_agg
    Observe y_t; compute h_t = ∫ (F_agg(u) − H(u − y_t))² du
    For i = 1..N: ℓ_{i,t} = ∫ (F[i](u) − H(u − y_t))² du
    For i = 1..N: w[i] ← w[i] · exp(−η · ℓ_{i,t})
EndFor
```
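The pseudocode above can be turned into a runnable Python sketch (class name, grid resolution, and the log-weight bookkeeping are implementation choices, not prescribed by the paper):

```python
import math

class CRPSAggregator:
    """Online CRPS aggregation on a fixed grid over [a, b]; expert
    CDFs are supplied as lists of values at the grid midpoints."""

    def __init__(self, n_experts, a, b, n_grid=100):
        self.a, self.b, self.K = a, b, n_grid
        self.du = (b - a) / n_grid
        self.grid = [a + (k + 0.5) * self.du for k in range(n_grid)]
        self.eta = 2.0 / (b - a)        # AA learning rate
        self.logw = [0.0] * n_experts   # log-weights for numerical stability

    def _norm_weights(self):
        m = max(self.logw)
        w = [math.exp(x - m) for x in self.logw]
        s = sum(w)
        return [x / s for x in w]

    def aggregate(self, cdfs):
        """cdfs[i][k] = F_i(grid[k]); returns aggregated CDF values."""
        w = self._norm_weights()
        out = []
        for k in range(self.K):
            num = sum(wi * math.exp(-2.0 * F[k] ** 2) for wi, F in zip(w, cdfs))
            den = sum(wi * math.exp(-2.0 * (1.0 - F[k]) ** 2) for wi, F in zip(w, cdfs))
            out.append(0.5 - 0.25 * math.log(num / den))
        return out

    def crps(self, cdf_vals, y):
        """Grid approximation of the CRPS integral against outcome y."""
        return self.du * sum((F - (1.0 if u >= y else 0.0)) ** 2
                             for u, F in zip(self.grid, cdf_vals))

    def update(self, cdfs, y):
        """Multiplicative weight update after observing y."""
        for i, F in enumerate(cdfs):
            self.logw[i] -= self.eta * self.crps(F, y)
```

Tracking log-weights rather than raw weights avoids underflow when $\eta \cdot \ell$ accumulates over long horizons.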
Theoretical guarantee: for any expert $i$ and horizon $T$,
$$\sum_{t=1}^{T} h_t \le \sum_{t=1}^{T} \ell_{i,t} + \frac{b-a}{2} \ln N.$$
This represents an $O(\ln N)$ regret bound, independent of $T$, certifying near-optimal performance relative to the best fixed expert.
4. Numerical Experiments and Practical Performance
Empirical evaluation features:
- Synthetic experiment: abrupt/smooth mixtures of triangular densities, demonstrating that AA’s CRPS tracks the best expert or mixture, with up to 20–30% lower total loss compared to weighted averages in environments with leadership switching.
- Real-data forecasting: short-term electric load (GEFCom2014), using 21 Gaussian-mixture experts partitioned by “calendar regime.” The AA ensemble with expert confidences and Fixed-Share smoothing achieves lower cumulative CRPS and discounted regret than all individual experts and weighted averages; the “sleeping” expert mechanism further decreases loss by 3–5% in regime-specialized cases.
5. Extensions: Confidence Weights, Fixed Share, and Computational Details
Key operational options:
- Expert specialization and “sleeping” can be handled by incorporating expert-specific confidence levels $p_{i,t} \in [0, 1]$ into the aggregation and weight update, using surrogate losses of the form $\tilde{\ell}_{i,t} = p_{i,t} \, \ell_{i,t} + (1 - p_{i,t}) \, h_t$; a fully sleeping expert ($p_{i,t} = 0$) inherits the learner's loss and so keeps its relative weight unchanged.
- Fixed-Share mixing (parameter $\alpha \in [0, 1]$) enables robust adaptation in switching environments; select a small $\alpha$ for rare switches.
- Efficient CRPS integral evaluation via quadrature yields $O(NK)$ time per round for $N$ experts and $K$ grid points, with on the order of $200$ grid points sufficing for 1% error.
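Under an assumed standard parameterization (each expert retains a $1 - \alpha$ fraction of its weight while a fraction $\alpha$ of the total mass is redistributed uniformly), the Fixed-Share step can be sketched as:

```python
def fixed_share(weights, alpha):
    """Fixed-Share mixing: each expert keeps (1 - alpha) of its weight
    and a fraction alpha of the total mass is spread uniformly, so no
    expert's weight ever decays to zero before a regime switch."""
    n = len(weights)
    total = sum(weights)
    return [(1.0 - alpha) * w + alpha * total / n for w in weights]
```

Because the step only redistributes mass, the total weight is preserved exactly, so it composes cleanly with the multiplicative AA update.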
The mixability property further generalizes to other proper scoring rules, provided a corresponding substitution function and mixability constant can be determined.
6. Practical Guidance and Limitations
Implementation recommendations include:
- Set the learning rate to $2/(b-a)$ for the AA; use $1/[2(b-a)]$ for exponentially weighted averages.
- Domain bounds should reflect the observed minimum/maximum of the application.
- Regularly update share-decay and confidence parameters in nonstationary regimes.
- For reliable operational forecasts, always cross-validate against calibration indices in addition to CRPS.
Notable limitations:
- If expert specialization is not properly encoded, adaptation speed and aggregate skill may suffer.
- CRPS’s properness guarantees are only as strong as the correctness and flexibility of the underlying CDF modeling.
- In high-dimensional or functional-output setups, complexity of CDF representation and CRPS integration may become prohibitive without careful discretization or parametric approximation.
7. Summary and Significance
CRPS-based ensemble training leverages the mixability of CRPS to deliver adaptive, online probabilistic aggregation in expert-advice settings. The resulting Vovk Aggregating Algorithm achieves optimal regret bounds ($O(\ln N)$), adapts effectively to regime switches and specialization, and in empirical tests achieves consistent reductions in forecasting error compared to static and convex-averaging approaches. The CRPS/AA paradigm is broadly extensible, numerically efficient, and underpins state-of-the-art practical ensemble systems for probabilistic time series and distributional forecasting (V'yugin et al., 2019).