CRPS-Based Ensemble Training
- The paper introduces CRPS-based ensemble training that leverages mixability to optimally combine expert forecasts with proven regret bounds.
- It details the methodology of Vovk’s Aggregating Algorithm, exploiting CRPS's discretized structure to achieve efficient online learning.
- Empirical results on synthetic and real data demonstrate up to 20–30% lower forecasting loss and improved calibration through adaptive expert weighting.
The Continuous Ranked Probability Score (CRPS) provides a strictly proper scoring rule for evaluating and combining probabilistic forecasts. CRPS-based ensemble training methodologies have become foundational for online learning and aggregation of distributional predictors, enabling robust convex and online combinations, rigorous regret bounds, and empirically validated improvements in probabilistic and calibration metrics. This article systematically develops the theory, algorithms, and practical guidance for CRPS-based ensemble training, with particular emphasis on online learning in the prediction-with-expert-advice regime and the mixability properties exploited by Vovk’s Aggregating Algorithm.
1. Mathematical Definition of CRPS and Properness
CRPS quantifies the discrepancy between a predictive cumulative distribution function (CDF) $F$ and a realized outcome $y$ over a continuous interval by the squared integral
$$\mathrm{CRPS}(F, y) = \int_{-\infty}^{\infty} \bigl(F(u) - H(u - y)\bigr)^2 \, du,$$
where $H$ is the Heaviside step function. This score is strictly proper: it is uniquely minimized in expectation if and only if $F$ coincides with the true data-generating distribution. Typical implementations use discrete approximations or closed-form formulas for ensembles and parametric distributions.
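The defining integral can be evaluated numerically on a grid; a minimal Python sketch (function name, grid bounds, and resolution are illustrative, not from the paper):

```python
import math

def crps_grid(cdf, y, a, b, n=2000):
    """Approximate CRPS(F, y) = integral of (F(u) - H(u - y))^2 du
    by a midpoint Riemann sum over n cells of [a, b]."""
    du = (b - a) / n
    total = 0.0
    for k in range(n):
        u = a + (k + 0.5) * du
        heaviside = 1.0 if u >= y else 0.0
        total += (cdf(u) - heaviside) ** 2 * du
    return total

# Standard Gaussian predictive CDF; the known closed-form CRPS of
# N(0, 1) at y = 0 is 2*phi(0) - 1/sqrt(pi) ~ 0.2337.
gauss_cdf = lambda u: 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))
score = crps_grid(gauss_cdf, 0.0, -8.0, 8.0)
```

Truncating the integration range is harmless here because the integrand decays like the squared Gaussian tail beyond $\pm 8$.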
Key representations include:
- For a random variable $X \sim F$: $\mathrm{CRPS}(F, y) = \mathbb{E}|X - y| - \tfrac{1}{2}\,\mathbb{E}|X - X'|$, with $X, X'$ iid from $F$.
- For ensembles $x_1, \dots, x_M$: $\mathrm{CRPS} = \frac{1}{M}\sum_{i=1}^{M}|x_i - y| - \frac{1}{2M^2}\sum_{i=1}^{M}\sum_{j=1}^{M}|x_i - x_j|$, where $M$ is the ensemble size.
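The ensemble formula translates directly into code; a short sketch:

```python
def crps_ensemble(xs, y):
    """Energy form for an M-member ensemble:
    (1/M) * sum_i |x_i - y|  -  (1/(2 M^2)) * sum_{i,j} |x_i - x_j|."""
    m = len(xs)
    term1 = sum(abs(x - y) for x in xs) / m
    term2 = sum(abs(xi - xj) for xi in xs for xj in xs) / (2 * m * m)
    return term1 - term2

# Members {0, 1} with outcome 0.5: the empirical CDF sits at 0.5 on [0, 1),
# so the integral form gives (0.5)^2 * 0.5 + (0.5)^2 * 0.5 = 0.25.
val = crps_ensemble([0.0, 1.0], 0.5)
```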
2. Mixability and Vovk’s Aggregating Algorithm for CRPS
A central result is the proof that CRPS on a bounded outcome interval $[a, b]$ is $\eta$-mixable, with $\eta = 2/(b-a)$, within the prediction-with-expert-advice formalism. A loss function $\lambda$ is called $\eta$-mixable if, for any set of expert forecasts $\gamma_1, \dots, \gamma_N$ and probability weights $w_1, \dots, w_N$, there exists a combined forecast $\gamma$ such that
$$\lambda(\gamma, y) \le -\frac{1}{\eta} \ln \sum_{i=1}^{N} w_i \, e^{-\eta \, \lambda(\gamma_i, y)} \quad \text{for every outcome } y.$$
The CRPS’s discretized structure directly leverages the mixability of the scalar square loss $(p - h)^2$ with $p, h \in [0, 1]$, which is $2$-mixable. Via grid approximation and a limiting analysis, the overall mixability constant for CRPS is derived as $\eta = 2/(b-a)$, independent of discretization fineness.
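A compact version of the grid argument runs as follows. Discretize $[a, b]$ into $K$ cells of width $\Delta = (b-a)/K$:
$$\mathrm{CRPS}(F, y) \approx \Delta \sum_{k=1}^{K} \bigl(F(u_k) - H(u_k - y)\bigr)^2.$$
Each summand is a square loss on $[0, 1]$ and hence $2$-mixable; summing $K$ mixable losses divides the mixability constant by $K$ (via Hölder's inequality), and scaling a loss by $\Delta$ divides its constant by $\Delta$. Combining,
$$\eta = \frac{2/K}{\Delta} = \frac{2}{K} \cdot \frac{K}{b-a} = \frac{2}{b-a},$$
which does not depend on $K$, explaining the independence from discretization fineness.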
In Vovk’s Aggregating Algorithm:
- Expert weights are initialised uniformly ($w_{i,1} = 1/N$).
- After observing $y_t$, the weights are updated multiplicatively and renormalized: $w_{i,t+1} \propto w_{i,t} \, \exp(-\eta \, \ell_{i,t})$, where $\ell_{i,t} = \mathrm{CRPS}(F_{i,t}, y_t)$ is expert $i$'s loss at round $t$.
- The aggregated learner CDF at each point $u \in [a, b]$ is computed as
$$F_t(u) = \frac{1}{2} - \frac{1}{4} \ln \frac{\sum_{i} w^*_{i,t} \, e^{-2 F_{i,t}(u)^2}}{\sum_{i} w^*_{i,t} \, e^{-2 (1 - F_{i,t}(u))^2}},$$
where $w^*_{i,t}$ are the normalized weights.
This scheme optimally blends expert forecasts in closed form at each round, preserving the mixability guarantee.
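The pointwise substitution step can be sketched in Python (function name is illustrative; weights are assumed already normalized):

```python
import math

def substitute(ps, ws):
    """Square-loss AA substitution at one grid point u: blends expert
    CDF values p_i = F_i(u) under normalized weights w_i, using the
    eta = 2 mixability of the square loss on [0, 1]."""
    num = sum(w * math.exp(-2.0 * p * p) for p, w in zip(ps, ws))
    den = sum(w * math.exp(-2.0 * (1.0 - p) ** 2) for p, w in zip(ps, ws))
    return 0.5 - 0.25 * math.log(num / den)
```

A useful sanity property: when all experts agree on a value $p$, the substitution returns exactly $p$, so unanimous forecasts pass through unchanged.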
3. Online Ensemble Training Protocol and Regret Analysis
The complete online procedure includes:
- Gathering expert CDFs for each round.
- Pointwise aggregation via the mixable substitution function.
- Observing outcome and evaluating instantaneous CRPS loss.
- Multiplicative update of expert weights.
Pseudocode summary:
```
Input: experts 1..N, learning rate η = 2/(b−a)
Initialize: w[i] ← 1/N for i in 1..N
For t = 1 to T:
    Receive expert CDFs F[i](⋅) for i = 1..N
    Normalize weights: W ← Σ_i w[i];  w*[i] ← w[i]/W
    For each u in [a, b]:
        numerator(u)   ← Σ_i w*[i] · exp(−2 · F[i](u)²)
        denominator(u) ← Σ_i w*[i] · exp(−2 · (1 − F[i](u))²)
        F_agg(u) ← 0.5 − 0.25 · log(numerator(u) / denominator(u))
    Output aggregated CDF F_agg
    Observe y_t; compute h_t = ∫ (F_agg(u) − H(u − y_t))² du
    For i = 1..N: ℓ_{i,t} = ∫ (F[i](u) − H(u − y_t))² du
    For i = 1..N: w[i] ← w[i] · exp(−η · ℓ_{i,t})
EndFor
```
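The pseudocode above can be turned into a runnable Python sketch (class name, grid resolution, and the log-weight bookkeeping are implementation choices, not prescribed by the paper):

```python
import math

class CRPSAggregator:
    """Online CRPS aggregation on a fixed grid over [a, b]; expert
    CDFs are supplied as lists of values at the grid midpoints."""

    def __init__(self, n_experts, a, b, n_grid=100):
        self.a, self.b, self.K = a, b, n_grid
        self.du = (b - a) / n_grid
        self.grid = [a + (k + 0.5) * self.du for k in range(n_grid)]
        self.eta = 2.0 / (b - a)        # AA learning rate
        self.logw = [0.0] * n_experts   # log-weights for numerical stability

    def _norm_weights(self):
        m = max(self.logw)
        w = [math.exp(x - m) for x in self.logw]
        s = sum(w)
        return [x / s for x in w]

    def aggregate(self, cdfs):
        """cdfs[i][k] = F_i(grid[k]); returns aggregated CDF values."""
        w = self._norm_weights()
        out = []
        for k in range(self.K):
            num = sum(wi * math.exp(-2.0 * F[k] ** 2) for wi, F in zip(w, cdfs))
            den = sum(wi * math.exp(-2.0 * (1.0 - F[k]) ** 2) for wi, F in zip(w, cdfs))
            out.append(0.5 - 0.25 * math.log(num / den))
        return out

    def crps(self, cdf_vals, y):
        """Grid approximation of the CRPS integral against outcome y."""
        return self.du * sum((F - (1.0 if u >= y else 0.0)) ** 2
                             for u, F in zip(self.grid, cdf_vals))

    def update(self, cdfs, y):
        """Multiplicative weight update after observing y."""
        for i, F in enumerate(cdfs):
            self.logw[i] -= self.eta * self.crps(F, y)
```

Tracking log-weights rather than raw weights avoids underflow when $\eta \cdot \ell$ accumulates over long horizons.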
Theoretical guarantee: for any expert $i$ and horizon $T$,
$$\sum_{t=1}^{T} h_t \le \sum_{t=1}^{T} \ell_{i,t} + \frac{b-a}{2} \ln N.$$
This represents an $O(\ln N)$ regret bound, independent of $T$, certifying near-optimal performance relative to the best fixed expert.
4. Numerical Experiments and Practical Performance
Empirical evaluation features:
- Synthetic experiment: abrupt/smooth mixtures of triangular densities, demonstrating that AA’s CRPS tracks the best expert or mixture, with up to 20–30% lower total loss compared to weighted averages in environments with leadership switching.
- Real-data forecasting: short-term electric load (GEFCom2014), using 21 Gaussian-mixture experts partitioned by “calendar regime.” The AA ensemble with expert confidences and Fixed-Share smoothing achieves lower cumulative CRPS and discounted regret than all individual experts and weighted averages; the “sleeping” expert mechanism further decreases loss by 3–5% in regime-specialized cases.
5. Extensions: Confidence Weights, Fixed Share, and Computational Details
Key operational options:
- Expert specialization and “sleeping” can be handled by incorporating expert-specific confidence levels $p_{i,t} \in [0, 1]$ into the aggregation and weight update, using surrogate losses of the form $\tilde{\ell}_{i,t} = p_{i,t} \, \ell_{i,t} + (1 - p_{i,t}) \, h_t$; a fully sleeping expert ($p_{i,t} = 0$) inherits the learner's loss and so keeps its relative weight unchanged.
- Fixed-Share mixing (parameter $\alpha \in [0, 1]$) enables robust adaptation in switching environments; select a small $\alpha$ for rare switches.
- Efficient CRPS integral evaluation via quadrature yields $O(NK)$ time per round for $N$ experts and $K$ grid points, with on the order of $200$ grid points sufficing for 1% error.
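Under an assumed standard parameterization (each expert retains a $1 - \alpha$ fraction of its weight while a fraction $\alpha$ of the total mass is redistributed uniformly), the Fixed-Share step can be sketched as:

```python
def fixed_share(weights, alpha):
    """Fixed-Share mixing: each expert keeps (1 - alpha) of its weight
    and a fraction alpha of the total mass is spread uniformly, so no
    expert's weight ever decays to zero before a regime switch."""
    n = len(weights)
    total = sum(weights)
    return [(1.0 - alpha) * w + alpha * total / n for w in weights]
```

Because the step only redistributes mass, the total weight is preserved exactly, so it composes cleanly with the multiplicative AA update.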
The mixability property further generalizes to other proper scoring rules, provided a corresponding substitution function and mixability constant can be determined.
6. Practical Guidance and Limitations
Implementation recommendations include:
- Set the learning rate to $2/(b-a)$ for the AA; use $1/[2(b-a)]$ for exponentially weighted averages.
- Domain bounds should reflect the observed minimum/maximum of the application.
- Regularly update share-decay and confidence parameters in nonstationary regimes.
- For reliable operational forecasts, always cross-validate against calibration indices in addition to CRPS.
Notable limitations:
- If expert specialization is not properly encoded, adaptation speed and aggregate skill may suffer.
- CRPS’s properness guarantees are only as strong as the correctness and flexibility of the underlying CDF modeling.
- In high-dimensional or functional-output setups, complexity of CDF representation and CRPS integration may become prohibitive without careful discretization or parametric approximation.
7. Summary and Significance
CRPS-based ensemble training leverages the mixability of CRPS to deliver adaptive, online probabilistic aggregation in expert-advice settings. The resulting Vovk Aggregating Algorithm achieves optimal regret bounds ($O(\ln N)$), adapts effectively to regime switches and specialization, and in empirical tests achieves consistent reductions in forecasting error compared to static and convex-averaging approaches. The CRPS/AA paradigm is broadly extensible, numerically efficient, and underpins state-of-the-art practical ensemble systems for probabilistic time series and distributional forecasting (V'yugin et al., 2019).