RBLQNN: ReLU-Bias Loss Quantile Neural Network

Updated 31 January 2026
  • The paper demonstrates that RBLQNN significantly reduces quantile MAE and CRPS across complex datasets by employing quantile counter-balancing and ReLU bias loss.
  • The method uses a standard MLP architecture augmented with specialized loss function modifications for precise non-Gaussian uncertainty estimation in climate and geophysical contexts.
  • Empirical results show a 10–20% CRPS reduction and minimized quantile crossing, outperforming baseline models like LQR and MVE.

A ReLU-Bias Loss Quantile Neural Network (RBLQNN) is a quantile regression neural network framework designed for flexible and accurate estimation of predictive uncertainty. Its architecture is a standard multilayer perceptron (MLP) with two key modifications to the loss function: quantile counter-balancing and a "ReLU bias" penalty to enforce monotonicity among predicted quantiles. RBLQNN is developed to address limitations of conventional quantile regression and mean-variance neural network methods in capturing nonlinear and non-Gaussian conditional distributions, with particular success demonstrated for climate and geophysical data (Brettin et al., 24 Jan 2026).

1. Model Architecture

The RBLQNN utilizes a classical feed-forward MLP that maps a $p$-dimensional input $x$ to $m$ output nodes, each providing an estimate at a specified quantile level $q_j$. The functional form is:

f_\theta : \mathbb{R}^p \to \mathbb{R}^m, \quad f_\theta(x) = (\hat{y}_{q_1}, \ldots, \hat{y}_{q_m}),

where $q_j \in \{ q_1, \ldots, q_m \}$ are predetermined quantile levels.

Hidden layers employ ReLU activations, $\mathrm{ReLU}(z) = \max\{0, z\}$, and only the final layer is linear. Special architectural features such as monotonic output layers are not required; avoidance of quantile crossing is delegated to the loss function. The standard implementation predicts $m = 19$ evenly spaced quantiles $q_j = 0.05, 0.10, \ldots, 0.95$; extrapolated quantiles $q = 0$ and $q = 1$ are associated with $-\infty$ and $+\infty$, respectively.
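
A minimal PyTorch sketch may make the input–output structure concrete. Only the $p$-to-$m$ mapping, ReLU hidden activations, and linear output layer follow the description above; the hidden widths and depth, and the class name `RBLQNN`, are illustrative assumptions.

```python
# Minimal sketch of the RBLQNN architecture: a feed-forward MLP with ReLU hidden
# layers and a linear output layer producing one node per quantile level.
# Hidden width/depth (two layers of 64 units) are illustrative assumptions.
import torch
import torch.nn as nn

QUANTILE_LEVELS = torch.linspace(0.05, 0.95, 19)  # m = 19 evenly spaced levels

class RBLQNN(nn.Module):
    def __init__(self, p: int, m: int = 19, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(p, hidden), nn.ReLU(),       # ReLU hidden layers
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, m),                  # linear output: no monotonic constraint
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, p) -> (batch, m) predicted quantiles, one per q_j
        return self.net(x)
```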

2. Loss Function Design

The RBLQNN loss function extends standard quantile loss (the "pinball" loss) with two innovations:

  • Quantile Counter-Balancing: To correct for differential weighting of extremal quantiles in standard pinball loss, each term is rescaled by a quantile-specific factor $\lambda_j$, approximated assuming normality:

\lambda_j = \exp(\Phi^{-1}(q_j)/2),

where $\Phi^{-1}$ is the standard normal quantile function.

  • ReLU Bias Loss: To penalize non-monotonic quantile outputs (crossings), an additional term penalizes violations where $\hat{y}_{q_j} > \hat{y}_{q_{j+1}}$:

\mathcal{L}_{ReLU}(\hat{y}_q) = \sum_{j=1}^{m-1} \max\{0, \hat{y}_{q_j} - \hat{y}_{q_{j+1}}\}.

The total objective per sample is:

\mathcal{L}(y, \hat{y}_q) = \mathcal{L}_Q(y, \hat{y}_q) + \eta \mathcal{L}_{ReLU}(\hat{y}_q),

where $\mathcal{L}_Q$ is the composite quantile pinball loss and $\eta > 0$ is a tunable hyperparameter.

These modifications are designed to ensure uniform accuracy across quantiles and to minimize degenerate or non-physical predictive distributions. Unlike strictly "hard" approaches to monotonicity, the ReLU bias loss is a soft penalty, trading rare crossings for improved optimization transparency and implementation simplicity.
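
The full objective can be sketched as a single function, assuming the $\lambda_j$ weights multiply each quantile's pinball term and that terms are averaged over the batch; the function name `rblqnn_loss` and these reduction details are assumptions, not specifics from the paper.

```python
# Sketch of the per-sample objective: counter-balanced composite pinball loss plus
# the ReLU bias penalty on quantile crossings. The mean reduction and the way the
# lambda_j weights enter (multiplying each quantile's pinball term) are assumptions.
import torch
from torch.distributions import Normal

def rblqnn_loss(y: torch.Tensor, y_hat: torch.Tensor,
                q: torch.Tensor, eta: float = 1.0) -> torch.Tensor:
    """y: (batch,) targets, y_hat: (batch, m) predicted quantiles, q: (m,) levels."""
    # Quantile counter-balancing weights: lambda_j = exp(Phi^{-1}(q_j) / 2)
    lam = torch.exp(Normal(0.0, 1.0).icdf(q) / 2.0)

    # Composite pinball loss, rescaled per quantile by lambda_j
    err = y.unsqueeze(1) - y_hat                       # (batch, m)
    pinball = torch.maximum(q * err, (q - 1.0) * err)
    loss_q = (lam * pinball).mean()

    # ReLU bias loss: penalize crossings where y_hat_{q_j} > y_hat_{q_{j+1}}
    crossing = torch.relu(y_hat[:, :-1] - y_hat[:, 1:]).sum(dim=1).mean()

    return loss_q + eta * crossing
```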

3. Training Regimen and Implementation

Training uses the Adam optimizer. Initialization includes a brief warm-up period where the loss is temporarily switched to mean squared error (MSE) to stabilize the initial stages. Model selection is conducted through early stopping, preserving weights with the lowest validation loss.

Hyperparameters include learning rates ($10^{-8}$ to $10^{-1}$), batch sizes (128–512), up to 200 epochs with early-stop patience, and optional regularization (weight decay or dropout). All continuous inputs and target variables are standardized to zero mean and unit variance on a per-dataset or per-station basis.
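
A hedged sketch of such a training loop, reusing the `rblqnn_loss` sketch above, is shown below. The warm-up length, patience, learning rate, and the choice of warm-up target (MSE against the mean of the quantile outputs) are illustrative assumptions.

```python
# Training-loop sketch: MSE warm-up, Adam optimization, and early stopping that
# restores the weights with the lowest validation loss. Warm-up length, patience,
# learning rate, and the warm-up target (mean of the quantile outputs) are
# illustrative assumptions; rblqnn_loss is the sketch from Section 2.
import copy
import torch
import torch.nn.functional as F

def train_rblqnn(model, train_loader, val_loader, q,
                 epochs=200, warmup_epochs=5, patience=20, lr=1e-3, eta=1.0):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    best_val, best_state, stall = float("inf"), None, 0

    for epoch in range(epochs):
        model.train()
        for x, y in train_loader:
            y_hat = model(x)
            if epoch < warmup_epochs:
                # Warm-up: temporary MSE loss to stabilize the initial stages
                loss = F.mse_loss(y_hat.mean(dim=1), y)
            else:
                loss = rblqnn_loss(y, y_hat, q, eta=eta)
            opt.zero_grad()
            loss.backward()
            opt.step()

        # Early stopping on validation loss
        model.eval()
        with torch.no_grad():
            val = sum(rblqnn_loss(y, model(x), q, eta=eta).item()
                      for x, y in val_loader) / len(val_loader)
        if val < best_val:
            best_val, best_state, stall = val, copy.deepcopy(model.state_dict()), 0
        else:
            stall += 1
            if stall >= patience:
                break

    if best_state is not None:
        model.load_state_dict(best_state)   # keep weights with lowest validation loss
    return model
```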

Key practical considerations include the robustness of the $\lambda_j$ weights to deviations from normality and the elimination of the need for specialized monotonic output constraints, simplifying implementation.

4. Evaluation Metrics and Comparative Baselines

Model evaluation is performed through several established quantile regression metrics:

  • Quantile MAE: Mean absolute error between predicted and empirical quantiles, $MAE_q = (1/n) \sum_{i=1}^n |y_{q,i} - \hat{y}_{q,i}|$.
  • Calibration via PIT histograms: The observed proportions $r_k$ of samples in bins formed by predicted quantiles are compared against uniformity using $D = \sqrt{(1/B) \sum_{k=0}^{m} (r_k - 1/B)^2}$.
  • CRPS: The continuous ranked probability score for quantile outputs, as in Laio & Tamea (2007):

CRPS(\hat{y}_q, y) = \frac{2}{m} \sum_{j=1}^m \big[ q_j (y - \hat{y}_{q_j}) 1_{y > \hat{y}_{q_j}} + (1 - q_j)(\hat{y}_{q_j} - y) 1_{y \leq \hat{y}_{q_j}} \big].
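
A direct transcription of this quantile-based CRPS into code might look as follows; `quantile_crps` is a hypothetical helper name and the shapes mirror the earlier sketches.

```python
# Direct transcription of the quantile-based CRPS above; returns one score per sample.
import torch

def quantile_crps(y: torch.Tensor, y_hat: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    """y: (n,) observations, y_hat: (n, m) predicted quantiles, q: (m,) levels."""
    m = q.shape[0]
    err = y.unsqueeze(1) - y_hat                 # y - y_hat_{q_j}
    above = (err > 0).float()                    # indicator 1_{y > y_hat_{q_j}}
    terms = q * err * above + (1.0 - q) * (-err) * (1.0 - above)
    return (2.0 / m) * terms.sum(dim=1)
```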

Baselines for comparison include Linear Quantile Regression (LQR), Mean-Variance Estimation (MVE) neural networks, and deterministic (MSE) regression. LQR is fit via composite quantile regression; MVE optimizes a Gaussian log-likelihood, transforming parameter outputs to quantiles via the normal CDF inverse.
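
For the MVE baseline, the conversion from Gaussian parameters to quantile estimates can be sketched as below; the helper name and the assumption that the network emits $(\mu, \sigma)$ pairs directly are illustrative.

```python
# Sketch of the MVE-to-quantile conversion: predicted Gaussian parameters (mu, sigma)
# are mapped to quantiles via the inverse normal CDF. The assumption that the MVE
# network outputs mu and sigma directly is illustrative.
import torch
from torch.distributions import Normal

def mve_quantiles(mu: torch.Tensor, sigma: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    """mu, sigma: (n,) Gaussian parameters; q: (m,) levels -> (n, m) quantiles."""
    z = Normal(0.0, 1.0).icdf(q)                 # standard normal quantiles
    return mu.unsqueeze(1) + sigma.unsqueeze(1) * z
```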

The table summarizes the distinguishing characteristics:

| Model  | Captures Nonlinearity | Captures Non-Gaussianity | Penalizes Crossings |
|--------|-----------------------|--------------------------|---------------------|
| LQR    | No                    | No                       | No                  |
| MVE    | Yes                   | No                       | N/A                 |
| RBLQNN | Yes                   | Yes                      | Yes                 |

5. Empirical Findings

The RBLQNN’s efficacy is demonstrated using synthetic and observational geophysical datasets:

  • Synthetic Benchmarks: On problems with strong nonlinearity and/or non-Gaussianity (Gumbel noise, Beta-distributed outputs, a bimodal Boltzmann system), RBLQNN attains uniformly lower quantile MAE than LQR or MVE, accurately capturing nonlinear and multimodal dependencies. Variants lacking either quantile counter-balancing or the ReLU bias loss show marked degradation in quantile crossing rate and prediction error, whereas the standard RBLQNN keeps crossings below 3%.
  • Daily Maximum Temperature (GSOD): Applied to 1,501 NOAA GSOD weather station datasets with covariates including SLP and geopotential heights, RBLQNN achieves a 10–20% CRPS reduction over climatology and MSE at 99% of stations. Compared to LQR, CRPS is lower at 95% of stations (significantly so at 63%), and comparable to MVE nets at the majority of stations—reflecting that most temperature uncertainties are near-Gaussian except in regimes with nontrivial skewness or kurtosis.
  • Precipitation (TRMM): On regression of precipitation-related variables (including ERA5 humidity, temperature, and CAPE) for 220,000 test points, RBLQNN achieves a CRPS of 0.43× climatology, outperforming LQR (1.07×), MVE (1.12×), and MSE (0.83×). Bootstrap resampling confirms superiority in all pairwise comparisons.

Failure modes are observed in highly deterministic or highly stochastic regimes, where Gaussian MVE may slightly outperform due to the near-Gaussianity or near-triviality of the predictive distribution.

6. Limitations and Prospective Directions

While the RBLQNN incorporates a soft penalty for quantile crossing, rare crossings may still occur. Strictly enforcing monotonicity via architectural modifications, such as cumulative-increment schemes, could further limit crossings at the expense of simplicity.

Metrics such as CRPS require large sample sizes to be sensitive to moderate improvements; development of efficient proper scoring rules for finite-data regimes remains an open challenge. Extending RBLQNN to spatial–temporal architectures (e.g., CNN or RNN variants) or to Bayesian quantile neural networks, and incorporating physical priors or monotonic spline layers, are promising research avenues.

7. Significance in Uncertainty Estimation

The RBLQNN presents a general approach to quantile regression with demonstrated flexibility, training stability, and predictive power in scenarios with nonlinear and non-Gaussian uncertainty structures. It obviates the need for architectural monotonic constraints and offers practical advantages for geophysical and climate uncertainty estimation tasks, including operational settings where robustness across a wide hyperparameter range is essential (Brettin et al., 24 Jan 2026). The method's empirical performance highlights its utility for problems where analytic forms are unknown or baseline mean-variance or linear approaches are inadequate.
